# FPGA Design and Implementation of FFT processor for OFDMA system

# Salauddin Mohammad<sup>1</sup>, Piyush Kuchhal<sup>2</sup> and Rajeev Kumar Gupta<sup>3</sup>

(Asst. Professor<sup>1</sup>, Sr. Asso. Professor<sup>2</sup>, Asso. Professor<sup>3</sup>)
(University of Petroleum and Energy Studies, India)
(msalauddin@ddn. upes. ac. in<sup>1</sup>, pkuchhal@ddn. upes. ac. in<sup>2</sup>, rajeev@ddn. upes. ac. in<sup>3</sup>)

#### **Abstract**

For Orthogonal Frequency Division Multiple Access (OFDMA) system module, there is a need for an efficient alterable point FFT processor. Therefore, it is meaningful to design a FFT processor for which input data points could be alterable. In this paper the variable input FFT processor is designed to meet the requirements of OFDMA system. For this, in this paper we select the 2D Fourier transform algorithm as the kernel algorithm, VHDL language is used to present a detail design of two-stage pipeline structure, ModelSim (SE) for the simulation, and verify on the Spartan3E FPGA. Simulation results show that the way of implementation and design is right and meet the IEEES02.16e standard, at the same time the data precision is 16 bits, limits the clock frequency of 100MHz, the overall timing design stability, and it could reach the scope of real-time processing.

**Keywords**-Orthogonal Frequency DivisionMultiple Access (OFDMA); Fast Fourier Transform (FFT); Field Programmable Gate Array (FPGA).

# I. INTRODUCTION

In modern communication systems Orthogonal Frequency Division Multiple (OFDM) plays a crucial role, and it is replaced by Orthogonal Frequency Division Multiple Access (OFDMA) which is a multi-user OFDM that allows multiple accesses that scheme that combines TDMA and FDMA on the same channel, widely for the next generation wireless communication systems such as WiMAX and 3G-LTE standard in order to accommodate many users in the same channel at the same time. The OFDMA physical layer is based OFDMA modulation, which comprises of OFDM modulation as well as subcarrier allocation. Therefore, it is significant to focus more attention on wireless communication technology. OFDMA typically uses a FFT size much higher than OFDM, and divides the available sub-carriers into logical groups called sub-channels. Unlike OFDM that transmits the same amount of energy in each subcarrier, OFDMA may transmit different amounts of energy in each sub-channel i.e., users may also occupy more than one sub channel, upon their Quality of Service (QoS) profiles and system loading characteristics.

FFT architectures can be classified into three categories, the parallel architecture, the pipelined architecture and the memory based architecture. The cascade structure of FFT processor presented in reference [5] could implement variable input data length based on radix-16 and radix-2/4/8 combined radix algorithm. The design need to be improved as the architectures need more data storage resources and at the same

time there is a need to increase the precision when floating point is used instead of fixed point. Reference [6] presented a scalable pipelined FFT architecture which needs to improve the precision when floating point is used instead of fixed point. Therefore, after analyzing many FFT architectures this paper presents a variable length FFT processor of a pipelined structure by using radix-2 Decimation-In-Time and 2D-FFT algorithms.

The rest of the paper is organized as follows: Section-II details the FFT and 2D-FFT algorithms. Section-III illustrates the overall design of the processor. The implementation and simulation will be detailed in section-IV. Finally, section-V gives the main conclusions of the work.

#### II. FAST FOURIER TRANSFORM

#### A. Discrete Fourier Transform:

The DFT is one of the most important tools of digital signal processing. The number of complex multiplications and addition operations required by the simple forms both the DFT and IDFT is of order  $N^2$ . This is because if there are N data points to calculate, each of which requires N complexarithmetic operations. The DFT has its applications in data compression, in communication, RADAR, GPS navigation, image processing, antenna arrays and sonar systems.

The discrete Fourier Transform (DFT) of the N-point input X(k) is defined as follows

$$X(k) = \sum_{n=0}^{N-1} x(n) * W_N^{nk}$$
, k= 0,1,2,...N-1 (1),

whereN is the transform length,  $W_N^{nk} = e^{-j2\pi/N}$  is twiddle factor.

## **B. Fast Fourier Transform:**

The Fast Fourier Transform is an efficient way to compute DFT and its inverse and produces same results as that of DFT. The only difference is that it is faster because FFT uses the symmetric, periodic and recursive properties of twiddle factor. FFT is an optimized fast algorithm and increases the computational efficiency of DFT. Both the DIT-FFT and DIF-FFT algorithms rely on the recursive decomposition of an N point transform into two (N/2) point transforms. This decomposition process can be applied to any composite (non prime) N. The FFT computes the same DFT in only  $N/2 \log_2 N$  operations. The algorithm proposed by Cooley-Tukey is a divide and conquer algorithm which recursively divides the DFT of size  $N=N_1N_2$  into many small DFTs of sizes  $N_1$  and  $N_2$  along with N multiplications with twiddle factors. The algorithm is divided into time based (DIT) and frequency

based (DIF) Fast Fourier algorithms. DIT-FFT orders the data from bit reversal order to normal order. Whereas DIF-FFT is converse. The basic idea of these algorithms is to divide the N-point FFT into smaller ones until two point FFT is obtained. Hence the algorithm is called radix-2 algorithm.

# C. Two Dimensional (2D) FFT:

Long FFTs are quite often used for frequency analysis and communications applications. The long FFTs require more memory bandwidth for matrix transpositions. The architecture is designed using two FFTs of length  $N_1$  and  $N_2$  to calculate the FFT of size  $N=N_1*N_2$ . The 2D FFT of  $N=N_1*N_2$  is defined as follows:

$$X[k_1N_1+k_1] = \sum_{n_1=0}^{N_1-1} [e^{-j\frac{2\Pi n_1k_2}{N}} (\sum_{n_2=0}^{N_2-1} x[n_1N_1+n_1]e^{-j\frac{2\Pi n_2k_2}{N_2}})]e^{-j\frac{2\Pi n_2k_2}{N_1}}$$
 where,  $0 \le k_1 \le N_1-1$ ;  $0 \le k_2 \le N_2-1$ 

#### III. ARCHITECTURE DESIGN

If the cascade structure or pipelined structure is used for designing the variable point FFT then the ping-pong memory structure used needs a lot of memory at each level. Hence, 2D FFT is used for designing FFT processor not only to improve the system level of parallelism and also to reduce the memory required.

The core module of OFDMA PHY layer is FFT module in which we can use1024-point, 512-point and 128-point FFT modules. The overall block diagram of the FFT processor is shown in fig 1.

The basic idea of the 2D Fourier transform i.e., N=128 is constructed from  $N_1=2$  and  $N_2=64$ , the data is first arranged in 64 lines and 2 rows, next the input data will transform the 64-point FFT and then obtained result is multiplied with twiddle factor and finally the result do 2-point FFT. Similarly 512-point FFT can be implemented firstly by constructing 64-point FFT and then transform further to 8-point FFT.



Fig.1. Block Diagram of Overall FFT processor

#### **B. DDS core:**

DDS provides remarkable frequency resolution and allows direct implementation of frequency, phase and amplitude modulation. DDS (Direct Digital Synthesizer) Compiler core sources sinusoidal waveforms for use in many applications. A

DDS consists of a Phase Generator and a SIN/COS Lookup Table. Direct digital synthesis (DDS) is a method of producing an analog waveform usually a sine wave by generating a time-varying signal in digital form and then performing a digital-to-analog conversion. Because operations within a DDS device are primarily digital, it can offer fast switching between output frequencies, fine frequency resolution, and operation over a broad spectrum of frequencies. The basic block diagram of DDS implemented is shown in the Fig2.



Fig.2. Block Diagram of DDS Core



Fig 3. Synthesis and Simulation Results

DDS devices are now available that can generate frequencies from less than 1 Hz up to 400 MHz (based on a 1-GHz clock). The digital architecture of DDS eliminates the need for the manual tuning and tweaking related to component aging and temperature drift in analog synthesizer solutions.

All the blocks are connected with common clock and reset signals. The delta phase value decides the phase increment for each clock pulse. Hence it decides the resulting signal frequency. The phase accumulator consists of phase increment register; adder and phase register which produces accumulated

phase value for each clock pulse. The phase increment register stores the instantaneous phase increment values resulting from frequency modulation control block. This is fed to a 8 bit adder as one of its input. The other input for adder is phase register output. The phase register holds the instantaneous phase for each clock pulse. Four Look up Tables are implemented to produce four different output waveforms, namely sine wave, square wave, triangular wave and arbitrary waveform. The phase bits given by the accumulator is used to address a look-up table held in ROM (read-only memory) which gives corresponding amplitude bits. The COS carrier LUT consists of phase Vs amplitude table corresponding to one COS waveform.

#### C. 64-Point FFT Module:

The 64-point FFT module is designed by using two 8-point FFTs. The 8-point FFT module is the kernel part of the architecture which consists of butterfly structure, twiddle factor, coefficient RAM, controller and an address generation unit. The results of the 8-point FFT module are compared with MATLAB results.

Table1: Results of 8-point FFT processor

| Results of 8-Point FFT module |                 | Results of Matlab |
|-------------------------------|-----------------|-------------------|
| Real Parts                    | Imaginary Parts |                   |
| 1                             | 0               | 1.0000            |
| 5.01045                       | 3.38915         | 5.0104+3.3891i    |
| 2.50005                       | 4.50005         | 2.5000+4.5000i    |
| -7.01045                      | 4.38915         | -7.0104+4.3891i   |
| 6                             | 0               | 6.00000           |
| -7.01045                      | -4.38915        | -7.0104-4.3891i   |
| 2.5005                        | -4.5005         | 2.5000-4.5000i    |
| 5.01045                       | -3.38915        | 5.0104-3.3891i    |

#### **D. Select and Control Module:**

This module is crucial to compute the alterable data length in the design. A two bits signal mode is chosen as for the requirement. When mode=00, means to choose 8-point FFT module to complete the 64-point FFT, when mode= 01 means to choose 2-point FFT to compute 128-point FFT. Here the result of 64-point FFT is chosen to do 2-point FFT. Thus we obtain 128-point FFT. Similarly when the mode =10 means to complete 512-point FFT, when mode=11 means to complete 1024-point FFT.

#### **IV.SIMULATION**

The input data length of our proposed FFT processor is a parameter which can be decided by itself at the range of 128, 512, 1024. Take 512-point FFT as an example. At first, the 512 points FFT is coded by MATLAB language. After the chosen FFT algorithm is valid, the architecture of the processor is modeled in VHDL and verified by Xilinx ISE and timing simulation using ModelSim software. And also Direct Digital Synthesizer was used to generate the sine wave form which covers all the real and imaginary parts of the input lengths 128 point, 512 point and 1024 point FFT. To verify the timing simulation, a test bench file included the TEXTIO

package was written to read input data and write FFT results. The peaks indicate the FFT of 128-point, 512-point, 1024-point and 20148-point of real and imaginary parts.

#### V. CONCLUSION

In this paper, a variable point FFT processor was designed using FPGA and was applicable to OFDMA system successfully. The results showed the successful completion of the design altered input points FFT computation, design precision 16-bit, FFT processing result was correct. Therefore this design can be applied to real-time signal processing system, which completes the main computing modules in the OFDMA system.

### REFERENCES

- [1] N. Mokhtariana\* & G.A. Hodtaniab "Analog implementation of radix-2, 16-FFT processor for OFDM receivers: non-linearity behaviors and system performance analysis" International Journal of Electronics, Volume 102, Issue 12, 2015, pages 2046-2061.
- [2] Nirav Chauhan, ShrutiOza, Kiran Parmar "Survey of Optimization of FFT processor for OFDM Receivers" International Journal for scientific Research & Development" Volume: 1, Issue: 2, 2013, Page(s): 74-77.
- [3] M.Viswanadh,M.Harshavardhan Reddy, NaveenaBoppana "An Efficient Pipelined FFT Processor for OFDM Communication Systems" International Journal of Engineering Trends and Technology (IJETT), Volume-5 Number-6, 2013.
- [4] WiMAX Forum, Krishna Ramadas and Raj Jain, "WiMAX System Evaluation Methodology" version 2.1 July 2008.
- [5] IEEE Std 802.16e<sup>TM</sup>-200S and IEEE Std 802.16<sup>TM</sup>-2004/Corl-200S
- [6] SumitWadekar, Laxman P, Thakare, Dr. A.Y. Deshmukh "Reconfigurable N-Point FFT Processor Design For OFDM System "International Journal of Engineering Research and General Science Volume 3, Issue 2, March-April, 2015
- [7] Byungcheol Kang, Jaeseok Kim, "Low complexity multi-point 4-channel FFT processor for IEEE 802.11n MIMO-OFDM WLAN system," IEEE 2011.
- [8] Rajesh S. Bansode, PrajaktaBorole, "Hardware implementation of an OFDM transceiver for 802.11n systems," IJSER Vol. 4, Issue 6, June 2013.