# Design of High throughput 512-Point FFT Processor Using Wallace-tree Multiplier ## Nandhini. $A^1$ and Magthelin therase. $L^2$ <sup>1, 2</sup> Assistant Professor, ECE Department, Sathyabama university, Chennai-119, India. abnandhu@gmail.com<sup>1</sup>, lmagthelin@yahoo.co.in<sup>2</sup> #### **Abstract** FFT (Fast Fourier transform) algorithm has agreeably the most important signal processing algorithm in wireless application. The MDF (Multipath delay feedback) is one of the several types of FFT architecture, which has been repeatedly used to provide a high throughput rate in order to minimize the area and power consumption. The proposed architecture has been designed to support FFT length of 512-point using radix-2<sup>5</sup> algorithm. In this method FFT has been designed using Wallace tree multiplier instead of booth multiplier used in the existing method. The high speed and area efficient design have been achieved by using Wallace tree multiplier. Thus the proposed multiplier requires only 9% of the total area, which indicates a lower hardware complexity as compared to the existing architecture. The simulation and synthesis steps have been performed by the Modelsim and Quartus II software using verilog HDL. The synthesis report demonstrate the highest throughput rate of 3.6 Gb/s which has been achieved by Wallace tree multiplier. Keywords: Butterfly unit, Booth encoder, DSP, FFT, Wallace tree multiplier. #### I. INTRODUCTION FFT plays a major role in the field of Digital Signal Processing (DSP) such as filtering, spectral analysis, etc., to compute the discrete Fourier transform (DFT). It is frequently used in modern digital communications especially in digital video broadcasting and orthogonal frequency division multiplexing (OFDM) systems [1],[9].FFT algorithm is well known as a block processing algorithm and can be performed on a sequence of data. This algorithm is characterized by a set of addition, subtraction and multiplication operations. The elementary computational unit of FFT is the butterfly, which accomplish both complex multiplications by the twiddle factors with pieces of data in the sequence and the addition/subtraction of these products [2]. The algorithm consists of various stages of computation. Data storage is mandatory between these stages, and can be carried out in several ways. The greater amount of data storage is necessary to perform a larger FFT. The FFT/IFFT processors must be pipelined to provide high throughput. Many FFT architectures being designed to utilize the OFDM transmission efficiently such as Single path Delay Feedback(SDF) and Multi-path Delay Feedback (MDF). In that MDF is often preferred in order to provide a high throughput .Extremely to reduce the area and power consumption several FFT algorithms had been proposed in the existing methods [3], [10]. The radix of the algorithm highly authorised the architecture of the FFT/IFFT processor. Small radix consequences a simple butterfly and higher radix will make less the twiddle factor multiplication. hence $r^k$ algorithm is most adopted so that simple butterflies and reduced twiddle factor multiplication is achieved. The Radix-2 algorithm is a well known simple algorithm for FFT/IFFT processor. The Radix-4algorithm is suitable to achieve a high Data rate [4].The 512-point FFT/IFFT processor has been designed with the modified Radix-2<sup>5</sup> algorithm in the proposed method. It can attain a high throughput rate compared to the lower radix algorithms. This paper has been also compared with the FFT processor with radix 2<sup>5</sup> algorithm which uses other multipliers. The speed of the processor is improved in the design of Proposed FFT/IFFT processor with the benefit of high speed Wallace tree multiplier in place of complex booth multiplier/other multipliers. #### II. PROPOSED FFT PROCESSOR In this paper an area efficient FFT processor has been designed. The proposed architecture composed of input unit, output unit, control unit, ROM1&2 and a multiplier module. This module has butterfly units with Wallace tree multipliers. The butterfly units perform complex additions and subtractions of given input data. The main impartial of proposed processor has to minimize the area occupied and to increase the speed of operation. Figure.1 shows the proposed block diagram of FFT processor with Wallace multipliers. The major units of this processor have been elaborately discussed in the following section. Fig.1 Block diagram of proposed FFT/IFFT processor ### A. Butterfly units: In FFT algorithm the butterfly unit has been used to estimate the combined results of smaller DFTs into larger DFTs and vice versa. Fig.2 (a) & (b) shows the architecture of butterfly units. It has been composed of multipliers, adders and multiplexers. The butterfly units perform complex additions and subtractions of two input datas X[n] and X[n+N/2]. The attitude of the butterfly units has to secure all input values into the FIFO until the $N/2^{th}$ input has been entered Fig 2(a): Butterfly units Fig 2(b): Butterfly units Later, the butterfly units carried out the calculations among the input values and FIFO outputs, after infiltrating $(N/2) + 1^{st}$ the input. All the while of last N/2 clock cycles, all butterfly calculations are carried out at each stage [5],[6]. Between the butterfly outputs, the complex addition outputs are fed to the next stage. And, the complex subtraction outputs are secured in the FIFO, and then during the next N/2 clock cycles, the FIFO outputs are augmented to the next stage. Butterfly unit 1 (BU1) carried out complex additions and subtractions only. However, butterfly unit 2 (BU2) having twiddle factor multiplication using the multiplexers and control signals. A discrete Fourier transform (DFT) of length *N* is defined as follows: $$X(k) = \sum_{n=0}^{N-1} x(n)Wn^{k}, k=0,1,2.....N-1$$ (1) where Wn is twiddle factor which denotes the N-th primitive root of unity with its exponent evaluated modulo N. k is the frequency index and n is the time index [6], [13]. The 512-point FFT computation with radix-2k algorithm consists of 9 stages. The radix-2k algorithm has same butterfly structure regardless of k. However, the twiddle factor multiplication structure is different by factor k. Applying a 6-dimensional linear index map [6]. $$n = \left\{ \left( \frac{N}{2} n + \frac{N}{4} n + \frac{N}{8} n + \frac{N}{16} n + \frac{N}{32} \frac{N}{16}$$ ## **B.** Wallace tree multiplier: Fig.3 shows the architecture of Wallace tree multiplier. It has been mainly composed of partial product generator, Wallace tree adders, carry-lookahead adder and booth encoder. It influences the efficient hardware implementation of a digital circuit that multiplies two integers. Considering Non-regularities in the construction of traditional multipliers consequence in a large amount of unwanted area [7],[8]. The multiplier takes in operands: the multiplier (MR) and the multiplicand (MD), produces the 8 multiplication result of the two at its output. The architecture of this multiplier mainly composed of three major modules. Booth encoders, Wallace Tree adders and the Carry select Adders. Fig.3 Wallace tree multiplier #### III. SIMULATION AND SYNTHESIS REPORT OF PROPOSED SYSTEM The simulation result and synthesis report of proposed 512-PT FFT architecture with Wallace tree multiplier has been enumerated in this section. Figure 4 &6 shows the simulation results and figure 5 & 7 shows the synthesis report of the of Wallace tree multiplier and 512-pt FFT processor. Table:1 has the comparison of existing and the proposed system based on the simulation and synthesis report. ## A. Simulation result of the Wallace tree multiplier: Fig. 4 Simulation of Wallace tree multiplier ## B. Synthesis report of Wallace tree multiplier: Fig.5 Synthesis of Wallace tree multiplier ## C. Simulation result of 512-Point FFT/IFFT Processor Design: Fig.6 Simulation of 512-point FFT/IFFT Processor design ## D. Synthesis result of 512-pointt FFT/IFFT Processor: Fig.7 Synthesis of 512-point FFT/IFFT Processor Design #### E. COMPARISON TABLE: | Table: 1 Comparison of Existing and Proposed system | Table:1 | Comparison | of Existing and | <b>Proposed system</b> | |-----------------------------------------------------|---------|------------|-----------------|------------------------| |-----------------------------------------------------|---------|------------|-----------------|------------------------| | PARAMETERS | EXISTING | PROPOSED | |--------------------------|----------------------|----------------------| | | SYSTEM | SYSTEM | | FFT SIZE | 64-PT | 512-PT | | ALGORITHM USED | Radix-2 <sup>3</sup> | Radix-2 <sup>5</sup> | | TOTAL LOGIC ELEMENTS (%) | 14 | 9 | | SPEED OF THE PROCESSOR | 2.5 | 3.6 | | (Gb/s) | | | | MULTIPLIER USED | Booth multiplier | Wallace tree | | IN DESIGN | _ | multiplier | In this project, 512-point Radix-2<sup>5</sup> algorithm for FFT is presented. The number of complex multipliers and twiddle factor LUTs are reduced using the Radix-2<sup>5</sup> algorithm. The proposed FFT design has the most area-efficient and high performance architecture for the eight parallel 512-point MDF FFT processors. The highest throughput rate of 3.6 Gb/s has been achieved with the use of Wallace tree multiplier in this design. The proposed multiplier requires only 9% of total area, which indicates a lower hardware complexity as compared to the existing architecture. The proposed architecture has been designed in verilog HDL and also simulation has been done to verify its functionality. Both the simulation and synthesis steps have been performed using the Modelsim and Quartus II software. The proposed FFT/IFFT processor design has potential applications in high-rate OFDM-based WPAN systems. ## **REFERENCES** - [1] Cooley, J.W., Tukey, J.: An Algorithm for Machine Calculation of Complex Fourier Series. In: Mathematical Computing, vol. 19, pp. 297--301, (1965). - [2] ADSP-21000 Family Application Handbook Volume 1, Analog devices. - [3] M.Ayinala.., .M.Brown., and Parhi.K.K,(Jun.2012) "A Pipelined parallel FFT architectures via folding transformation," IEEE Trans. Very Large Scale Intergr. (VLSI) Syst., vol. 20, no. 6, pp 1068–1081. - [4] 3GPP, 'Spreading and modulation (FDD)', 3G TR 25.213 v3.3.0, Oct. 1999. - [5] B Pushparaj ,C Paramasivam "High Performance and Low Power Modified Radix-2<sup>5</sup> FFT Architecture For High Rate WPAN Application", International Journal of Computational Intelligence and Informatics, Vol. 2: No. 3, October December 2012. - [6] Taesang Cho and Hannho lee, "A High-Speed Low Complexity Modified Radix-2<sup>5</sup> FFT Processor for High Rate WPAN Application" IEEE Trans. VLSI SYSTEMS, vol.21, no.1, JANUARY 2013. - [7] Ravi Payal "Simulation Model Of Wallace Tree Multiplier Using Verilog", International Journal of Engineering Research & Technology Vol. 2 Issue 10, October 2013. - [8] Mark Ronald Santoro "DESIGN AND CLOCKING OF VLSI MULTIPLIERS" Technical Report No. CSL-TR-89-397, October 1989. - [9] Esai Amudha Vani, Joseph Gladwin "Low Power FFT Architectures via Folding Transformation", Proc. of Int. Conf. on Advances in Information Technology and Mobile Communication, Elsevier, 2013. - [10] S.Leena jebamoney, A.Nandhini, S.Krishnapriya, : "Design of an area efficient FFT/IFFT processor for WPAN applications",IJERD, Volume 10, Issue 3 (March 2014), PP.06-10. - [11] Tzi-Dar Chiueh ,Pei-Yun Tsai "OFDM Baseband Receiver Design for Wireless communications", 2007 John Wiley & Sons (Asia) Pte Ltd. ISBN: 978-0-470-82234-0. - [12] Y. Lin, H. Liu, and C. Lee, "A 1-GS/s FFT/IFFT processor for UWB applications," IEEE J. Solid-State Circuits, vol. 40, no. 8, pp. 1726–1735, Aug. 2005. - [13] J. Lee and H. Lee, "A high-speed two-parallel radix-24 FFT/IFFT processor for MB-OFDM UWB systems," IEICE Trans. Fundamentals,vol.E91-A, no.4, pp.1206-1211, April 2008.