# Design and Implementation of a Floating-Point Fused Add-Subtract Unit Using CSLA

Dr. Fazal Noorbasha<sup>1</sup> and Shaik Manjusha<sup>2</sup>

<sup>1</sup> Associate Professor, Department of ECE, K L University, Vaddeswraram, Guntur, A.P, India.
<sup>2</sup> M.Tech. Student, Department of ECE, K L University, Vaddeswraram, Guntur, A.P, India.

<sup>1</sup> fazalnoorbasha@kluniversity.in; <sup>2</sup> shaik.manjusha9@gmail.com

#### Abstract

Adders are among the most widely used circuits in digital integrated circuit design. Among the conventional adder structures, the carry select adder (CSLA) is a fast-performing one. This paper describes a floating-point fused add-subtract unit that performs simultaneous floating-point add and subtract operations on single-precision data in about the same time that a conventional floating-point adder takes to perform a single addition. To increase the performance of the addition, the adder block is replaced by a CSLA.

**Keywords:** Floating Point Adder (FPA), Fused Add-Subtractor (FAS), Verilog, Carry Select Adder (CSLA).

## I. INTRODUCTION

The growing need for decimal arithmetic has drawn increased attention in many commercial applications and digital systems where binary arithmetic falls short. Many researchers have studied the floating-point fused multiply-add (FMA) unit and reported that it has several advantages in floating-point arithmetic. An FMA unit can reduce the delay of an application that executes a multiplication followed by an addition, and its logic may even replace a floating-point co-processor entirely. Many DSP algorithms have been redesigned to use the FMA units available in a given architecture. For example, using an FMA unit in a radix-16 FFT algorithm can speed up the FFT system. High-performance digital filter implementations are also possible with an FMA unit.
FMA units are used in embedded signal processing and graphics applications, and to perform argument reduction and division. This is why the FMA has become an important unit in many commercial processors, such as those from Intel, HP and IBM. Similarly, many DSP algorithms and other fields require both the sum and the difference of a pair of operands for subsequent processing. This can be observed, for example, in the computation of DCT and FFT butterfly operations. In conventional floating-point hardware these operations are performed serially, which limits the throughput of the system. A fused add-subtract unit speeds up the butterfly operation, whereas performing the addition and the subtraction as individual operations may be expensive. This paper describes the implementation of a fused floating-point add-subtract unit.

## II. PROPOSED APPROACH

An add-subtract unit can be realized with discrete floating-point adders in two ways. Figure 1 shows the parallel implementation, in which two adders operate in parallel, one adding and one subtracting. Figure 2 shows the serial implementation, in which a single adder is used twice, once to add and once to subtract, with the same operands a and b.

Figure 1: Add-Subtract Unit with Conventional Parallel implementation

Figure 2: Add-Subtract Unit using Conventional serial implementation

As shown in Figure 1, the conventional parallel implementation of the add-subtract unit uses two floating-point adders to perform the operation. This method increases the speed of the operation, but using two floating-point add/subtract units increases the area and power consumption.
In contrast, the conventional serial implementation in Figure 2 uses a single floating-point adder/subtractor, together with a storage element that holds the addition or subtraction result under control. This approach reduces the area. However, because the two operations execute serially, the time required to obtain both results is twice that of the parallel approach. The storage element also slightly increases the area and power consumption.

Figure 3: Proposed Floating-Point Fused Add-Subtract Unit

The proposed fused floating-point add-subtract unit replaces the conventional adders with a carry select adder (CSLA) architecture to improve performance over the conventional system. The CSLA speeds up the addition by selecting between precomputed sum outputs with multiplexers at the output; which output is selected depends on the incoming carry. The design uses ripple carry adders (RCAs) and binary to excess-1 converters (BEC-1). Using a BEC with a multiplexer achieves high performance with reduced area. Compared to the conventional CSLA, the modified CSLA is better in terms of area and speed.

In the modified CSLA, the RCA with Cin = 1 is replaced by a BEC. The BEC structure consists of AND, XOR and NOT gates. The proposed method replaces each XOR gate with a MUX and a NOT gate. The least significant bit of the input is fed to a NOT gate and is also used as the control signal of the MUX for the next input bit. The least significant bit and the next bit are fed to an AND gate, whose output is the control signal of the following MUX. Each output bit is produced according to its control signal: if the control signal is 0, the output is the same as the input bit; otherwise the output is the complement of the input bit. This pattern continues up to the MSB and forms the proposed BEC design shown in Figure 5.
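The BEC-based selection described above can be modelled at the bit level. The following Python sketch is a behavioural illustration only, not the authors' Verilog design: the function names `bec1`, `ripple_carry_add` and `csla_add`, the uniform 4-bit grouping, and the 24-bit default width are all assumptions made for this example.

```python
def bec1(bits):
    """Binary to excess-1 converter on a list of bits, LSB first.

    Each output bit is a 2:1 MUX between the input bit and its complement.
    The MUX select is the AND of all lower input bits (1 for the LSB, so
    the LSB is always inverted). Returns (output_bits, carry_out).
    """
    out = []
    select = 1
    for b in bits:
        out.append((1 - b) if select else b)  # MUX with a NOT gate on one input
        select &= b                           # AND chain feeding the next MUX
    return out, select


def ripple_carry_add(a_bits, b_bits, cin):
    """Ripple carry adder on equal-length bit lists, LSB first."""
    out = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out, carry


def csla_add(a, b, width=24, group=4):
    """Carry select addition using one RCA (Cin = 0) plus a BEC per group.

    width and group are illustrative choices, not values from the paper.
    Returns (sum modulo 2**width, carry_out).
    """
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result = []
    carry = 0
    for lo in range(0, width, group):
        # Result for Cin = 0 comes from the ripple carry adder.
        s0, c0 = ripple_carry_add(a_bits[lo:lo + group], b_bits[lo:lo + group], 0)
        # Result for Cin = 1 is the Cin = 0 result plus one, produced by the BEC.
        incremented, _ = bec1(s0 + [c0])
        s1, c1 = incremented[:-1], incremented[-1]
        # The multiplexer picks one of the two results using the group carry-in.
        result += s1 if carry else s0
        carry = c1 if carry else c0
    value = sum(bit << i for i, bit in enumerate(result))
    return value, carry
```

Under these assumptions, `csla_add(5, 9)` returns the sum 14 with a carry-out of 0. The sketch checks only the logic of the selection scheme; it says nothing about the area, power or delay of a hardware implementation.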
An XOR gate is functionally equivalent to a MUX with a NOT gate on one of its inputs. The proposed CSLA design reduces area and power by replacing the gates in the BEC structure in this way, and it consumes less area and power than the modified CSLA.

Figure 4: Modified Carry Select Adder

Figure 5: (1) Logic diagram of BEC-1, (2) BEC with 8:4-MUX.

## III. SIMULATION AND SYNTHESIS RESULTS

The proposed floating-point fused add-subtract unit architecture was designed in Verilog and simulated with the Xilinx ISE Simulator. The results are shown in Figure 6 and Figure 7, in which add and subtract operations are performed on two floating-point numbers.

Figure 6: Synthesis of fused floating add-subtract unit obtained by XILINX ISE.

Figure 7: Fused Floating Add-sub unit Simulation results.

The net-list for the fused floating-point add-subtract unit, generated with the Xilinx XST tool, is shown in Figure 8.

Figure 8: Net-list.

The proposed fused floating-point add-subtract unit was implemented on a Xilinx Spartan 3E family device (XC3S250E), and the results were observed with the ChipScope Pro Analyzer, as shown in Figure 9.

Figure 9: Chip-scope Result.

## IV. CONCLUSION

The fused floating-point add-subtract unit is designed with a CSLA in which the RCA is replaced by a BEC. The unit, based on the modified CSLA, was implemented in Verilog, and the results were observed with the Xilinx ISE software tool. The simulation results were also compared with results from a Spartan 3E FPGA device, captured through the ChipScope Pro Analyzer.

## V. REFERENCES

- [1] Akkas, A.; Schulte, M. J., "A decimal floating-point fused multiply-add unit with a novel decimal leading-zero anticipator," Application-Specific Systems, Architectures and Processors (ASAP), 2011 IEEE International Conference on, pp. 43-50, 11-14 Sept. 2011.
- [2] Samy, R.; Fahmy, H. A. H.; Raafat, R.; Mohamed, A.; ElDeeb, T.; Farouk, Y., "A decimal floating-point fused-multiply-add unit," Circuits and Systems (MWSCAS), 2010 53rd IEEE International Midwest Symposium on, pp. 529-532, 1-4 Aug. 2010.
- [3] Preiss, J.; Boersma, M.; Mueller, S. M., "Advanced Clockgating Schemes for Fused-Multiply-Add-Type Floating-Point Units," Computer Arithmetic, 2009. ARITH 2009. 19th IEEE Symposium on, pp. 48-56, 8-10 June 2009.
- [4] Swartzlander, E. E.; Saleh, H. H., "FFT Implementation with Fused Floating-Point Operations," Computers, IEEE Transactions on, vol. 61, no. 2, pp. 284-288, Feb. 2012.
- [5] Takahashi, D., "A radix-16 FFT algorithm suitable for multiply-add instruction based on Goedecker method," Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on, vol. 2, pp. II-845-8, 6-9 July 2003.
- [6] Bedrij, O. J., "Carry-Select Adder," IRE Transactions on Electronic Computers, pp. 340-344, 1962.
- [7] Yeung, A. K. W.; Yu, R. K., "A self-timed multiplier with optimized final adder," Univ. California Berkeley, Final Rep., CS 2921, Fall 1989.
- [8] Wallace, C. S., "A suggestion for a fast multiplier," IEEE Trans. on Computers, vol. 13, pp. 14-17, 1964.
- [9] Sklansky, J., "Conditional-Sum Addition Logic," IRE Trans. on Electronic Computers, EC-9, pp. 226-231, 1960.
- [10] Oklobdzija, V. G., "High-Speed VLSI Arithmetic Units: Adders and Multipliers," in Design of High-Performance Microprocessor Circuits, A. Chandrakasan, Ed., IEEE Press, 2000.