# Comparative Study of Homogeneous and Heterogeneous Processor in FPGA For Functional Verification

| S.Karthik         | Dr.S.SaravanaKumar | K.Priyadarsini    |
|-------------------|--------------------|-------------------|
| Research Scholar  | Professor          | Research Scholar  |
| Department of CSE | Department of CSE  | Department of CSE |
| VELS University   | SVEC               | VELS University   |
| Chennai, INDIA    | INDIA              | Chennai, INDIA    |

Abstract- Field programmable gate arrays (FPGAs) let designers create hardware circuits quickly. Increases in FPGA configurable logic capacity and decreases in cost have enabled designers to incorporate FPGAs in their designs more readily. FPGA vendors have begun providing configurable soft processor cores that can be synthesized onto their FPGA products. While FPGAs with soft processor cores give designers increased flexibility, such processors typically have lower performance and higher energy consumption than hard-core processors. This paper presents a comparative study of homogeneous and heterogeneous processors in an FPGA with respect to the functional verification of a design, and compares their results to identify the better of the two.

Keywords: FPGA, Soft Processor, Homogeneous Processor, Heterogeneous Processor.

## Introduction

#### A. FPGA

Design size and complexity grow in line with Moore's law, and verification time scales up accordingly, to between 6 months and 1 year. FPGA hardware with an embedded processor has emerged as a way to assist simulation, which has opened a new direction in verification. Logic simulators used to predict the behavior of digital circuits have a major bottleneck: the effort required to debug and then verify a design is proportional to the complexity of the design.

#### B. Verification Trend

Functional verification is key to reducing the development and production cost of an IC. Functional verification of a design is done either by FPGA prototyping or by logic simulation. The problem with logic simulation is that simulation time is long for a large design, and running application software against the hardware design is very slow. On the other hand, FPGA prototyping is quick and low cost, but only a limited number of signals are visible to the user. This is changing with emerging FPGA prototyping tools that provide full visibility into tens of thousands of internal signals. A few recent works have demonstrated the effectiveness of using GPUs to speed up gate-level simulation, but they were held back by communication and load-balancing problems. With the advancement of FPGA architectures such as the Zynq-7000 SoC, simulation can be sped up by creating a heterogeneous or homogeneous core architecture.

#### C. Embedded Processor

Embedding a processor inside an FPGA has many advantages: peripherals can be chosen based on the application, and large banks of external memory can be connected to the FPGA and accessed by the embedded processor system through the included memory controllers. One of the most exciting developments in FPGAs in recent years is the emergence of hard and soft FPGA-embedded processors. These processors include Xilinx MicroBlaze<sup>TM</sup>, IBM PowerPC<sup>TM</sup> 440, Altera® Nios<sup>TM</sup> II, and others. In this paper, hard core refers to a system that cannot be reconfigured, in which all of the components are fixed by the manufacturer and integrated into a development board. By contrast, soft core refers to a system on a programmable chip (SOPC). In an SOPC, the processor, memories, and components are built from the available resources of a programmable logic device and can be customized to a particular set of specifications.

#### D. Advantages Of Embedded Processor

The advantages of an embedded processor are hardware acceleration, peripheral customization, and mitigation of component obsolescence.

#### E. Scope Of The Work

This work focuses on developing heterogeneous and homogeneous architectures by creating hard-core and soft-core processors on the Xilinx Zynq-7000 FPGA, verifying their functionality, and determining which of the two simulations is faster. The homogeneous architecture interfaces a soft-core processor with another soft-core processor, whereas the heterogeneous architecture interfaces a hard-core processor with a soft-core processor. We use the Xilinx Vivado High-Level Synthesis tool, which creates an RTL implementation from C-level source code.

## **Related Work**

For several years, the greater part of the verification effort in industry has revolved around logic simulators.

Early work addresses several key aspects that are still used by modern solutions, including distributed parallel simulation. This approach, termed Parallel Discrete Event-driven Simulation (PDES) [13], divides the design into two or more partitions and assigns each partition to a different processing unit, called a Logical Processor (LP). Simulation correctness is ensured by preserving event ordering among LPs, which makes synchronization among LPs important. The factors that affect the performance of such distributed parallel simulation are the interconnect between LPs, load balancing among LPs, and synchronization overhead. PDES has not been very successful, as it has failed to achieve a meaningful performance improvement for HDL simulation because of issues such as design partitioning, synchronization, and inter-module communication.

MULTES [11] offers an interesting alternative technique for parallel simulation. There are similarities and fundamental differences between MULTES and PDES [7] [11] [12] for HDL simulation. MULTES divides the simulation time into multiple time slices, each of which is simulated independently. PDES, on the other hand, uses spatial partitioning to divide the design into multiple partitions that are simulated independently. Both MULTES and PDES use a model at a higher level of abstraction for reference simulation. For example, both use RTL for parallel functional (zero-delay) gate-level simulation.
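The difference between the two partitioning strategies can be illustrated with a minimal Python sketch. It is not taken from the PDES or MULTES implementations: the function names, the round-robin gate assignment, and the near-equal time slices are assumptions chosen purely for illustration. A real spatial partitioner would also try to minimize communication between LPs.

```python
def spatial_partition(gates, num_lps):
    """PDES-style split: divide the design (a list of gate names) among LPs.

    Round-robin assignment is an illustrative assumption, not a real
    partitioning heuristic.
    """
    return [gates[i::num_lps] for i in range(num_lps)]


def temporal_partition(end_time, num_slices):
    """MULTES-style split: divide the simulation time [0, end_time) into slices.

    Slices are contiguous and differ in length by at most one time unit.
    """
    base, extra = divmod(end_time, num_slices)
    slices = []
    start = 0
    for i in range(num_slices):
        length = base + (1 if i < extra else 0)
        slices.append((start, start + length))
        start += length
    return slices


if __name__ == "__main__":
    # Hypothetical five-gate design split across two LPs.
    print(spatial_partition(["g0", "g1", "g2", "g3", "g4"], 2))
    # Hypothetical 100-time-unit run split into four slices.
    print(temporal_partition(100, 4))
```

In the spatial case every LP simulates the whole time range for its own gates, so LPs must exchange events; in the temporal case every slice covers the whole design, so each slice needs a correct starting state, which is where the higher-level reference model comes in.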

#### A. XILINX ZYNQ 7000

Xilinx Zynq-7000 [2] All Programmable SoCs are the fastest, smartest way to create smarter systems. These devices fuse a fast processor system based on two 1 GHz ARM® Cortex<sup>TM</sup>-A9 MPCore processors with the industry's fastest and most advanced 28 nm FPGA fabric, multiple high-speed serial transceivers, and an on-chip analog-processing block that incorporates two 1 Msample/s A/D converters. The Zynq-7000 family offers the flexibility and scalability of an FPGA while providing the performance, power, and ease of use typically associated with ASICs and ASSPs. The range of devices in the Zynq-7000 All Programmable SoC family allows designers to target cost-sensitive as well as high-performance applications from a single platform using industry-standard tools. While each device in the Zynq-7000 family contains the same processing system (PS), the programmable logic (PL) and I/O resources vary between devices. As a result, the Zynq-7000 All Programmable SoCs can serve a wide range of applications, including automotive driver assistance, driver information, and infotainment; broadcast cameras; industrial motor control, industrial networking, and machine vision; IP and smart cameras; LTE radio and baseband; medical diagnostics and imaging; multifunction printers; and video and night-vision equipment.

The Zynq-7000 architecture enables the implementation of custom logic in the PL and custom software in the PS, which allows unique and differentiated system functions to be realized. The integration of the PS with the PL allows levels of performance that two-chip solutions (e.g., an ASSP with an FPGA) cannot match because of their limited I/O bandwidth, latency, and power budgets. Xilinx offers a large number of soft IP cores for the Zynq-7000 family. Stand-alone and Linux device drivers are available for the peripherals in the PS and the PL. The Vivado® Design Suite development environment enables rapid product development for software, hardware, and systems engineers. Adoption of the ARM-based PS also brings a broad range of third-party tools and IP providers in combination with Xilinx's existing PL ecosystem. The inclusion of an application processor enables high-level operating system support, e.g., Linux. Other standard operating systems used with the Cortex-A9 processor are also available for the Zynq-7000 family.

The PS and the PL are on separate power domains, so the user can power down the PL for power management if required. The processors in the PS always boot first, allowing a software-centric approach to PL configuration. PL configuration is managed by software running on the CPU, so the device boots similarly to an ASSP.



Fig. 1. XILINX ZYNQ 7000 architecture

#### B. XILINX VIVADO

The Vivado Design Suite delivers a SoC-strength, IP-centric and system-centric, next-generation development environment that has been built from the ground up to address the productivity bottlenecks in system-level integration and implementation. The Vivado Design Suite is a generation ahead in overall productivity, ease of use, and system-level integration capabilities. Vivado HLS speeds up IP creation by enabling C, C++, and SystemC specifications to be targeted directly to Xilinx FPGAs without the need to manually create RTL. The main advantages of this tool in our work are accelerated verification using C/C++ test bench simulation, automatic VHDL or Verilog simulation, and test bench generation.



Fig. 2. XILINX Vivado Flow

#### C. FUNCTIONAL VERIFICATION APPROACH

In this work we explore parallel simulation of a design based on functional partitioning, running each partition or module on a separate core. This requires creating multi-core systems from soft-core and hard-core processors on the Xilinx Zynq-7000 FPGA and functionally verifying all the modules. First, we created a heterogeneous multi-core system consisting of the ARM Cortex-A9 processor in the processing system and a MicroBlaze processor in the programmable logic, using Vivado. The application program was mapped to and run on each core. The same procedure was repeated for the homogeneous system. Finally, we verified the functionality by generating the test bench and applying it to the DUT (Design Under Test). The simulation time was noted from the synthesis report.
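The flow described above (partition the design, run each partition on its own core, then check the results against a test bench) can be sketched in Python. This is a host-side illustration only, not the Vivado flow used in this work: worker threads stand in for the ARM Cortex-A9 and MicroBlaze cores, and the 16-bit width, the function names, and the test vectors are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

MASK = 0xFFFF  # assumed 16-bit result width


def adder_dut(a, b):
    """Hypothetical adder partition (stand-in for the module on one core)."""
    return (a + b) & MASK


def multiplier_dut(a, b):
    """Hypothetical multiplier partition (stand-in for the module on the other core)."""
    return (a * b) & MASK


def run_partition(dut, golden, vectors):
    """Apply every test vector to one partition and return the mismatching vectors."""
    return [(a, b) for a, b in vectors if dut(a, b) != golden(a, b)]


def verify(partitions, vectors):
    """Run each partition on its own worker and collect mismatches by partition name."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = {
            name: pool.submit(run_partition, dut, golden, vectors)
            for name, (dut, golden) in partitions.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Assumed test bench: a handful of corner cases plus one arbitrary pair.
VECTORS = [(0, 0), (1, 1), (255, 255), (0xFFFF, 1), (1234, 567)]

# Each partition is paired with a golden reference model.
PARTITIONS = {
    "adder": (adder_dut, lambda a, b: (a + b) % (MASK + 1)),
    "multiplier": (multiplier_dut, lambda a, b: (a * b) % (MASK + 1)),
}

if __name__ == "__main__":
    for name, mismatches in verify(PARTITIONS, VECTORS).items():
        print(name, "PASS" if not mismatches else f"FAIL {mismatches}")
```

An empty mismatch list for every partition corresponds to the DUT passing the test bench. Timing measured on such a host model says nothing about the on-chip simulation times reported later.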



Fig. 3a.

Fig. 3b.

Fig. 3a and 3b. Heterogeneous and Homogeneous Architecture



Fig. 4. Hardware Generated By Xilinx Vivado Tool

#### D. VERIFICATION AND VALIDATION

Case 1: A simple ALU design was taken as an example. The ALU was partitioned into modules such as a multiplier module and an adder module. These two modules were then run on the heterogeneous architecture formed by interfacing a hard core with a soft core. Each of the two modules was implemented on its own core, and the simulation time was recorded from the synthesis report.

Case 2: The ALU was partitioned into modules such as a multiplier module and an adder module. These two modules were then run on the homogeneous architecture formed by interfacing a soft core with another soft core. Each of the two modules was implemented on its own core, and the simulation time was recorded from the synthesis report. Hard-core processors can achieve much faster processing speeds since they are optimized and not limited by fabric speed.
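To make the partitioning of the ALU concrete, the sketch below models the adder and multiplier modules at the bit level in Python. The internals of the ALU are not specified here, so the 8-bit operand width, the ripple-carry adder, the shift-and-add multiplier, and all names are assumptions for illustration only.

```python
WIDTH = 8                 # assumed operand width
MASK = (1 << WIDTH) - 1   # keeps operands within WIDTH bits


def ripple_carry_add(a, b):
    """Adder module: add two WIDTH-bit values one bit at a time.

    Returns (sum, carry_out), with the sum truncated to WIDTH bits.
    """
    carry = 0
    result = 0
    for i in range(WIDTH):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i             # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))        # full-adder carry out
    return result, carry


def shift_add_multiply(a, b):
    """Multiplier module: shift-and-add product of two WIDTH-bit values.

    The product is 2 * WIDTH bits wide.
    """
    product = 0
    for i in range(WIDTH):
        if (b >> i) & 1:
            product += a << i
    return product & ((1 << (2 * WIDTH)) - 1)


def alu(op, a, b):
    """Top-level ALU that dispatches to the two partitions."""
    a &= MASK
    b &= MASK
    if op == "add":
        return ripple_carry_add(a, b)[0]
    if op == "mul":
        return shift_add_multiply(a, b)
    raise ValueError(f"unsupported operation: {op}")


if __name__ == "__main__":
    print(alu("add", 200, 100))  # wraps around at 8 bits
    print(alu("mul", 12, 11))
```

Because the two modules share no state, each can be mapped to a different core, which is the property that the heterogeneous and homogeneous cases both rely on.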

| Type                        | No. of Partitions | Simulation Time (ns) |
|-----------------------------|-------------------|----------------------|
| Heterogeneous (Hard + Soft) | 2                 | 11                   |
| Homogeneous (Soft + Soft)   | 2                 | 63                   |

Table 1: Simulation Time
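The ratio between the two simulation times in Table 1 can be checked with a short calculation. The helper name `speedup` is an assumption; the two input values are the ones listed in the table. The tabulated values give a ratio of about 5.7, close to the 5.8 reported in the conclusion; the small difference may come from rounding of the tabulated times, although that is not stated.

```python
def speedup(slow_ns, fast_ns):
    """Return how many times faster the 'fast' run is than the 'slow' run."""
    if fast_ns <= 0:
        raise ValueError("fast_ns must be positive")
    return slow_ns / fast_ns


HETEROGENEOUS_NS = 11  # Hard + Soft, from Table 1
HOMOGENEOUS_NS = 63    # Soft + Soft, from Table 1

if __name__ == "__main__":
    ratio = speedup(HOMOGENEOUS_NS, HETEROGENEOUS_NS)
    print(f"Heterogeneous is {ratio:.2f}x faster")
```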



Fig. 5. Comparison & Result

## **Conclusion**

In this experiment, we applied parallel simulation by partitioning the design and running the partitions on the CPU cores we created, and obtained simulation timing results. The simulation results show that the heterogeneous processor (hard core + soft core) is 5.8 times faster and uses 57% less power than the homogeneous processor. This is because the ARM Cortex-A9 has improved pipelining compared to the MicroBlaze, and the speed of a soft-core processor is limited by the fabric.

## References

- [1] K.H. Chang and C. Browy, "Parallel Logic Simulation: Myth or Reality?," IEEE Computer Society, April 2012.
- [2] Xilinx (http://www.xilinx.com)
- [3] Synopsys (http://www.synopsys.com)
- [4] R. Wiśniewski, A. Bukowiec, and M. Węgrzyn, "Benefits of Hardware Accelerated Simulation," DESDes'2001.
- [5] W.K. Lam, "Hardware Design Verification: Simulation and Formal Method-Based Approaches," Prentice Hall, 2005.
- [6] R.M. Fujimoto, "Parallel Discrete Event Simulation," Communications of the ACM, Vol. 33, No. 10, pp. 30-53, Oct. 1990.
- [7] D. Kim, M. Ciesielski, and S. Yang, "MULTES: Multi-Level Temporal-parallel Event-driven Simulation," IEEE Trans. on CAD of Integrated Circuits and Systems, Vol. 32, No. 6, pp. 845-857, 2013.
- [8] M.L. Bailey, J.V. Briner Jr., and R.D. Chamberlain, "Parallel Logic Simulation of VLSI Systems," ACM Comput. Surv., Vol. 26, No. 3, pp. 255-294, 1994.
- [9] J. Bergeron, "Writing Testbenches: Functional Verification of HDL Models," Springer, Berlin, 2003.
- [10] W. Chen, X. Han, C. Chang, and R. Dömer, "Advances in Parallel Discrete Event Simulation for Electronic System-Level Design," pp. 1-1, 2011.
- [11] D. Kim, "MULTES: Multi-level Temporal-parallel Event-driven Simulation," PhD thesis, University of Massachusetts Amherst, 2012.
- [12] D. Kim, M.J. Ciesielski, K. Shim, and S. Yang, "Temporal Parallel Simulation: A Fast Gate-level HDL Simulation Using Higher Level Models," in DATE, pp. 1584-1589, 2011.
- [13] D. Kim, M.J. Ciesielski, and S. Yang, "A New Distributed Event-driven Gate-level HDL Simulation by Accurate Prediction," in DATE, pp. 547-550, 2011.
- [14] W.K. Lam, "Hardware Design Verification: Simulation and Formal Method-Based Approaches," Prentice Hall, 2005.

# **Biography**



**Karthik Sekhar** was born in Chennai in 1981. He received the B.E. degree from the University of Madras, Chennai, in 2003 and the M.Tech. degree from Sathyabama University, Chennai, in 2005, and is working towards the Ph.D. degree in the area of parallel simulation algorithms. He has more than 9 years of teaching experience and has published more than 10 papers in international journals.



**Dr. S. SaravanaKumar** has more than 14 years of teaching and research experience. He completed his postgraduate M.E. in Computer Science Engineering at Bharath Engineering College, Chennai, and his Ph.D. in Computer Science and Engineering at Bharath University, Chennai. He has held positions as Lecturer, Senior Lecturer, Assistant Professor, Associate Professor, and Professor & HOD. He has published more than 150 research papers in high-impact-factor international journals and in national and international conferences, and has visited many places, including Taiwan, Bangkok, and Singapore. He is guiding a number of research scholars in the areas of ad hoc networks, ANN, security in sensor networks, mobile databases, cloud computing, and data mining at Sathyabama University, Bharath University, and Vels University, Chennai.



**K. Priyadarsini** was born in Chennai in 1984. She received her B.E. degree in Computer Science from Anna University, Chennai, in 2006 and her M.Tech. in Computer Science from VIT University, Vellore, in 2011, and is working towards the Ph.D. degree in cloud computing. She has more than 3 years of teaching experience and 3 years of industry experience, and has published more than 5 papers in international journals.