# VLSI Implementation of Timing and Control Unit (TCU) for Memory Processor ALU System

Fazal Noorbasha

VLSI Research Group, Department of Electronics and Communication Engineering, KL University, Vaddeswaram, Guntur (A.P.), India - 522 502

E-mail: fazalnoorbasha@kluniversity.in

#### ABSTRACT

This paper describes the analysis and modeling of a timing and control unit (TCU) for a memory processor ALU system, intended to improve the performance of the microprocessor ALU's arithmetic and logic operations. The system blocks and their behavior are defined, and the logic is implemented at gate level in the design phase. The logic circuits are then simulated, and all subunits are implemented on an FPGA and as a VLSI CMOS layout. The TCU and Data Stack Swap (DSS) have been integrated in 0.12µm, 90nm CMOS technology. CMOS logic design is preferred because it offers a low-leakage, high-speed model. The paper explains the functioning of the TCU, the DSS operations with the ALU, the design steps and the obtained results. The main achievement is that, depending on the selected TCU mode, the system can perform either a single operation or two operations per clock cycle.

KEYWORDS: FPGA, TCU, DSS, ALU, SOR/DOR, CMOS

## 1. INTRODUCTION

Initial designs and methods were based on the CAD tools available at the time. Circuits were specified and synthesized using speed-independent (SI) or burst-mode (BM/XBM) methodologies, as well as metric timed circuit design [1, 2, 3]. To improve overall system performance, the communication speed between chips in a system must increase accordingly. For a given data rate, multilevel signaling can be used to reduce the channel symbol rate, intersymbol interference (ISI), and crosstalk [4]. Field-programmable gate arrays (FPGAs) are used in a variety of markets that have differing cost, performance and power consumption requirements. While it would be ideal to serve all these markets with a single FPGA family, the diversity of their needs means that more than one family is generally appropriate. Consequently, FPGA vendors provide a diverse set of families that sit at different points in the area-speed-power design space.

To improve the data rate of the present system, we have developed an advanced Timing and Control Unit (TCU). This unit controls all data operations that the system performs synchronously with respect to the clock count. The TCU is modeled with counters and decoders. It controls all data operations between the I/O data, data stack swap (DSS), memory, processor ALU, MODE and FLAGs in single data rate (SDR) or double data rate (DDR) mode, as well as in single operation rate (SOR) or double operation rate (DOR) mode. This module is central to the operation of the entire system. The circuit modules are designed for FPGA using Xilinx tools, and the CMOS layout, gate-level and switch-level circuits are implemented with the DSCH3 CAD tools. The simulation results are verified in ModelSim. After fabrication of the entire chip layout, the chip is exposed to an ultraviolet light source that emits photons with 4.9eV energy (250nm wavelength). The photons are absorbed by the electrons of the floating gate, which are excited and finally attracted by the control gate or substrate. Typical erasure time is 20ns. The supply voltage is 1.20V, the I/O supply voltage is 2.50V and the operating temperature is 27.00 $^{\circ}$C.
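As a quick consistency check on the quoted photon energy, the standard relation between photon energy and wavelength can be applied, taking $hc \approx 1239.84$ eV·nm (a physical constant, not a value from this design):

$$E = \frac{hc}{\lambda} \approx \frac{1239.84\ \text{eV}\cdot\text{nm}}{250\ \text{nm}} \approx 4.96\ \text{eV}$$

This agrees with the 4.9eV figure stated for the 250nm source to within rounding.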

## 2. ANALYSIS AND SYNTHESIS OF TCU

#### A. Gate-Level Logic Circuit Design

As conventional methods of power reduction reach their limits, nonconventional methods such as adiabatic logic promise greater power reduction. Adiabatic logic, also known as energy recovery logic, works by restricting current flow to devices with a low voltage drop and by recycling the energy stored on their capacitors [5, 6]. We have designed gate-level logic for the TCU, using mainly logic gates, multiplexers and flip-flop circuits. The design uses nearly 105 circuit symbols and 330 routing lines. It ran at clock speeds up to 200MHz at room temperature. SDR mode requires $I_{max}$ of 25mA, $I_{Avg}$ of 0.023mA and $P_{ow}$ of 0.057mW at a supply voltage of 2.5V. DDR mode requires $I_{max}$ of 17mA, $I_{Avg}$ of 0.025mA and $P_{ow}$ of 0.063mW at a supply voltage of 2.5V. The length / width of the NMOS transistors is 0.12 $\mu$m / 1.0 $\mu$m and of the PMOS transistors is 0.12 $\mu$m / 2.0 $\mu$m. The gate delay is 0.100ns and the wire delay is 0.200ns. Fig. 1 shows the gate-level logic circuit of the timing and control (T&C) unit of the processor ALU.
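The reported power figures appear to follow from the product of supply voltage and average current. The short Python check below illustrates this under the assumption that $P_{ow}$ denotes average power, $P = V_{dd} \cdot I_{Avg}$; the function name `average_power_mw` is introduced here only for illustration.

```python
def average_power_mw(supply_v: float, avg_current_ma: float) -> float:
    """Average power in mW, assuming P = Vdd * I_avg (volts * milliamps = milliwatts)."""
    return supply_v * avg_current_ma


# Values quoted for the gate-level TCU at a 2.5 V supply.
sdr_power = average_power_mw(2.5, 0.023)  # SDR mode
ddr_power = average_power_mw(2.5, 0.025)  # DDR mode

print(f"SDR: {sdr_power:.4f} mW, DDR: {ddr_power:.4f} mW")
```

The products, 0.0575mW and 0.0625mW, are consistent with the quoted 0.057mW and 0.063mW to within rounding.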



Fig. 1 Gate-Level Modeling of Timing and Control Unit of Processor ALU

#### B. FPGA Logic Implementation

Field Programmable Gate Arrays (FPGAs) are becoming a critical part of every system design [7]. For the TCU design we have used the Xilinx Spartan-3 family, and the entire hardware is described in VHDL. Fig. 2 and Fig. 3 (a), (b), (c) show the RTL (FPGA) schematic views of the TCU. For both the SOR and DOR designs, the FPGA device utilization is 18 slices, 32 slice flip-flops, 17 four-input LUTs, 22 bonded IOBs and 1 global clock (GCLK). For the SOR FPGA design, the average connection delay is 0.905ns, the maximum pin delay is 2.306ns and the average connection delay on the 10 worst nets is 1.331ns. For the DOR FPGA design, the average connection delay is 0.829ns, the maximum pin delay is 2.017ns and the average connection delay on the 10 worst nets is 1.117ns. Of the pin delays, 94 pins are below 1.00ns, 37 are below 2.00ns and 1 is below 3.00ns.



Fig. 2 RTL (FPGA) view of Timing and Control of Processor ALU in SOR Mode



Fig. 3 RTL (FPGA) view of Timing and Control of Processor ALU in DOR Mode

#### C. CMOS Logic Circuit Layout Design

Layout is one of the main steps before IC fabrication; for this we used 0.12µm CMOS technology, which we preferred for its low-leakage, high-speed model. The TCU (timing and control unit) CMOS circuits are fabricated in 90nm technology. The width of the layout is 60.1µm, the height is 8.6µm and the total surface area is $15.8\mu\text{m}^2$. The TCU layout has 84 PMOS and NMOS transistors and 91 electrical nodes. The length (L) / width (W) of the NMOS transistors is 0.120µm / 0.240µm and of the PMOS transistors is 0.120µm / 0.720µm. The rise delay is 0.002ns and the fall delay is 0.001ns. The observed parasitic node properties are a capacitance of 0.62fF, a resistance of 176 ohm and an inductance of 0.001nH. We have also carried out a parametric analysis of voltage and current with respect to time (ns): I<sub>ddMax</sub> is 1.892mA and I<sub>ddAvg</sub> is 0.090mA.

## 3. OPERATION AND SIMULATION RESULTS OF TCU

The TCU is the main block of this 8-bit ALU VLSI SoC system. It contains the clock (200MHz), enable, ORM (Operation Rate Mode), reset, counter, stack and swap signals, and it controls nineteen 8-bit ALU operations: buffer, addition, subtraction, multiplication, increment by one, decrement by one, addition with C-in, subtraction with C-in, 1s complement, 2s complement, AND, NAND, OR, NOR, Ex-OR, Ex-NOR, rotate right by one bit, shift right by one bit with the 8<sup>th</sup> bit (MSB) replaced by the C-in bit, and comparator.
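To make the nineteen operations concrete, the following is a minimal behavioral sketch in Python, not the VHDL or gate-level design described in this paper. The names `alu`, `OPS` and the operation keys are hypothetical. It also assumes that results are truncated to 8 bits with no carry-out or flags, that subtraction with C-in computes `x - y - cin`, that unary operations act on the first operand, and that the comparator returns 1 when the operands are equal; the paper does not specify these details.

```python
MASK = 0xFF  # 8-bit data path


def rotate_right(x: int) -> int:
    """Rotate right by one bit: the LSB wraps around to the MSB."""
    return ((x >> 1) | ((x & 1) << 7)) & MASK


def shift_right_cin(x: int, cin: int) -> int:
    """Shift right by one bit; the 8th bit (MSB) is replaced by C-in."""
    return ((x >> 1) | ((cin & 1) << 7)) & MASK


# One entry per operation named in the text (19 in total).
OPS = {
    "buffer": lambda x, y, c: x,
    "add": lambda x, y, c: x + y,
    "sub": lambda x, y, c: x - y,
    "mul": lambda x, y, c: x * y,
    "inc": lambda x, y, c: x + 1,
    "dec": lambda x, y, c: x - 1,
    "add_cin": lambda x, y, c: x + y + c,
    "sub_cin": lambda x, y, c: x - y - c,   # assumed borrow semantics
    "ones_complement": lambda x, y, c: ~x,
    "twos_complement": lambda x, y, c: -x,
    "and": lambda x, y, c: x & y,
    "nand": lambda x, y, c: ~(x & y),
    "or": lambda x, y, c: x | y,
    "nor": lambda x, y, c: ~(x | y),
    "xor": lambda x, y, c: x ^ y,
    "xnor": lambda x, y, c: ~(x ^ y),
    "rotate_right": lambda x, y, c: rotate_right(x),
    "shift_right_cin": lambda x, y, c: shift_right_cin(x, c),
    "compare": lambda x, y, c: int(x == y),  # assumed equality comparator
}


def alu(op: str, x: int, y: int = 0, cin: int = 0) -> int:
    """Apply one operation and truncate the result to 8 bits."""
    return OPS[op](x & MASK, y & MASK, cin & 1) & MASK


# Example operands (hypothetical choice): X = 0000 1001, Y = 0101 0101.
print(format(alu("add", 0b00001001, 0b01010101), "08b"))      # 01011110
print(format(alu("rotate_right", 0b00001001), "08b"))         # 10000100
```

A table of small functions keeps the sketch close to the operation list in the text; a hardware description would instead select among these results with a multiplexer driven by the TCU's decoded counter value.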



Fig. 4 State Machine Diagram of DSS-ALU

The 8-bit data for the ALU operations are loaded through the DSS circuit. The system can operate in two modes: single operation rate (SOR) mode and double operation rate (DOR) mode, each defined per clock cycle. If ORM is set to logic '0' the system is in single operation rate mode, and if ORM is set to logic '1' it is in double operation rate mode. Before enabling the system we have to initialize it: first reset the system, then enable it, and then set ORM to logic '0' or logic '1' according to the required operation rate.

The Stack contains four 8-bit data store registers, 'a', 'b', 'c' and 'd', built from D-type flip-flops, and acts as a small data store memory. If stack is set to logic '0', the 8-bit data 'a' and 'b' are output from the stack memory; if stack is set to logic '1', 'c' and 'd' are output. The stack changes state every 32 clock cycles in single data rate (SDR) mode, when ORM is logic '0', and every 16 clock cycles in double data rate (DDR) mode, when ORM is logic '1'. The output data from the stack is fed to the Swap, which has two 8-bit data store registers, also built from D-type flip-flops. If swap is set to logic '0', the 8-bit data 'a' or 'c' is loaded into the ALU input register 'X' and 'b' or 'd' is loaded into the ALU input register 'Y'. If swap is set to logic '1', 'b' or 'd' is loaded into 'X' and 'a' or 'c' is loaded into 'Y'. The swap changes state every 16 clock cycles in SDR mode, when ORM is logic '0', and every 8 clock cycles in DDR mode, when ORM is logic '1'.

After this process is complete, the ALU performs the nineteen arithmetic and logic operations [8]. The ALU results are stored in 8x16 SRAM memories and are decoded using the MAR operation control register. The state machine diagram of the DSS-ALU is shown in Fig. 4, and the operation and simulation results are shown in Fig. 5. Here we have taken the stack data as a=0000 1001, b=0101 0101, c=0000 0110 and d=1010 1010.
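The stack and swap selection described above can be sketched as a small behavioral model in Python. This is an illustration only, not the paper's VHDL: the function name `dss_select` is hypothetical, and the model assumes that the clock cycle count starts at 0 after reset and that both stack and swap start at logic '0'.

```python
def dss_select(cycle: int, orm: int, regs: dict) -> tuple:
    """Return the (X, Y) ALU inputs selected by the DSS at a given clock cycle.

    cycle: clock cycles since reset (assumed to start at 0)
    orm:   0 for SDR/SOR mode, 1 for DDR/DOR mode
    regs:  the four 8-bit stack registers 'a', 'b', 'c', 'd'
    """
    # Toggle periods stated in the text: 32/16 cycles (stack), 16/8 cycles (swap).
    stack_period = 16 if orm else 32
    swap_period = 8 if orm else 16

    stack = (cycle // stack_period) % 2
    swap = (cycle // swap_period) % 2

    # stack = 0 outputs (a, b); stack = 1 outputs (c, d).
    first, second = (regs["c"], regs["d"]) if stack else (regs["a"], regs["b"])

    # swap = 0 loads X from the first value; swap = 1 exchanges X and Y.
    return (second, first) if swap else (first, second)


# Stack data used in the simulation.
regs = {"a": 0b00001001, "b": 0b01010101, "c": 0b00000110, "d": 0b10101010}

for cycle in (0, 16, 32, 48):
    x, y = dss_select(cycle, orm=0, regs=regs)
    print(f"cycle {cycle:2d}: X={x:08b} Y={y:08b}")
```

Under these assumptions, one full pass through the four (X, Y) combinations takes 64 clock cycles with ORM at logic '0' and 32 clock cycles with ORM at logic '1'.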



Fig. 5 Timing Diagram of Stack-Swap circuit of Memory-Processor ALU in Dual-Mode

As described above, the system is first initialized, and ORM is then set according to the required data or operation rate. If ORM is set to logic '0', the ALU performs SDR/SOR with respect to every single count; if ORM is set to logic '1', the ALU performs DDR/DOR with respect to every single count [9]. The TCU counter timing diagram of the processor ALU in SDR/SOR mode is shown in Fig. 6, and the TCU counter timing diagram in DDR/DOR mode is shown in Fig. 7.



Fig. 6 TCU Timing Diagram of Processor ALU in SDR/SOR



Fig. 7 TCU Counters Timing Diagram of Processor ALU in DDR/DOR

## 4. CONCLUSION

An advanced Timing and Control Unit (TCU) for a memory processor ALU system has been designed as gate-level logic, on an FPGA and as a CMOS integrated circuit. The TCU and DSS have been fabricated using 0.12µm CMOS technology. We also obtained a successful analysis of the CMOS layout parasitic parameters, time delay and temperature. This system can control the processor ALU operations in SDR/SOR as well as DDR/DOR modes. By interfacing the TCU and DSS VLSI circuit with a microprocessor ALU, the operation mode can be changed automatically while using the same 8-bit data, which speeds up the output results.

#### REFERENCES

- [1] Kenneth S. Stevens, Ran Ginosar and Shai Rotem, "Relative Timing", IEEE Transactions On Very Large Scale Integration (VLSI) Systems, Vol. 11, No. 1, February-2003, pp. 129-140.
- [2] Rajeev Krishna, Scott Mahlke, and Todd Austin, "Memory System Design Space Exploration for Low-Power, Real-Time Speech Recognition", *CODES+ISSS'04*, Sept. 8–10, 2004, Stockholm, Sweden. Copyright 2004 ACM 1-58113-937-3/04/0009.
- [3] Talukder, H. M. Gholap, A. V. and Kanyemba, S., "Design And Construction Of A Water Heater Controller", African Journal of Science and Technology (AJST) Science and Engineering Series Vol. 5, No. 1, pp. 1 – 5.
- [4] Kamran Farzan and David A. Johns, "A CMOS 10-Gb/s Power-Efficient 4-PAM Transmitter", IEEE Journal Of Solid-State Circuits, Vol. 39, No. 3, March-2004, pp. 529-532.
- [5] G. Josemin Bala, J. Raja Poul Perinba, "Adiabatic Memories-A Review", Electronic Technology Internet Journal, 37/38 (2005/2006), 2, pp. 1-4.
- [6] Anissa Djellid-Ouar, Guy Cathebras and Frédéric Bancel, "Supply voltage glitches effects on CMOS circuits", 0-7803-9727-4/06 © 2006 IEEE.
- [7] Fazal Noorbasha, Ashish Verma, A. M. Mahajan, "Study the Analysis of Low power and High speed CMOS Logic Circuits in 90nm Technology", e-Journal of Science & Technology, Jan-2010, Vol. 5, Issue-1, pp. 43-50.

- [8] Fazal Noorbasha, Ashish Verma, "A VLSI SoC Layout Process for Static Combinational Logic Circuits in 90nm Technology", Journal of Madhya Bharathi, Vol. 55, (2009), pp. 87-94.
- [9] Fazal Noorbasha, Ashish Verma, Alka Dubey, "Arithmetic and Logic Operation Control (ALOC) System-On-Chip (SoC) Device for Microprocessor", International Journal of Research Hunt, Nov.-Dec. 2008, Vol. III, Issue-VI, pp. 1-6.



**Dr. Fazal Noorbasha** was born on 29<sup>th</sup> April 1982. He received his B.Sc. degree in Electronics Sciences from BCAS College, Bapatla, Guntur, A.P., affiliated to the Acharya Nagarjuna University, Guntur, Andhra Pradesh, India, in 2003; his M.Sc. degree in Electronics Sciences from the Dr. HariSingh Gour University, Sagar, Madhya Pradesh, India, in 2006; his M.Tech. degree in VLSI Technology from the North Maharashtra University, Jalgaon, Maharashtra, India, in 2008; and his Ph.D. degree in VLSI from the Department of Physics and Electronics, Dr. HariSingh Gour Central University, Sagar, Madhya Pradesh, India, in 2011. He is presently working as an Assistant Professor in the Department of Electronics and Communication Engineering, KL University, Guntur, Andhra Pradesh, India, where he is engaged in teaching, research and development of low-power, high-speed CMOS VLSI SoC, memory processor LSIs, fault testing in VLSI, embedded systems and nanotechnology.

He is a Scientific and Technical Committee & Editorial Review Board Member in Engineering and Applied Sciences of World Academy of Science Engineering and Technology (WASET), Advisory Board Member of International Journal of Advances Engineering & Technology (IJAET), Life Member of Indian Society for Technical Education (ISTE), Member of International Association of Engineers (IAENG) and Senior Member of International Association of Computer Science and Information Technology (IACSIT). He has published over 25 Science and Technical papers in various International and National reputed journals and conferences.