Metku Prashanthi (1)
Kim Kyung Ki (2)
Kim Yong-Bin (3)
Choi Minsu (4)
(1) GlobalFoundries, Essex Junction, VT, USA
(2) Department of Electronic Engineering, Daegu University, Gyeongsan, Korea
(3) Department of Electrical and Computer Engineering, Northeastern University, Boston,
MA, USA
(4) Department of Electrical & Computer Engineering, Missouri University of Science &
Technology, Rolla, MO, USA
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index Terms
Null-convention logic, gate diffusion input, clockless design, transistor count reduction, simulation
I. INTRODUCTION
Clocking has become a very complex task due to technology scaling. The increasing clock
rate that accompanies decreasing transistor size leads to a major problem of clock skew.
In fact, designing clock nets consumes a large portion of the design time (1). To achieve
a tolerable skew, a large part of the chip area is allotted to clock drivers (2). This leads
to high power dissipation, most prominently at clock edges, where switching occurs. As the
trend toward higher clock frequencies and smaller feature sizes continues, the power
dissipation and noise of synchronous circuits increase significantly (3). This increasing
power dissipation is a major concern for the emerging low-power industry and has encouraged
renewed interest in asynchronous digital design.
In comparison to synchronous circuits, delay-insensitive (DI) asynchronous paradigms
offer lower power, noise, and electromagnetic interference (4). Asynchronous circuits are
classified into two types: bounded-delay and delay-insensitive models. Bounded-delay models
consider both gate and wire delays to be bounded. One example of this type of model is
micropipelines (5). Here, delays are added based on worst-case scenarios, and extensive
timing analysis of worst-case behavior is required to ensure the correctness of the circuits.
In contrast, delay-insensitive models assume that both gate and wire delays are
unbounded. Here, wire forks are considered to be isochronic, that is, the component
wire delays are much less than the logic element delays (6). This assumption remains valid
even for future nanotechnologies. However, wires connecting the components do not abide by
the isochronic assumption.
One of the most widely used techniques for delay-insensitive asynchronous logic design is
Null Convention Logic (NCL) (7). NCL utilizes dual-rail or quad-rail encoding to represent
logic 1, logic 0, null, and invalid signals. For clock-free operation, NCL uses local
handshaking performed by the completion detection register (8). Usually, NCL circuits are
realized in CMOS technology, which has the potential for high speed but has high power
dissipation and occupies a large area. To reduce the area, semi-static implementations of
NCL circuits have also been proposed (4). However, the semi-static implementation has the
limitation of a weak feedback loop.
To overcome the above limitations, a novel approach leveraging the Gate Diffusion Input
(GDI) method is proposed (9). GDI is a low-power design technique that was first introduced
for synchronous circuits (10). A wide range of complex logic functions can be implemented
with only two transistors using the GDI approach. This approach is suitable for designing
low-power circuits with a reduced transistor count. The proposed approach is verified by
the design and simulation of multiple prototype arithmetic logic circuits in this work.
This paper is organized as follows. Section II presents the preliminaries and a review
of NCL and GDI. The proposed design is discussed in detail in Section III. Design and
performance evaluation data, including area, power, and latency, are given in Section IV.
Finally, the summary and concluding remarks are made in Section V.
II. PRELIMINARIES AND REVIEW
In current nanometer technologies, where ultra-low-power design is a goal, synchronous
circuit designs are limited by their high power dissipation. Asynchronous circuits such as
Null Convention Logic are a promising alternative. NCL gates, also known as threshold gates,
are designed with a hysteresis loop to maintain delay insensitivity (11). Several CMOS
implementations of NCL gates have been proposed, and each design has its own limitations.
One of the most common limitations of a CMOS implementation is area consumption. To overcome
this and to reduce power dissipation, the low-power GDI design technique is applied to some
NCL gates. This section gives a brief overview of NCL design and the GDI approach.
1. Null Convention Logic
NCL is a popular delay-insensitive methodology used for designing asynchronous circuits.
NCL circuits perform correctly regardless of when the inputs become available, resulting
in a clockless, DI circuit design (3). It is a self-timed logic paradigm where both data
and control are integrated into a single signal. To achieve delay insensitivity, NCL
circuits utilize dual-rail or quad-rail logic (12). Dual-rail logic consists of two wires,
D0 and D1, whose values can be any one from the set {DATA0, DATA1, NULL}. The DATA0 state
(D0 = 1, D1 = 0) represents Boolean logic 0, the DATA1 state (D0 = 0, D1 = 1) represents
Boolean logic 1, and the NULL (empty) state (D0 = 0, D1 = 0) means that no DATA is available
at the input. D0 = 1 and D1 = 1 corresponds to the invalid state (2). The two rails are
mutually exclusive, so that both rails can never be asserted simultaneously. Similarly,
quad-rail logic has four wires, Q0, Q1, Q2, and Q3, each representing a different state from
the set {DATA0, DATA1, DATA2, DATA3, NULL}. These rails are also mutually exclusive. To
achieve delay-insensitive behavior, NCL circuits should possess two main characteristics:
symbolic completeness and input completeness (2).
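The dual-rail encoding described above can be sketched in a few lines of Python. This is only an illustration of the state assignment given in the text; the names `DualRail`, `encode`, and `decode` are hypothetical and are not part of the original work.

```python
from enum import Enum


class DualRail(Enum):
    """Dual-rail states as (D0, D1) wire pairs, following the text."""
    NULL = (0, 0)   # no DATA available
    DATA0 = (1, 0)  # Boolean logic 0
    DATA1 = (0, 1)  # Boolean logic 1


def encode(bit):
    """Map a Boolean value (or None for 'no data') to a dual-rail state."""
    if bit is None:
        return DualRail.NULL
    return DualRail.DATA1 if bit else DualRail.DATA0


def decode(d0, d1):
    """Return 0, 1, or None (NULL); reject the invalid state D0 = D1 = 1."""
    if d0 and d1:
        raise ValueError("invalid dual-rail state: both rails asserted")
    if not d0 and not d1:
        return None
    return 1 if d1 else 0
```

Rejecting the (1, 1) pair in `decode` mirrors the mutual-exclusivity requirement on the two rails.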
NCL circuits are implemented using threshold gates. The basic NCL gate is THmn, where
1 ≤ m ≤ n (13). Here, n and m represent the total number of inputs and the number of inputs
that must be asserted, respectively. At least m of the n inputs must be asserted before the
output is asserted (12). The second type of NCL gate is the weighted threshold gate.
These gates are denoted THmnWw1w2...wR, where w1, w2, ..., wR, each > 1, are the integer
weights of input 1, input 2, ..., input R, respectively. Here, m ≥ wR > 1, and 1 ≤ R < n.
There are 27 fundamental NCL gates, covering functions of two to four variables. To allow
the design of DI circuits, NCL gates have a built-in hysteresis state-holding capability:
after the output is asserted, all the inputs must be de-asserted before the output is
de-asserted. This hysteresis ensures the gate is input complete, meaning that the output
remains constant until all the inputs are de-asserted (2).
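The threshold and hysteresis behavior described above can be captured in a small behavioral model. The sketch below is a logic-level illustration only, not a transistor-level description; the class name `ThresholdGate` and its interface are assumptions introduced here.

```python
class ThresholdGate:
    """Behavioral model of an NCL THmn gate with hysteresis.

    `weights` lists the weights of the first R inputs (as in THmnWw1...wR);
    the remaining inputs have weight 1.
    """

    def __init__(self, m, n, weights=()):
        self.m = m
        self.weights = list(weights) + [1] * (n - len(weights))
        self.out = 0  # gates start in the NULL (de-asserted) state

    def update(self, *inputs):
        total = sum(w * x for w, x in zip(self.weights, inputs))
        if total >= self.m:
            self.out = 1          # set: threshold reached
        elif not any(inputs):
            self.out = 0          # reset: all inputs de-asserted
        # otherwise hold the previous output (hysteresis)
        return self.out


th22 = ThresholdGate(2, 2)
sequence = [(1, 0), (1, 1), (1, 0), (0, 0)]
print([th22.update(*pair) for pair in sequence])
```

The third step of the sequence shows the hold behavior: one input has been de-asserted, but the output stays asserted until both inputs return to 0.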
2. Gate Diffusion Input (GDI) Approach
To allow simple implementation of GDI gates (all functions) in standard CMOS processes,
a modified GDI model was introduced in (14). Fig. 1 illustrates the modified GDI basic cell
(14). Table 1 shows the input configurations of the simple GDI cell corresponding to
different Boolean functions. Like the conventional GDI cell, it has three inputs: G (the
common gate input of both the nMOS and the pMOS), P (the input to the source/drain of the
pMOS), and N (the input to the source/drain of the nMOS). The bulks of the nMOS and pMOS
transistors in the modified cell are permanently connected to GND and VDD, respectively.
This adaptation enables simple implementation of the GDI gates (all functions) in standard
CMOS processes (10). The influence of the bulk effect on circuit performance is very similar
to that of the originally proposed GDI cell. With technology scaling, the impact of the
source-to-bulk voltage on the transistor threshold voltage is greatly reduced, making this
limitation less relevant in processes below 65 nm. The transistor threshold voltage depends
on the source-to-bulk voltage $V_{SB}$ (14) and on the following parameters: $V_{th0}$, the
threshold voltage when $V_{SB} = 0$; $\phi_F$, the Fermi potential; $\gamma$, the linearized
body coefficient; and $\eta$, the drain-induced barrier lowering (DIBL) coefficient.
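The dependence of the threshold voltage on $V_{SB}$ is commonly written in the textbook form below. This is a sketch built from the symbols defined above and is not necessarily the exact expression used in (14); the drain-to-source voltage $V_{DS}$ in the DIBL term is an assumption added here.

```latex
V_{th} = V_{th0}
       + \gamma \left( \sqrt{\left|2\phi_F\right| + V_{SB}} - \sqrt{\left|2\phi_F\right|} \right)
       - \eta V_{DS}
```

In this form, $V_{SB} = 0$ and $V_{DS} = 0$ give $V_{th} = V_{th0}$, which agrees with the definition of $V_{th0}$ in the text.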
A variety of functions, as seen in Table 1, can be implemented using the modified GDI cell
(15). GDI gates are more versatile and compact than static CMOS gates. For example, a
multiplexer (MUX) designed using modified GDI requires only two transistors, whereas a CMOS
design requires 12 transistors. The GDI approach is most effective for the AND, OR, F1, and
F2 functions. F1 and F2 are the two basic functions in GDI, and each of them forms a
universal set (15). Therefore, in general, every digital circuit can be implemented using
only F1 gates, only F2 gates, or a combination of both. Simple modification of the input
signals of F1 and F2 gates provides different functions, allowing other functions to be
synthesized more efficiently (14). Although modified GDI (MGDI) cells reduce the transistor
count, they suffer a voltage drop at their outputs, causing performance degradation.
Fig. 1. Modified GDI basic cell (14).
Table 1. Boolean function synthesis through input configuration of a simple GDI cell (15)
N | P | G | Out | Function
0 | B | A | $\overline{A}B$ | F1
B | 1 | A | $\overline{A} + B$ | F2
1 | B | A | $A + B$ | OR
B | 0 | A | $AB$ | AND
C | B | A | $\overline{A}B + AC$ | MUX
0 | 1 | A | $\overline{A}$ | NOT
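The input configurations in Table 1 can be checked with an ideal logic-level model of the GDI cell, in which the pMOS passes P when G = 0 and the nMOS passes N when G = 1. The sketch below ignores the threshold voltage drops discussed in the text, and the function names are hypothetical.

```python
def gdi_cell(g, p, n):
    """Ideal logic-level GDI cell: output follows P when G = 0, N when G = 1."""
    return n if g else p


# Input configurations from Table 1 (N, P, G columns)
def f1(a, b):
    return gdi_cell(g=a, p=b, n=0)      # (NOT A) AND B


def f2(a, b):
    return gdi_cell(g=a, p=1, n=b)      # (NOT A) OR B


def gdi_or(a, b):
    return gdi_cell(g=a, p=b, n=1)      # A OR B


def gdi_and(a, b):
    return gdi_cell(g=a, p=0, n=b)      # A AND B


def gdi_mux(a, b, c):
    return gdi_cell(g=a, p=b, n=c)      # B when A = 0, C when A = 1


def gdi_not(a):
    return gdi_cell(g=a, p=1, n=0)      # NOT A
```

Each function is a single cell, that is, two transistors, which is the source of the compactness claimed for GDI.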
III. GATE DIFFUSION INPUT (GDI) BASED NCL CIRCUITS
With decreasing feature size, designs are required that reduce not only area but also
power dissipation. Several CMOS implementation schemes have been introduced for NCL gates,
including dynamic, static, semi-static (4), and differential. The static and semi-static
implementations of C-elements have been extensively discussed in (6). The main drawback of
the CMOS NCL design is that it occupies a large area and therefore dissipates more power.
To address this limitation, modules of the NCL design are implemented using the GDI
technique. The GDI technique is a low-power design approach in which a wide range of complex
functions can be implemented using only two transistors. Hence, the GDI approach reduces not
only the power dissipation but also the transistor count. The GDI implementation of NCL
gates is proposed and discussed in this section.
1. Static CMOS Implementation of NCL Gates
Generally, CMOS-based designs consist of one pull-up and one pull-down network that
implement the set and reset functions, which are complements of each other (16). However,
NCL threshold gates are also designed with a hysteresis state-holding capability to ensure
delay insensitivity (12). As a result, additional pull-up and pull-down networks, known as
Hold0 and Hold1, are required to maintain this hysteresis, so that the output does not
change until all inputs are de-asserted (2). An NCL gate therefore comprises both a set
equation and a hold equation. The set equation determines the gate functionality and when
the gate becomes asserted; the hold equation determines how long the gate remains asserted
and is simply the OR of all the gate inputs (12).
2. GDI Implementation of NCL Gates
To overcome the above limitations, a low-power design technique, modified Gate Diffusion
Input (GDI), can be used to design NCL circuits. The basic GDI cell is shown in Fig. 1,
where inputs can also be applied to the sources of both the nMOS and pMOS transistors,
allowing a wide range of circuits to be designed using only two transistors (15). However,
with GDI a full output voltage swing cannot be obtained for all input combinations, leading
to a significant voltage drop at the final output (since a pMOS transistor is a strong
pull-up device and an nMOS transistor is a strong pull-down device). This limitation can be
addressed by using regenerative buffers (15). Thus, implementing NCL circuits using GDI
technology reduces not only the transistor count but also the power dissipation (15).
Fig. 2. Basic GDI Implementation of TH22 gate.
1) Designing the GDI NCL gates: Table 1 shows the input configurations corresponding to the
respective Boolean functions (15). These configurations are used for designing GDI-based
NCL gates. An NCL gate comprises both a set equation and a hold equation: the set equation
determines the gate functionality and when the gate becomes asserted, and the hold equation,
which is simply the OR of all the gate inputs, determines how long the gate remains asserted.
The complete Boolean equation for a THmn gate is broken down into a series of AND, OR, and
MUX functions, and the GDI AND, OR, and MUX configurations are then used to design the NCL
TH gates. The basic GDI implementation of the NCL TH22 gate is depicted in Fig. 2. The
Boolean expression of the TH22 gate is $AB + Z(A + B)$, where Z is the previous output of
the gate. Accordingly, the GDI AND and OR configurations are used to realize AB and (A + B),
respectively. Finally, the GDI MUX configuration selects the set or hold state based on the
previous result (i.e., based on Z). In comparison to the CMOS implementation, the GDI-based
TH22 gate requires only 6 transistors, reducing the transistor count by 50%. However, the
voltage drop at the output affects the performance of the GDI NCL gates.
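The decomposition of the TH22 gate into GDI AND, OR, and MUX cells can be illustrated at the logic level. The sketch below uses an ideal GDI cell model (no threshold drops) and hypothetical function names; it shows that selecting AB when Z = 0 and A + B when Z = 1 reproduces $AB + Z(A + B)$.

```python
def gdi_cell(g, p, n):
    """Ideal logic-level GDI cell: output follows P when G = 0, N when G = 1."""
    return n if g else p


def th22_gdi(a, b, z_prev):
    """TH22 built from three GDI cells (2 transistors each, 6 in total)."""
    ab = gdi_cell(g=a, p=0, n=b)        # GDI AND: set condition AB
    a_or_b = gdi_cell(g=a, p=b, n=1)    # GDI OR: hold condition A + B
    # GDI MUX controlled by the previous output Z:
    # Z = 0 selects AB (set), Z = 1 selects A + B (hold)
    return gdi_cell(g=z_prev, p=ab, n=a_or_b)


z = 0
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    z = th22_gdi(a, b, z)
    print(f"A={a} B={b} -> Z={z}")
```

Feeding the output back as `z_prev` on the next call models the hysteresis loop of the gate.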
2) Analysis of voltage swing for the GDI NCL gates: The major drawback of the above
method is that a full output voltage swing cannot be obtained for all input combinations,
leading to a significant voltage drop at the output. This limitation arises from the
structure of the inputs applied to the GDI cell. Because the pMOS and nMOS transistors are
a strong pull-up device and a strong pull-down device, respectively, applying any voltage
other than VDD to the pMOS source or other than GND to the nMOS source leads to a degraded
level at the output (drain): $V_{tp}$ for the pMOS and (VDD − $V_{tn}$) for the nMOS. Here,
$V_{tp}$ and $V_{tn}$ represent the threshold voltages of the pMOS and nMOS transistors.
This limitation can be explained for the GDI NCL TH22 gate above by theoretically examining
the output voltages for all input combinations, assuming that all pMOS transistors have the
same properties and all nMOS transistors have the same properties (i.e., the same widths and
lengths). The final output voltage for the different input combinations is as follows. When
A = 0 and B = 0, the voltage at node N1 is $V_{tp}$ and the voltage at node N2 is $V_{tp}$.
Assuming the previous state was zero, the present output is greater than $V_{tp}$, a
significant voltage drop, as shown in Fig. 3. When A = 0 and B = 1, the voltage at node N1
is $V_{tp}$ and the voltage at node N2 is VDD. Assuming the previous state was NULL, the
present output is greater than $V_{tp}$, which causes significant performance degradation.
When A = 1 and B = 0, the voltage at node N1 is zero and the voltage at node N2 is
(VDD − $V_{tn}$). Assuming the previous state was 0, the present output is $V_{tp}$, again
a degraded output level. When A = 1 and B = 1, the voltage at node N1 is (VDD − $V_{tn}$)
and the voltage at node N2 is (VDD − $V_{tn}$). Assuming the previous state was 0, the
present output is (VDD − $V_{tn}$), again a voltage drop at the output. Because NCL gates
follow a hysteresis loop (where the present output serves as feedback for the next result),
these voltage drops also affect subsequent results, causing performance degradation. It is
therefore essential to address this limitation. To overcome this performance degradation
and obtain a full-swing output voltage, a regenerative buffer is placed at the output of
every GDI-based NCL THmn gate. Compared with the method above, adding the regenerative
buffer increases the transistor count but solves the problem of performance degradation.
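The node voltages in the four cases above follow from a simple first-order rule: a pMOS cannot pull its output below $V_{tp}$, and an nMOS cannot pull its output above VDD − $V_{tn}$. The sketch below applies that rule to the AND and OR cells of the TH22 gate. It is an illustration only, under several assumptions: the threshold values are hypothetical, the gate input is treated as an ideal logic level, and N1 and N2 are taken to be the outputs of the AND and OR cells, which is how the case analysis reads.

```python
VDD = 1.0   # supply voltage (the value used in the paper's simulations)
VTN = 0.3   # hypothetical nMOS threshold voltage (assumption)
VTP = 0.3   # hypothetical pMOS threshold magnitude (assumption)


def gdi_level(g, p_volts, n_volts):
    """First-order output level of one GDI cell.

    g = 0 turns on the pMOS, which passes P but cannot go below VTP;
    g = 1 turns on the nMOS, which passes N but cannot go above VDD - VTN.
    """
    if g == 0:
        return max(p_volts, VTP)
    return min(n_volts, VDD - VTN)


def th22_internal_nodes(a, b):
    """Levels at N1 (AND cell output) and N2 (OR cell output)."""
    b_volts = VDD * b
    n1 = gdi_level(a, 0.0, b_volts)   # AND cell: G = A, P = 0, N = B
    n2 = gdi_level(a, b_volts, VDD)   # OR cell:  G = A, P = B, N = 1
    return n1, n2


for a in (0, 1):
    for b in (0, 1):
        n1, n2 = th22_internal_nodes(a, b)
        print(f"A={a} B={b}: N1={n1:.2f} V, N2={n2:.2f} V")
```

A circuit simulator is still needed for actual output levels, since this rule ignores body effect, leakage, and the analog behavior of the MUX stage.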
Fig. 3. Approximate voltage drop across GDI TH22 gate for A=0, B=0 input combination.
3) Leakage current: The drain-to-source current of a transistor operating in the weak
inversion region is called the subthreshold current. This subthreshold conduction is due
to the diffusion current of the minority charge carriers (14). The subthreshold current
$I_{SUB}$ is a function of transistor width (W), transistor length (L), temperature,
drain-source voltage ($V_{DS}$), gate-source voltage ($V_{GS}$), threshold voltage ($V_{t}$),
and process constants (K and m). Under weak inversion, the channel surface potential
is almost constant across the channel, and the current flow is determined by diffusion
of minority carriers due to a lateral concentration gradient (14). Gate leakage current is
due to the flow of electrons through the oxide. Fowler-Nordheim tunneling and direct
tunneling are the two tunneling mechanisms responsible for gate leakage (14). The gate
leakage increases exponentially as the oxide thickness is reduced. It depends on the oxide
layer potential $V_{ox}$, the oxide layer thickness $t_{ox}$, the constants A and B, and
the electric field over the oxide layer, $E_{ox}$.
The structure of the GDI cell provides a significant reduction in both the gate leakage
current and the subthreshold leakage current compared with static CMOS gates (14). In
static CMOS gates there is always a subthreshold leakage path for every possible input
state, because the pull-up and pull-down networks are always connected to the supply
voltage or ground. In GDI gates, by contrast, the connection of the pull-up and pull-down
networks depends on the function being implemented.
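The subthreshold and gate leakage currents discussed above are commonly modeled with the textbook expressions below. They are given as a sketch that uses the symbols named in the text and may differ in detail from the expressions in (14); the thermal voltage $V_T = kT/q$ and the gate current density $J_{gate}$ are symbols introduced here as assumptions.

```latex
I_{SUB} = K \, \frac{W}{L} \, e^{(V_{GS} - V_{t})/(m V_T)}
          \left( 1 - e^{-V_{DS}/V_T} \right),
\qquad V_T = \frac{kT}{q}

J_{gate} = A \, E_{ox}^{2} \, e^{-B/E_{ox}},
\qquad E_{ox} = \frac{V_{ox}}{t_{ox}}
```

The second expression is the Fowler-Nordheim form; since $E_{ox}$ grows as $t_{ox}$ shrinks at a fixed $V_{ox}$, it reflects the exponential increase of gate leakage with decreasing oxide thickness.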
4) Multi-threshold techniques for reducing the threshold voltage drop: The voltage drop
at the output of GDI gates causes performance degradation. Regenerative inverters are used
to avoid the voltage drop, but they increase the circuit area. Moreover, the use of cascaded
inverters increases static power dissipation, due to the increased $V_{GS}$ voltages of the
off transistors. This issue limited the use of GDI in older technologies (14). However,
nanoscale processes provide the option of fabricating transistors with different thresholds
on the same die, which can solve the above problems. The best solution is to use
low-threshold transistors in the paths where a voltage drop is expected, coupled with
regenerative inverters designed using high-threshold transistors. Because of the increased
subthreshold leakage in static CMOS, integrating low-threshold transistors in non-critical
paths is usually not practiced (14). Since the leakage currents in GDI are small, combining
low- and high-threshold transistors does not produce large leakage currents as it would in
static CMOS. Compared with static CMOS, the performance of an individual GDI gate is still
degraded by the use of these transistors. However, to achieve the same functionality, the
total path length from input to output is shorter in GDI (for most functions), which
compensates for the individual gate performance degradation (14).
5) Proposed generalized design approach for GDI NCL gates: By implementing the GDI
technique for asynchronous NCL designs in a nanoscale process, we can use multi-threshold
techniques to reduce the threshold voltage drop. Along with the multi-threshold techniques,
introducing regenerative buffers/inverters eliminates the voltage drop by producing a
full-swing voltage at the output. These two techniques can be used for designing any NCL
gate, i.e., any NCL circuitry. Designing the GDI AND, OR, and MUX cells using low-threshold
transistors and the regenerative buffer using high-threshold transistors reduces both the
delay and the power consumption, at the cost of an area overhead. As an example, the
GDI-based NCL TH22 gate designed using the proposed method is illustrated in Fig. 4. The
GDI-based TH22 design proceeds as follows. First, the GDI AND and OR configurations are used
to realize AB and A + B, and then a GDI MUX selects AB or A + B based on the value of Z: if
Z = 0, AB is selected; otherwise, A + B is selected. These GDI AND, OR, and MUX cells are
implemented using low-threshold transistors for low-power design. The MUX result is then
passed through regenerative buffers designed using high-threshold transistors to produce a
full-swing output, efficiently reducing the transistor count and power consumption.
Similarly, the other NCL THmn gates are designed using the GDI technique. The number of
transistors required for implementing the 27 NCL gates using static CMOS (17) and GDI
techniques has been compared, and the GDI NCL gates were found to offer a 13.5% reduction
in transistor count on average. Thus, using the GDI implementation of NCL circuits, we
can reduce the transistor count, which leads to a decrease in power consumption.
The proposed model is validated by realizing a variety of delay-insensitive NCL designs
using GDI technology: a 4-bit ripple carry adder, an unpipelined 4x4 multiplier, a two-stage
pipelined 4x4 NCL multiplier, and an unpipelined NCL ALU. Each design is described in detail
below.
Fig. 4. Proposed GDI NCL TH22 gate.
3. Ripple Carry Adder
An NCL ripple carry adder (RCA) designed using GDI technology is presented. In this paper,
a GDI model of a 4-bit RCA is proposed. The proposed model uses the low-power GDI technique
to realize the NCL gates. The results show that the proposed model has better performance
in terms of transistor count and static and dynamic power dissipation. To design a 4-bit
NCL ripple carry adder, input-complete, optimized NCL full adders are used, sandwiched
between two DI registers. The optimized NCL full adder is designed using two TH23 and
TH34W2 gates. Fig. 7 depicts the proposed optimized GDI NCL full adder, where the TH23 and
TH34W2 gates are implemented using GDI technology. Fig. 5 depicts the transistor-level
implementation of the TH23 gate using the GDI technique, where a restoration buffer is added
at the output to restore the signal and avoid any voltage drop. For low-power design, the
rest of the circuit, except for the buffer, is designed using low-threshold transistors.
The buffer is realized using high-threshold transistors to restore the dropped voltage
levels.
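A behavioral sketch of a dual-rail NCL full adder built from TH23 and TH34W2 gates is given below. The wiring shown (two TH23 gates for the carry rails and two TH34W2 gates for the sum rails, with the opposite carry rail as the weighted input) is the arrangement commonly used for optimized NCL full adders and is an assumption here, since the text does not list the connections. Class and function names are hypothetical, and the model is logic-level only.

```python
class ThresholdGate:
    """Behavioral NCL threshold gate with hysteresis (logic level only)."""

    def __init__(self, m, n, weights=()):
        self.m = m
        self.weights = list(weights) + [1] * (n - len(weights))
        self.out = 0

    def update(self, *inputs):
        total = sum(w * x for w, x in zip(self.weights, inputs))
        if total >= self.m:
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out


def rails(bit):
    """Dual-rail encoding as (rail0, rail1): 0 -> (1, 0), 1 -> (0, 1)."""
    return (1 - bit, bit)


NULL = (0, 0)


class NCLFullAdder:
    def __init__(self):
        self.cout0 = ThresholdGate(2, 3)        # TH23
        self.cout1 = ThresholdGate(2, 3)        # TH23
        self.sum0 = ThresholdGate(3, 4, [2])    # TH34W2
        self.sum1 = ThresholdGate(3, 4, [2])    # TH34W2

    def update(self, a, b, cin):
        """a, b, cin are (rail0, rail1) pairs; returns (sum, cout) pairs."""
        c0 = self.cout0.update(a[0], b[0], cin[0])
        c1 = self.cout1.update(a[1], b[1], cin[1])
        # The weight-2 input of each sum gate is the opposite carry rail
        s0 = self.sum0.update(c1, a[0], b[0], cin[0])
        s1 = self.sum1.update(c0, a[1], b[1], cin[1])
        return (s0, s1), (c0, c1)


fa = NCLFullAdder()
fa.update(NULL, NULL, NULL)                     # NULL wavefront
print(fa.update(rails(1), rails(1), rails(0)))  # DATA wavefront: 1 + 1 + 0
```

Each DATA wavefront is separated by a NULL wavefront, which resets all four gates through their hysteresis before the next operands arrive.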
Fig. 5. GDI Implementation of TH23 gate.
Fig. 6. GDI Implementation of TH34W2 gate.
Fig. 7. GDI Model of Full Adder with DI Registers.
4. 4-Bit Multiplier
NCL multipliers are classified into non-pipelined and pipelined multipliers. In this
paper, GDI models of 4-bit non-pipelined and pipelined NCL multipliers are proposed.
In the GDI model, all the modules are implemented using the GDI technique to overcome the
limitations of the static CMOS design. The proposed model provides the best performance
in terms of power and area.
Fig. 8. GDI Model of Non-pipelined, 1-stage 4×4 multiplier.
Non-Pipelined Multiplier
Fig. 8 illustrates the proposed GDI model of the existing non-pipelined (6), 1-stage 4-bit
multiplier, using the full-word completion version of the NCL multiplier design. To reduce
the transistor count and dynamic power dissipation, all the modules of the existing CMOS
non-pipelined multiplier are replaced with GDI modules, resulting in a model consisting
entirely of GDI-based gates. As depicted in Fig. 8, the GDI model consists of 8-bit GDI
registers, incomplete GDI AND gates, complete GDI AND gates, GDI half adders (GDI HA), and
GDI full adders (GDI FA). I and C denote the “incomplete GDI AND” and “complete GDI AND”
functions, respectively. The GDI multiplier also includes GENS7 and the completion
component, denoted COMP. The 8-bit GDI registers at the input and at the output control the
flow of DATA and NULL wavefronts, as shown in Fig. 8.
Fig. 9. GDI Model of 2-stage 4×4 multiplier.
2-Stage Pipelined Multiplier
The proposed GDI model of the existing 2-stage 4-bit multiplier (6) using full-word
completion is depicted in Fig. 9. It consists of an 8-bit GDI register, an 8-bit CMOS
register, a 12-bit GDI register, incomplete GDI AND (I) gates, complete GDI AND (C) gates,
GDI half adders, and GDI full adders (GDI FA). Here, a 12-bit GDI register is added between
the HYBRID HA and the GDI FA of the proposed HYBRID non-pipelined, 1-stage 4-bit multiplier
using full-word completion to obtain the 2-stage GDI 4-bit multiplier.
5. Hybrid Non-pipelined ALU
The logic diagram of the proposed non-pipelined dual-rail GDI ALU is shown in Fig. 10. The
existing non-pipelined dual-rail ALU (3) is modified to obtain the proposed model. The
proposed model gives better performance in terms of transistor count and power dissipation.
It consists of dual-rail GDI registers, completion components (COMP), a GDI Convert to MEAG
function, a GDI demultiplexer, GDI NCL OR, GDI AND, GDI XOR, invert, shift right, and shift
left functions, a GDI ripple carry subtractor and adder, two GDI multiplexers, and CMOS
carry logic. The Convert to MEAG function converts the three dual-rail signals into an
8-rail MEAG signal. This conversion is carried out by eight TH33 gates in the Convert to
MEAG function. The invert, shift right, and shift left operations are done by renaming
signals and hence have no logic delay.
Fig. 10. Non-pipelined Dual-Rail GDI ALU.
The GDI ripple-carry subtractor and adder consist of four GDI full adders. Based on
the select MEAG result, the GDI demultiplexer selects the corresponding function.
The GDI demultiplexer is realized using GDI TH22 gates, which pass the A, B, and
$C_{in}$/$B_{in}$ inputs, respectively. For functions that do not require the B input, the
GDI demultiplexer is designed using GDI TH34 gates, which also ensure input completeness
with respect to B. The CMOS carry logic, on the other hand, generates $C_{out}$ and
provides input completeness for the $C_{in}$/$B_{in}$ inputs. The CMOS multiplexers consist
of TH14 and TH12 gates, which produce single results by ORing each rail of the demultiplexer
signals.
IV. SIMULATION RESULTS
This section compares different NCL circuits implemented using CMOS and GDI technology.
There are three different types of CMOS models. In the high-threshold model (High $V_{th}$),
the complete circuit is realized using only high-threshold transistors. In the low-threshold
model (Low $V_{th}$), low-threshold transistors are used to realize the design. Lastly,
standard-threshold transistors are used in the standard-threshold model (std $V_{th}$).
Low-threshold transistors offer high speed but high power consumption, high-threshold
transistors have low power and high latency, and standard-threshold transistors provide
medium delay and medium power dissipation. The GDI design is compared individually with all
three CMOS designs. The comparison is based on the number of transistors and on static and
dynamic power dissipation. The CMOS and GDI designs are realized in 45 nm technology using
the Cadence generic process design kit (gpdk45). A process design kit contains the process
technology data and other information needed to do device-level design in the Cadence
environment. The schematics are implemented in the Cadence Virtuoso tool with VDD = 1 V and
a temperature of 27 °C. The circuits are simulated with the Spectre simulator in Cadence
Virtuoso using gpdk45 high- and low-threshold MOSFETs with a W/L ratio of 1. Note that all
transistors in all designs, both CMOS and GDI, are minimum sized.
Table 2. Simulated Results 4-Bit RCA using CMOS and GDI Technology
Design Technique | Static Power (nW) | Average Power (nW) | Dynamic Power (nW) | Transistor Count
CMOS model 1 | 0.588 | 14.01 | 13.42 | 1128
CMOS model 2 | 9.77 | 32.21 | 22.44 | 1128
CMOS model 3 | 1.01 | 17.46 | 16.45 | 1128
GDI model | 1.63 | 13.79 | 12.16 | 960
Table 3. Performance Comparison of 4-Bit Unpipelined multiplier
Design Technique | Static Power (nW) | Average Power (nW) | Dynamic Power (nW) | Transistor Count
CMOS model 1 | 1.58 | 21.06 | 19.48 | 2040
CMOS model 2 | 15.8 | 45.83 | 30.03 | 2040
CMOS model 3 | 1.66 | 25.93 | 24.27 | 2040
GDI model | 2.9 | 21.916 | 19.01 | 1760
Simulations were carried out on all possible input patterns to calculate the static and
dynamic power dissipation. Dynamic power dissipation is the power dissipated during
transient conditions (when the transistors of the circuit switch from one logic state to
another). To compute the dynamic power, the average power over all input patterns is
measured first. The static power is then subtracted from the measured average power to
obtain the dynamic power.
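The subtraction described above can be reproduced directly from the tabulated values. The short sketch below uses the static and average power figures of Table 2; the function and variable names are hypothetical.

```python
# (static power, average power) in nW, copied from Table 2 (4-bit RCA)
RCA_POWER_NW = {
    "CMOS model 1": (0.588, 14.01),
    "CMOS model 2": (9.77, 32.21),
    "CMOS model 3": (1.01, 17.46),
    "GDI model": (1.63, 13.79),
}


def dynamic_power(static_nw, average_nw):
    """Dynamic power = measured average power minus static power."""
    return average_nw - static_nw


for name, (static_nw, average_nw) in RCA_POWER_NW.items():
    print(f"{name}: dynamic = {dynamic_power(static_nw, average_nw):.2f} nW")
```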
A. 4-bit Ripple Carry Adder - CMOS vs GDI
The ripple carry adder presented in this paper is designed using four different models:
high-threshold, low-threshold, and standard-threshold CMOS models, and a GDI-based RCA
model. In CMOS model 1 the whole circuit is designed using high-threshold transistors;
similarly, CMOS model 2 is designed with low-threshold transistors, and standard
transistors are used in CMOS model 3. In the GDI RCA model, the complete circuit (full
adders, input and output registers) is designed using GDI technology. Table 2 shows the
performance comparison of these designs in terms of power and transistor count. Simulations
are carried out using input test vectors that cover all possible input combinations for a
4-bit RCA. The values tabulated in Table 2 are averages over all possible input
combinations. The GDI RCA model offers a 14% reduction in transistor count compared with
all the CMOS models. Compared with the CMOS high-threshold, low-threshold, and
standard-threshold models, the GDI model gives 9.3%, 45.7%, and 30.30% reductions in
dynamic power, respectively.
B. 4-bit NCL Multiplier CMOS vs GDI
The CMOS and GDI design comparison can also be extended to multipliers. Two types
of 4-bit NCL multipliers, a 4-bit unpipelined multiplier and a 4-bit pipelined multiplier,
are designed, and their simulation results are discussed below.
1) 4-bit Unpipelined Multiplier: Four models of the unpipelined NCL multiplier are designed
in this paper: three different CMOS models and the GDI model. The GDI results are compared
with those of the CMOS models. As seen from Table 3, the GDI design gives the best
performance in terms of the number of transistors used and dynamic power dissipation. The
dynamic power is improved by 2.4%, 36.6%, and 21.6% compared with CMOS model 1, CMOS model
2, and CMOS model 3, respectively. Compared with the CMOS models, the GDI model offers a
13.7% reduction in transistor count, thus reducing both dynamic power and area.
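The percentage figures quoted above are relative reductions with respect to each CMOS model. A minimal sketch of that calculation, using the dynamic power and transistor count values of Table 3, is shown below; the function and variable names are hypothetical, and the printed values may differ from the quoted ones in the last digit depending on rounding.

```python
def percent_reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Dynamic power (nW) and transistor counts copied from Table 3
CMOS_DYNAMIC_NW = {
    "CMOS model 1": 19.48,
    "CMOS model 2": 30.03,
    "CMOS model 3": 24.27,
}
GDI_DYNAMIC_NW = 19.01
CMOS_TRANSISTORS = 2040
GDI_TRANSISTORS = 1760

for name, cmos_nw in CMOS_DYNAMIC_NW.items():
    saving = percent_reduction(cmos_nw, GDI_DYNAMIC_NW)
    print(f"Dynamic power vs {name}: {saving:.1f}% lower")

count_saving = percent_reduction(CMOS_TRANSISTORS, GDI_TRANSISTORS)
print(f"Transistor count: {count_saving:.1f}% lower")
```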
Table 4. 4-Bit Two Stage Pipelined Multiplier Simulation Results for CMOS and GDI Technology
Design Technique | Static Power (nW) | Average Power (nW) | Dynamic Power (nW) | Transistor Count
CMOS model 1 | 2.17 | 28.34 | 26.17 | 2574
CMOS model 2 | 20.9 | 68.40 | 47.5 | 2574
CMOS model 3 | 1.23 | 33.915 | 32.685 | 2574
GDI model | 3.72 | 29.56 | 25.84 | 2238
Table 5. Performance Comparison of Non-Pipelined Dual-Rail CMOS and GDI ALU
Design Technique | Static Power (nW) | Average Power (nW) | Dynamic Power (nW) | Transistor Count
CMOS model 1 | 1.95 | 19.116 | 17.16 | 4084
CMOS model 2 | 23.9 | 54.25 | 30.53 | 4084
CMOS model 3 | 3.09 | 24.55 | 21.435 | 4084
GDI model | 5.54 | 23.96 | 18.42 | 3520
2) Two-Stage Pipelined Multiplier: The performance of the two-stage pipelined multiplier, designed
using three different CMOS approaches and GDI, is discussed below. To reduce power
dissipation and area, a GDI model employing the low-power GDI technique is proposed.
Table 4 presents the simulation results of the three CMOS models and the GDI model. The average
power values are averaged over all possible input transitions for the 4-bit
multiplier. As illustrated, the GDI pipelined multiplier design results in a 1.2%, 45.6%, and
20.9% decrease in dynamic power compared with CMOS model 1, CMOS model 2, and CMOS
model 3, respectively. In addition, the transistor count is decreased by 13.4%
compared with all of the CMOS pipelined multiplier designs.
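The percentages quoted for Table 4 follow from a simple relative-reduction formula, (baseline - GDI) / baseline. The short Python sketch below recomputes them from the tabulated values; the names `TABLE4`, `percent_reduction`, and `compare_to_gdi` are illustrative only and are not part of the original design flow.

```python
# Values copied from Table 4 (two-stage pipelined multiplier).
# Each entry: (static nW, average nW, dynamic nW, transistor count)
TABLE4 = {
    "CMOS model 1": (2.17, 28.34, 26.17, 2574),
    "CMOS model 2": (20.9, 68.40, 47.5, 2574),
    "CMOS model 3": (1.23, 33.915, 32.685, 2574),
    "GDI model": (3.72, 29.56, 25.84, 2238),
}


def percent_reduction(baseline, value):
    """Reduction of `value` relative to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


def compare_to_gdi(table):
    """Return {CMOS model: (dynamic-power reduction %, transistor reduction %)}."""
    _, _, gdi_dyn, gdi_count = table["GDI model"]
    result = {}
    for name, (_, _, dyn, count) in table.items():
        if name == "GDI model":
            continue
        result[name] = (
            percent_reduction(dyn, gdi_dyn),
            percent_reduction(count, gdi_count),
        )
    return result


if __name__ == "__main__":
    for name, (dyn_red, cnt_red) in compare_to_gdi(TABLE4).items():
        print(f"{name}: dynamic power -{dyn_red:.1f}%, transistors -{cnt_red:.1f}%")
```

Computed this way, the dynamic-power reductions agree with the quoted figures to within rounding, while the transistor-count reduction from the tabulated counts comes out near 13.1%; small differences from the quoted value may reflect rounding or a different baseline.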
C. Hybrid Non-pipelined ALU
The performance of the non-pipelined ALU, designed using two different approaches,
CMOS and GDI, is discussed below. To prevent threshold voltage penetration inside
the circuit and to exploit the advantages of the low-power GDI technique, a GDI circuit built
from GDI NCL gates is proposed. Table 5 presents the simulation results of both the CMOS and the GDI models. As illustrated,
the GDI non-pipelined ALU design results in a 39% and 14% decrease in dynamic
power dissipation compared with CMOS model 2 and CMOS model 3, respectively. However, the dynamic
power of the GDI model is 6% higher than that of CMOS model 1. This difference stems
from the threshold type of the transistors used in these models. CMOS model 1 comprises
only high-threshold transistors, which dissipate less power, whereas the GDI model uses
both high- and low-threshold transistors; the low-threshold transistors are the reason
for its increased power. The GDI model has a 13% lower transistor count than
all of the CMOS models.
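Because the GDI ALU improves on two CMOS models but not on the third, a signed relative change makes the comparison easier to read. The sketch below uses the dynamic-power column of Table 5; the names `DYNAMIC_NW`, `relative_change`, and `gdi_vs_cmos` are hypothetical helpers introduced for illustration.

```python
# Dynamic power (nW) copied from Table 5 (non-pipelined ALU).
DYNAMIC_NW = {
    "CMOS model 1": 17.16,
    "CMOS model 2": 30.53,
    "CMOS model 3": 21.435,
    "GDI model": 18.42,
}


def relative_change(baseline, value):
    """Signed change of `value` with respect to `baseline`, in percent.

    Negative means a reduction, positive means an increase.
    """
    return 100.0 * (value - baseline) / baseline


def gdi_vs_cmos(dynamic):
    """Return {CMOS model: signed % change of GDI dynamic power vs. that model}."""
    gdi = dynamic["GDI model"]
    return {
        name: relative_change(power, gdi)
        for name, power in dynamic.items()
        if name != "GDI model"
    }


if __name__ == "__main__":
    for name, change in gdi_vs_cmos(DYNAMIC_NW).items():
        print(f"GDI vs {name}: {change:+.1f}%")
```

With CMOS model 1 as the baseline, the tabulated values give an increase of roughly 7%; the exact figure depends on which design is taken as the reference, which may account for the difference from the quoted 6%.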
IV. CONCLUSIONS
In this paper, a novel GDI NCL model is proposed to address the limitations of the
existing CMOS NCL design. The GDI model contains modules implemented using the GDI technique.
The main drawback of the CMOS NCL design is that it occupies a large area. To address this
limitation, modules of the NCL design are implemented using the GDI technique. The
GDI technique is a low-power design approach in which a wide range of complex functions
can be implemented using only two transistors. Hence, the GDI approach reduces not only
the power dissipation but also the transistor count.
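As a logic-level illustration of how one two-transistor cell can realize several functions, the following Python sketch models an idealized GDI cell with gate input G, pMOS diffusion input P, and nMOS diffusion input N. It assumes ideal switches (no threshold voltage drop, no regenerative inverter), and the function names are hypothetical, chosen only for this example.

```python
def gdi_cell(g, p, n):
    """Idealized GDI cell: the pMOS passes P when G=0, the nMOS passes N when G=1."""
    return n if g else p


# Different functions come from the same cell by changing what drives P and N.
def gdi_not(a):
    return gdi_cell(a, 1, 0)      # P tied high, N tied low


def gdi_and(a, b):
    return gdi_cell(a, 0, b)      # P tied low, N driven by B


def gdi_or(a, b):
    return gdi_cell(a, b, 1)      # P driven by B, N tied high


def gdi_mux(sel, x0, x1):
    return gdi_cell(sel, x0, x1)  # select x0 when sel=0, x1 when sel=1


if __name__ == "__main__":
    # Print the truth table of the two-input functions.
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", gdi_and(a, b), "OR:", gdi_or(a, b))
```

Each function above uses a single cell, that is, two transistors in this idealized model; a real implementation must also account for degraded output levels.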
However, when NCL gates are designed using the GDI technique, there is a considerable
voltage drop at their outputs. This problem is addressed by using low-threshold transistors
where a voltage drop is expected and high-threshold transistors for the regenerative
inverters at the output. The proposed idea is implemented in various NCL circuits,
such as the RCA, the unpipelined and pipelined multipliers, and the unpipelined ALU.
Compared with the CMOS designs, the GDI models have a lower transistor count and lower dynamic
power dissipation.
REFERENCES
Mader R., Friedman E. G., Litman A., Kourtev I. S., May 2002, Large scale clock skew
scheduling techniques for improved reliability of digital synchronous VLSI circuits,
IEEE International Symposium on Circuits and Systems (ISCAS 2002), Vol. 1, pp. I-357
Smith S. C., DeMara R. F., Yuan J. S., Ferguson D., Lamb D., 2004, Optimization of
null convention self-timed circuits, INTEGRATION, the VLSI journal, Vol. 37, No. 3,
pp. 135-165
Bandapati S. K., Smith S. C., 2007, Design and characterization of null convention
arithmetic logic units, Microelectronic engineering, Vol. 84, No. 2, pp. 280-287
Parsan F. A., Smith S. C., Oct 2012, CMOS implementation of static threshold gates
with hysteresis: A new approach, in Proc. IEEE/IFIP 20th Int VLSI and System-on-Chip
(VLSI-SoC) Conf, pp. 41-45
Bonam R., Chaudhary S., Yellambalase Y., Choi M., Aug 2007, Clock-free nanowire crossbar
architecture based on null convention logic (ncl), in Proc. 7th IEEE Conf. Nanotechnology
(IEEE NANO), pp. 85-89
Smith S. C., 2001, Gate and throughput optimizations for null convention self-timed
digital circuits, Ph.D. dissertation, University of Central Florida, Orlando, Florida
Choi M., Kang B.-H., Kim Y.-B., Kim K. K., Nov 2014, Asynchronous circuit design using
new high speed ncl gates, in Proc. Int. SoC Design Conf. (ISOCC), pp. 13-14
Parsan F. A., Smith S. C., Aug 2012, CMOS implementation comparison of ncl gates,
in Proc. IEEE 55th Int. Midwest Symp. Circuits and Systems (MWSCAS), pp. 394-397
Metku P., Kim K. K., Kim Y.-B., Choi M., Oct 2018, Low-power null convention logic
multiplier design based on gate diffusion input technique, in 2018 International SoC
Design Conference (ISOCC), pp. 233-234
Morgenshtein A., Yuzhaninov V., Kovshilovsky A., Fish A., 2014, Full-swing gate diffusion
input logic: case-study of low-power CLA adder design, INTEGRATION, the VLSI journal,
Vol. 47, No. 1, pp. 62-70
Fant K. M., Brandt S. A., Oct. 27 199, Null convention logic system, US Patent 5,828,228
Smith S. C., DeMara R. F., Yuan J. S., Hagedorn M., Ferguson D., 2002, Null convention
multiply and accumulate unit with conditional rounding, scaling, and saturation,
Journal of Systems Architecture, Vol. 47, No. 12, pp. 977-998
Sobelman G. E., Fant K., May 1998, CMOS circuit design of threshold gates with hysteresis,
IEEE International Symposium on Circuits and Systems (ISCAS 1998), Vol. 2, pp. 61-64
Morgenshtein A., Shwartz I., Fish A., Nov 2010, Gate diffusion input (GDI) logic in
standard CMOS nanoscale process, in Proc. IEEE 26th Convention of Electrical and
Electronics Engineers in Israel, pp. 776-780
Morgenshtein A., Fish A., Wagner I.A., May 2002, Gate-diffusion input (gdi) - a technique
for low power design of digital circuits: analysis and characterization, IEEE International
Symposium on Circuits and Systems (ISCAS2002), Vol. 1, pp. I–477-I–480
Parsan F. A., Smith S. C., Aug 2012, Cmos implementation comparison of ncl gates,
in Circuits and Systems (MWSCAS), 2012 IEEE 55th International Midwest Symposium on
Smith S. C., Di J., 2009, Designing asynchronous circuits using null convention logic
(ncl), Synthesis Lectures on Digital Circuits and Systems, Vol. 4, No. 1, pp. 1-96
Author
Prashanthi Metku is from Hyderabad, India.
She received her B.Tech degree in Electronic and Communication Engineering from Jawaharlal
Nehru Technological University, Hyderabad, India, in 2011 and M.Tech degree in Electronic
Engineering from Pondicherry University, India, in 2014.
She is currently pursuing her Ph.D. degree in Computer Engineering at Missouri
University of Science and Technology, United States.
Her interests include CMOS circuit design and Error Correction Codes.
Kyung Ki Kim received his B.S. and M.S. degrees in Electronic Engineering from Yeungnam
University, South Korea, in 1995 and 1997, respectively.
He was a candidate for Ph.D. in Computer Science from Sogang University, South Korea
from 1997 to 1999, and received his Ph.D. degree in Computer Engineering from Northeastern
University, Boston, USA in 2008.
He was a member of technical staff with Sun Microsystems, Santa Clara, CA in 2008
and a senior researcher with Illinois Institute of Technology, Chicago, USA in 2009.
Since March 2010, he has been with the school of Electronic and Electrical Engineering,
Daegu University, Korea, where he is currently an Associate Professor.
His current research focuses on neuromorphic architecture, high speed low power VLSI
design, asynchronous design, electronic CAD and nano-electronics.
Yong-Bin Kim received the B.S. degree in electrical engineering from Sogang University,
Seoul, Korea, the M.S. degree in electrical engineering from New Jersey Institute
of Technology, Newark, NJ, USA, and the Ph.D. degree in electrical and computer engineering
from Colorado State University, Fort Collins, CO, USA.
He was a member of the technical staff with Electronics and Telecommunications Research
Institute(ETRI), Daejon, Korea from 1982 to 1987.
He was a Senior Design Engineer with Intel Corp., Hillsboro, OR, USA, from 1990 to
1993, involved in Intel Pentium Pro CPU chip design.
He was a Member of Technical Staff with Hewlett Packard Co., Fort Collins, CO, USA
from 1993 to 1996, involved in HP PA-8000 RISC microprocessor chip design.
He was a Staff Engineer with Sun Microsystems, Palo Alto, CA, USA from 1996 to
1998, involved in 1.5 GHz Ultra Sparc5 CPU chip design.
He was an Assistant Professor with the Department of Electrical and Computer Engineering
of the University of Utah, Salt Lake City, UT, USA from 1998 to 2000.
He is currently a Professor with the Department of Electrical and Computer Engineering
at Northeastern University, Boston, MA, USA.
His research focuses on low-power analog and digital circuit design as well as high-speed
low-power VLSI circuit design and methodology.
Minsu Choi received his B.S., M.S. and Ph.D. degrees in Computer Science from Oklahoma
State University in 1995, 1998 and 2002, respectively.
He is currently an associate professor of Electrical and Computer Engineering at Missouri
University of Science & Technology (Missouri S&T).
His research mainly focuses on Computer Architecture & VLSI, Crypto-hardware design,
Nanoelectronics, Embedded Systems, Fault Tolerance, Testing, Quality Assurance, Reliability
Modeling and Analysis, Configurable Computing, Parallel & Distributed Systems and
Dependable Instrumentation & Measurement.
He has won two outstanding teaching awards at MST in 2008 and 2009.
He is a senior member of IEEE and a member of Golden Key National Honor Society and
Sigma Xi.