Byung-Kwon An$^{1}$
Xueyong Zhang$^{1}$
Anh Tuan Do$^{2}$
Tony Tae-Hyoung Kim$^{1}$
$^{1}$School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
$^{2}$IC-Design Department, Institute of Microelectronics (IME), A*STAR, Singapore 138634
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index Terms
Resistive random-access memory (RRAM), dynamic reference current sense amplifier (DR-CSA), high resistance state (HRS), low resistance state (LRS), R-ratio (R$_{HRS}$/R$_{LRS}$), sensing margin
I. INTRODUCTION
The increasing volume of data generated and required by modern applications, including data centers, IoT devices, mobile electronics, and low-power image processors, has significantly heightened the demand for memory that is high-density, cost-effective, and energy-efficient [1, 7-10]. Hard disks, once widespread, proved inadequate for these requirements, which led to the widespread adoption of flash memory. However, the limited scaling and endurance of flash memory, stemming from the inherent constraints of its tunneling-based write mechanism, present a substantial challenge [2,3]. Resistive Random Access Memory (RRAM) has therefore emerged as a compelling alternative and a front-runner among non-volatile memories. With its non-volatility, large resistance ratio (R$_{HRS}$/R$_{LRS}$), high endurance, low read/write latency, CMOS-process compatibility, and low supply-voltage requirements, RRAM is well suited to the increasingly demanding memory needs of modern applications [1-6].
An RRAM device is a two-terminal, three-layer metal-insulator-metal (MIM) structure. The insulator layer provides reversible, electric-field-induced resistance switching through conductive filaments, as depicted in Fig. 1(a) [2,11,12]. Filament formation results in a low resistance state (LRS), whereas filament rupture results in a high resistance state (HRS). Typically, an RRAM cell consists of one transistor and one RRAM device (1T1R) [2], where the transistor acts as a selector that controls access to the bit cell.
Despite these advantages, RRAM still faces significant challenges, such as resistance variations during fabrication and resistance drift during operation. As illustrated in Fig. 1(b), resistance variations and drift degrade the R-ratio (R$_{HRS}$/R$_{LRS}$). In addition, offset in the sensing circuit further erodes a sensing margin that is already reduced by resistance degradation, which can lead to read errors [4-6]. Several sensing schemes have therefore been developed to enhance the sensing margin of RRAM [14-18]. For instance, current sense amplifiers based on a two-step scheme (DS-CSA), a three-step scheme (TS-CSA), and a covalent-bonded scheme (CB-CSA) have been reported to enhance the sensing margin [14,16,17]. Nevertheless, the multi-step sense amplifiers (DS-CSA and TS-CSA) incur significant delay due to their multiple operational phases [14,16], and the covalent-bonded scheme (CB-CSA) requires an additional reference array to generate two distinct reference currents [17].
This work introduces a current sense amplifier with a dynamic reference scheme that enhances both variation tolerance and sensing speed. Compared with the conventional CSA [13], the proposed sense amplifier improves the sensing margin (|V$_{CELL}$-V$_{REF}$|) by a factor of 4 and reduces the sensing time and read power by 53% and 30%, respectively.
The rest of this paper is organized as follows. Section II reviews existing RRAM sense amplifiers and their limitations. Section III explains the proposed DR-CSA. Simulation results and comparisons are presented in Section IV, followed by conclusions in Section V.
Fig. 1. (a) 1T1R structure and equivalent circuit; (b) impact of RRAM resistance variations on the R-ratio.
II. REVIEW OF STATE-OF-THE-ART SENSE AMPLIFIERS
1. Conventional Current-mirror-based Sense Amplifier (CSA)
Fig. 2 illustrates the conventional current-mirror-based current sense amplifier (CSA) [13]. This scheme compares the cell current with a reference current using a current-mirror load. The reference current (I$_{REF}$) is set to the midpoint between the HRS and LRS currents (i.e., I$_{REF}$ = (I$_{HRS}$ + I$_{LRS}$) / 2). When a read operation starts, V$_{CELL}$ and V$_{REF}$ develop simultaneously at the cell node and the reference node, respectively. V$_{CELL}$ is determined by the difference between I$_{CELL}$ and the mirrored I$_{REF}$, while V$_{REF}$ is set by I$_{REF}$ alone. The sensing margin, defined as the difference between V$_{CELL}$ and V$_{REF}$ at the comparator inputs, can be written as follows.
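As a first-order sketch (assuming the cell node is loaded only by R$_{O}$, the output resistance of the mirrored PMOS load, and neglecting clamp and comparator effects), the margin scales with the current difference:

$$\Delta V_{SM} = \left| V_{CELL} - V_{REF} \right| \approx \left| I_{CELL} - I_{REF} \right| \cdot R_{O}$$

With the midpoint reference, |I$_{CELL}$ - I$_{REF}$| = (I$_{LRS}$ - I$_{HRS}$)/2 for either stored state, so any shrinkage of the LRS-HRS current gap translates directly into a smaller $\Delta V_{SM}$.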
Here, R$_{O}$ is the output resistance of the PMOS load. However, resistance fluctuations in R$_{CELL}$ and R$_{REF}$ reduce both the sensing margin and the sensing window. Furthermore, the sensing margin is degraded by the offset voltages of the comparator input devices, the current mirror, and the clamping devices [13,14]. Therefore, the conventional CSA is not robust enough for RRAM whose R-ratio is reduced by resistance variations. Three alternative CSAs, namely the two-step (DS-CSA), three-step (TS-CSA), and covalent-bonded (CB-CSA) current sense amplifiers, have been reported to address these challenges [14,16,17].
Fig. 2. Schematics of conventional current-mirror-based CSA [13].
2. Two-step CSA (DS-CSA)
Fig. 3 illustrates the two-step CSA (DS-CSA), which improves the sensing margin over the conventional CSA [14]. This scheme uses switches to change the current paths between two steps and develops the voltages V$_{CELL}$ and V$_{REF}$ at the comparator input nodes independently. In the first step (SS1 = 'high', highlighted in red), EQ is activated for a short period, after which the currents flow in their regular directions, as shown in Fig. 3 (left). This step develops the sensing voltage V$_{CELL}$ at the first comparator input node from the current difference |I$_{CELL}$ – I$_{REF}$|. In the second step (SS2 = 'high', highlighted in blue), the current paths are altered as shown in Fig. 3 (right), generating V$_{REF}$ from the current difference |I$_{REF}$ – I$_{CELL}$| with the opposite polarity. As a result, DS-CSA compares V$_{CELL}$ and V$_{REF}$ with a sensing margin amplified over two steps.
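The margin doubling of the two-step scheme can be sketched with an idealized first-order model. It assumes (our simplification, not the circuit of [14]) that both comparator nodes see the same output resistance R_O, that the second step develops V_REF with the opposite polarity to V_CELL, and that offsets are ignored. R_O = 100 kΩ is a hypothetical value, and the cell and reference currents are borrowed from the design in Section IV.

```python
# Idealized first-order comparison of single-step and two-step (DS-CSA-style)
# sensing. Assumptions: equal output resistance R_O on both comparator nodes,
# opposite-polarity development in step 2, and no offsets or clamp effects.

I_REF = 1.5e-6   # reference current (A), taken from Section IV
R_O = 100e3      # hypothetical output resistance (ohm)


def conventional_margin(i_cell: float, i_ref: float, r_o: float) -> float:
    """Only V_CELL moves away from the common bias; V_REF stays put."""
    return abs(i_cell - i_ref) * r_o


def two_step_margin(i_cell: float, i_ref: float, r_o: float) -> float:
    """Step 1 develops V_CELL from (I_CELL - I_REF); step 2 swaps the paths
    and develops V_REF from (I_REF - I_CELL), i.e., with opposite polarity."""
    v_cell = (i_cell - i_ref) * r_o  # deviation from the common bias, step 1
    v_ref = (i_ref - i_cell) * r_o   # deviation from the common bias, step 2
    return abs(v_cell - v_ref)


for name, i_cell in (("LRS", 2.5e-6), ("HRS", 0.3e-6)):
    conv = conventional_margin(i_cell, I_REF, R_O)
    ds = two_step_margin(i_cell, I_REF, R_O)
    print(f"{name}: conventional {conv * 1e3:.0f} mV, two-step {ds * 1e3:.0f} mV")
```

Under these assumptions the two-step margin is exactly twice the single-step margin for either state; in a real circuit, mismatch between the two steps and switch charge injection would reduce that ideal factor.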
Fig. 3. Schematics of DS-CSA [14].
3. Three-step CSA (TS-CSA)
Fig. 4 illustrates the three-step CSA (TS-CSA) [16]. Like the DS-CSA, the TS-CSA employs multiple steps, using switches and capacitive coupling to enhance the sensing margin. At the beginning of a sensing operation, distinct overdrive voltages (V$_{OV}$) are produced for the cell current (I$_{CELL}$) and the reference current (I$_{REF}$) through the current paths in load 1. These overdrive voltages (V$_{OV\_ ICELL}$, V$_{OV\_ IREF}$) are then duplicated from load 1 to the current paths in load 2 through capacitive coupling. Because the transistors in load 2 are twice as large as those in load 1, load 2 conducts doubled currents (2I$_{CELL}$, 2I$_{REF}$), producing a larger current difference at the comparator than the original currents (I$_{CELL}$, I$_{REF}$). The operation is detailed below.
TS-CSA operates in three major phases: 1) threshold-voltage sampling, 2) overdrive-voltage sampling and coupling, and 3) current-difference amplification.
In standby mode (DSD, CHD, SW3, SW4 = on), the transistors of load 1 and load 2 are diode-connected through the switches, and their gates and drains are set to '0' (i.e., D1, D2, D3, D4 nodes = 0 V).
In the first step (DSD, CHD = off; SW3, SW4 = on), the threshold voltages of the diode-connected PMOS transistors are sampled in loads 1 and 2. As a result, V$_{DD}$ - V$_{TH}$ is stored and applied to their gates and drains (i.e., D1, D2, D3, D4 nodes = V$_{DD}$ - V$_{TH}$).
In the second step (CHD, SW1, SW3 = on), WL and CLAMP are activated, causing I$_{CELL}$ and I$_{REF}$ to flow through the PMOS transistors of load 1 (shown in red). Through current sampling [15,16], voltages that depend on the cell current (I$_{CELL}$) and the reference current (I$_{REF}$) are produced at the gates and drains of load 1 (i.e., D1 node = V$_{DD}$ - V$_{TH}$ - V$_{OV\_ ICELL}$ and D2 node = V$_{DD}$ - V$_{TH}$ - V$_{OV\_ IREF}$).
Simultaneously, the overdrive voltages (V$_{OV\_ ICELL}$, V$_{OV\_ IREF}$) are coupled from the D1 and D2 nodes to the G1 and G2 nodes of load 2, which then generates doubled currents (2I$_{CELL}$, 2I$_{REF}$) through its double-width devices (shown in blue).
In the third step (SW2 = on), the sensing margin at the comparator nodes is increased by amplifying the current difference (i.e., D3 node = |2I$_{REF}$ - I$_{CELL}$|, D4 node = |2I$_{CELL}$ - I$_{REF}$|) between the original currents (I$_{CELL}$, I$_{REF}$) of load 1 and the doubled currents (2I$_{CELL}$, 2I$_{REF}$) of load 2. In summary, both DS-CSA and TS-CSA improve the sensing margin over the conventional CSA by using switches and additional steps [14,16]. However, as mentioned previously, these sense amplifiers require multiple steps to generate the voltages for comparison, which increases the sensing delay.
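Taking the third-step node expressions at face value gives a quick arithmetic check of the amplification. The sketch below treats the D3 and D4 quantities as signed currents (our assumption; the text quotes magnitudes) and shows that their difference is three times the plain |I$_{CELL}$ - I$_{REF}$|. This is arithmetic on the quoted expressions only, not a claim about the measured TS-CSA margin; currents are in nA and borrowed from Section IV for illustration.

```python
# Arithmetic on the TS-CSA step-3 node currents, treated as signed values
# (an assumption; the text gives magnitudes). Currents are in nA.

def ts_csa_node_currents(i_cell: int, i_ref: int) -> tuple[int, int]:
    d3 = 2 * i_ref - i_cell  # D3 node: 2*I_REF - I_CELL
    d4 = 2 * i_cell - i_ref  # D4 node: 2*I_CELL - I_REF
    return d3, d4


def differential_gain(i_cell: int, i_ref: int) -> float:
    """Ratio of the D4 - D3 difference to the plain I_CELL - I_REF difference."""
    d3, d4 = ts_csa_node_currents(i_cell, i_ref)
    return (d4 - d3) / (i_cell - i_ref)


print(ts_csa_node_currents(2500, 1500))  # (500, 3500)
print(differential_gain(2500, 1500))     # 3.0 (LRS-like cell current)
print(differential_gain(300, 1500))      # 3.0 (HRS-like cell current)
```

Algebraically, (2I$_{CELL}$ - I$_{REF}$) - (2I$_{REF}$ - I$_{CELL}$) = 3(I$_{CELL}$ - I$_{REF}$), which is why the ratio is independent of the stored state under the signed interpretation.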
Fig. 4. Schematics of TS-CSA [16].
4. Covalent-bonded CSA (CB-CSA)
The covalent-bonded CSA (CB-CSA), depicted in Fig. 5, is another sensing scheme for improving the sensing margin [17]. CB-CSA addresses the issues of DS-CSA and TS-CSA through structural optimization rather than multiple steps. In CB-CSA, two RRAM cells, one in HRS and one in LRS, serve as reference cells and are compared with the accessed RRAM cell through two latches; this arrangement is called a covalent structure. When a read operation starts, all current components (I$_{REF\_ HRS}$, I$_{CELL}$, I$_{REF\_ LRS}$) flow through the loads. Each latch compares the current of one reference cell (I$_{REF}$ = I$_{HRS}$ or I$_{LRS}$) with a portion of the accessed cell current (I$_{CELL}$). Because all currents (I$_{REF\_ HRS}$, I$_{REF\_ LRS}$, and I$_{CELL}$) flow at the same time, the latch with the larger input current difference dominates the comparison, and the operation of the other latch follows the result of the dominant latch. For example, the right latch dominates when reading HRS, whereas the left latch dominates when reading LRS. Because this CSA uses a single-cell reference current instead of a standard midpoint reference current (i.e., I$_{REF}$ = (I$_{HRS}$ + I$_{LRS}$) / 2), it achieves an improved sensing margin compared with the conventional CSA. The current sense amplifier can be shared with the current mirror in the reference column of an array [18]. However, CB-CSA requires more area because of the additional reference array, and the covalent structure makes it difficult to share a reference current among multiple current mirrors [17]. In CB-CSA, each sub-array requires two reference columns, one in LRS and one in HRS. When reading N bits, the total number of reference columns required for the sense amplifiers therefore becomes 2 × N.
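The reference-column overhead scales linearly with the number of bits read in parallel. The sketch below contrasts the 2 × N count of CB-CSA with a scheme that shares one reference generator among all sense amplifiers; counting the shared case as a single column is an illustrative assumption, and the function names are hypothetical.

```python
# Reference-column bookkeeping. CB-CSA: one LRS + one HRS reference column
# per bit read in parallel. Shared-reference case: one column in total
# (an illustrative assumption for a fully shared reference generator).

def cb_csa_reference_columns(n_bits: int) -> int:
    return 2 * n_bits


def shared_reference_columns(n_bits: int) -> int:
    return 1 if n_bits > 0 else 0


for n in (8, 32):
    print(n, cb_csa_reference_columns(n), shared_reference_columns(n))
# 8 16 1
# 32 64 1
```

For 32 sense amplifiers, CB-CSA would need 64 reference columns, which is the area cost the shared-reference approach avoids.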
Fig. 5. Schematics of CB-CSA [17].
III. PROPOSED CURRENT SENSE AMPLIFIER WITH DYNAMIC REFERENCE (DR-CSA)
This work proposes a current sense amplifier with dynamic reference (DR-CSA) that enhances the sensing margin by adjusting the reference current according to the cell state. Fig. 6(a) depicts the proposed sense amplifier, which comprises a modified conventional CSA and the proposed dynamic reference controller (DRC). The DRC is connected between the comparator input nodes and is activated by the enable signal (EN) during RRAM sensing. V$_{REF}$ and V$_{CELL}$ are precharged to V$_{DD}$/2 by the PMOS load and the equalizing PMOS transistors. Fig. 6(b) illustrates the schematic of the DRC, which comprises a capacitor, basic logic gates, and two PMOS switches. The DRC adjusts the reference current (I$_{REF}$) of the current-mirror load after the sensing capacitor (C1) detects the early-stage change at V$_{CELL}$.
The DR-CSA operates as follows. In standby mode (EN = '0'), shown in Fig. 6(b-top), V$_{CELL}$ and V$_{REF}$ are set to V$_{DD}$/2 by the PMOS-based equalizer enabled through EN. In addition, P1 shorts the inverter input and output, biasing the inverter at its switching threshold, where its gain is highest. Because P3 is turned off, the DRC is decoupled from V$_{REF}$ and V$_{CELL}$. In active mode (EN = '1'), shown in Fig. 6(b-bottom), the DRC block is activated by EN. Simultaneously, P1 is turned off and P3 is turned on to connect the DRC output to V$_{REF}$. When an RRAM cell is accessed, C1 detects the change at V$_{CELL}$, and an additional current path is formed through N1 or P2, depending on the amplified signal 'SIG'.
Fig. 7 shows an example of sensing LRS. As the sensing operation starts, an initial voltage difference develops between the equalized V$_{CELL}$ and V$_{REF}$. Because the cell current is larger than the reference current when reading LRS (i.e., I$_{LRS}$ > I$_{REF}$), V$_{CELL}$ is pulled below V$_{DD}$/2. This slight voltage drop at V$_{CELL}$ is detected by C1 in the DRC block and, through capacitive coupling, produces a small voltage drop at the inverter input. The inverter and the NAND2 gate amplify this small change, setting 'SIG' to '0'. SIG = 0 turns on P2, which supplies an additional pull-up current (I$_{Charge}$). Because P2 now supplies part of I$_{REF}$, the reference current splits as I$_{REF}$ = I$_{Load}$ + I$_{Charge}$, and I$_{Load}$ decreases. The decreased I$_{Load}$ is copied to I$_{Mirror}$, lowering V$_{CELL}$ further. Through this feedback operation of the DRC, the voltage difference between V$_{REF}$ and V$_{CELL}$ is further enhanced.
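The DRC decision path can be summarized as a small behavioral model. It assumes (our simplification) that the inverter, self-biased by P1 during standby, acts as an ideal comparator around its trip point, and that the NAND2 inputs are EN and the inverter output; the function names are hypothetical. A negative coupled step (LRS read) yields SIG = 0 and enables the P2 pull-up, while a positive step (HRS read) yields SIG = 1 and enables the N1 pull-down.

```python
# Behavioral sketch of the DRC decision path (not a transistor-level model).
# Assumptions: ideal inverter around its self-biased trip point, NAND2 inputs
# are EN and the inverter output, P3 gates the DRC connection via EN.

def drc_sig(delta_v_in: float, en: bool) -> int:
    """delta_v_in: change coupled through C1 onto the inverter input (V),
    measured relative to the trip point stored during standby."""
    inv_out = 1 if delta_v_in < 0 else 0  # input pulled down -> output high
    return 0 if (en and inv_out) else 1   # NAND2(EN, inv_out)


def assist_path(sig: int, en: bool) -> str:
    """P2 (PMOS) conducts when SIG = 0; N1 (NMOS) conducts when SIG = 1."""
    if not en:
        return "decoupled"  # P3 off in standby
    return "P2 pull-up (I_Charge)" if sig == 0 else "N1 pull-down (I_Discharge)"


# LRS read: V_CELL dips below VDD/2, so the coupled step is negative.
print(assist_path(drc_sig(-5e-3, en=True), en=True))    # P2 pull-up (I_Charge)
# HRS read: V_CELL rises slightly above VDD/2.
print(assist_path(drc_sig(+5e-3, en=True), en=True))    # N1 pull-down (I_Discharge)
# Standby: the DRC is isolated from the comparator nodes.
print(assist_path(drc_sig(-5e-3, en=False), en=False))  # decoupled
```

The 5 mV coupled step is an arbitrary illustrative value; only its sign matters in this idealized model.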
Fig. 8 shows the sensing operation when accessing an RRAM cell in HRS. In contrast to LRS, when reading HRS (i.e., I$_{CELL}$ < I$_{REF}$), V$_{CELL}$ rises slightly above V$_{DD}$/2. This small voltage rise at V$_{CELL}$ is coupled into the DRC through C1, forming an additional pull-down current path through P3 and N1. As a result, I$_{Load}$ increases by I$_{Discharge}$. The increased I$_{Load}$ is then copied by the current mirror and raises V$_{CELL}$ further, improving the sensing margin.
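The effect of the DRC on the mirrored load can be illustrated with a first-order current model. It assumes an ideal 1:1 mirror, a fixed hypothetical assist current of 0.5 µA for both I$_{Charge}$ and I$_{Discharge}$, and uses the sign of I$_{CELL}$ - I$_{REF}$ as a stand-in for the C1/inverter decision. The quantity printed is the net current at the cell node (I$_{CELL}$ - I$_{Mirror}$), which drives V$_{CELL}$ away from V$_{DD}$/2.

```python
# First-order model of how the DRC reshapes the mirrored load current.
# Assumptions: ideal 1:1 mirror; hypothetical fixed assist current; the sign
# of I_CELL - I_REF stands in for the capacitive early-stage detection.

I_REF = 1.5e-6     # reference current from Section IV (A)
I_ASSIST = 0.5e-6  # hypothetical I_Charge / I_Discharge magnitude (A)


def mirrored_load(i_cell: float, use_drc: bool) -> float:
    if not use_drc:
        return I_REF              # conventional CSA: I_Load = I_REF
    if i_cell > I_REF:            # LRS: P2 supplies I_Charge, I_Load = I_REF - I_Charge
        return I_REF - I_ASSIST
    return I_REF + I_ASSIST       # HRS: N1 adds I_Discharge, I_Load = I_REF + I_Discharge


def cell_node_imbalance(i_cell: float, use_drc: bool) -> float:
    """Net cell-node current; positive pulls V_CELL down, negative pushes it up."""
    return i_cell - mirrored_load(i_cell, use_drc)


for state, i_cell in (("LRS", 2.5e-6), ("HRS", 0.3e-6)):
    conv = cell_node_imbalance(i_cell, use_drc=False)
    drc = cell_node_imbalance(i_cell, use_drc=True)
    print(f"{state}: conventional {conv * 1e6:+.1f} uA, with DRC {drc * 1e6:+.1f} uA")
# LRS: conventional +1.0 uA, with DRC +1.5 uA
# HRS: conventional -1.2 uA, with DRC -1.7 uA
```

In both states the DRC enlarges the magnitude of the imbalance current, which is the mechanism behind the faster and wider V$_{CELL}$ swing; the real assist current depends on the device sizing and is not a fixed constant.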
Fig. 6. (a) Proposed current sense amplifier; (b) schematic of dynamic reference controller (DRC).
Fig. 7. Operation of DRC for sensing margin improvement in sensing LRS.
Fig. 8. Operation of DRC for sensing margin improvement in sensing HRS.
IV. SIMULATION RESULTS AND COMPARISON
In this work, a 64-kb RRAM with the proposed sense amplifier is designed in 40-nm CMOS technology. The supply voltage is 1.1 V, and the RRAM device is modeled in Verilog-A based on the HfO$_{x}$ RRAM stack in [19,20]. Fig. 9 illustrates the simplified architecture of the 64-kb RRAM used to validate the proposed DR-CSA. To minimize the area overhead, the reference current generator is shared among the sense amplifiers, as shown on the right side of Fig. 9. The array consists of 256 × 256 RRAM cells organized into 32 sub-arrays, served by 32 sense amplifiers. Fig. 10 shows the DC I-V curve of the RRAM model and the Monte Carlo simulation result with 1000 samples. The average resistances of HRS (blue) and LRS (red) are 950 kΩ and 9 kΩ, respectively. I$_{REF}$, I$_{HRS}$, and I$_{LRS}$ are 1.5 µA, 300 nA, and 2.5 µA, respectively.
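The design parameters above can be cross-checked with a few lines of arithmetic. The sketch uses only values stated in this section: it confirms the 64-kb capacity, the 2k-bit (256 × 8) sub-array per sense amplifier that also appears in Table 1, the nominal R-ratio, and the nominal current margins of a reference-based CSA. The 1.0 µA and 1.2 µA margins coincide with the upper-end values quoted for Fig. 14(a), and I$_{REF}$ sits slightly above the arithmetic midpoint of I$_{HRS}$ and I$_{LRS}$.

```python
# Sanity check of the quoted design parameters (all values from this section).
ROWS, COLS, N_SA = 256, 256, 32
R_HRS, R_LRS = 950e3, 9e3                  # average resistances (ohm)
I_REF, I_HRS, I_LRS = 1.5e-6, 300e-9, 2.5e-6

capacity_bits = ROWS * COLS                # 65,536 bits = 64 kb
bits_per_subarray = capacity_bits // N_SA  # 2,048 bits = 2k per sense amplifier
cols_per_subarray = COLS // N_SA           # 8 columns -> 256 x 8 sub-array
r_ratio = R_HRS / R_LRS                    # nominal R-ratio

lrs_margin = I_LRS - I_REF                 # current margin when reading LRS
hrs_margin = I_REF - I_HRS                 # current margin when reading HRS
midpoint = (I_HRS + I_LRS) / 2             # ideal midpoint reference

print(capacity_bits, bits_per_subarray, cols_per_subarray, round(r_ratio, 1))
print(f"{lrs_margin * 1e6:.1f} uA, {hrs_margin * 1e6:.1f} uA, midpoint {midpoint * 1e6:.1f} uA")
# 65536 2048 8 105.6
# 1.0 uA, 1.2 uA, midpoint 1.4 uA
```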
Since the coupling capacitor in the DRC is implemented with a MOSFET, the MOSFET size was chosen carefully; the key considerations are minimizing the area of the MOS capacitor and the capacitance variation caused by mismatch. In our design, the area of the MOS capacitor is 0.32 µm$^{2}$. Fig. 11 presents the MOS capacitance for different sizes and the impact of process variations on the capacitance. The simulated mean and standard deviation of the capacitance are 5.6 fF and 104 aF, respectively, and the normalized variance of the capacitance (ΔC/C) is 1.83%. Because the proposed DR-CSA adds a DRC block that includes a capacitor, it has an area overhead of approximately 31% compared with the conventional CSA. However, DR-CSA achieves better sensing performance.
Fig. 12 compares the sensing operation of the proposed CSA and the conventional CSA. Enabling the DRC increases the voltage difference between V$_{CELL}$ and V$_{REF}$, enhancing the sensing margin and speed. As shown in Fig. 12(a), when reading LRS, I$_{Load}$ decreases, which raises V$_{REF}$ and lowers V$_{CELL}$, improving the sensing margin. Conversely, when reading HRS (Fig. 12(b)), I$_{Load}$ increases, which lowers V$_{REF}$ and raises V$_{CELL}$. However, in the HRS case, the diode-connected load remains in saturation, so V$_{REF}$ can drop only by the increase in its overdrive voltage V$_{OV\_ IREF}$ [14,16]. The limited V$_{OV\_ IREF}$ also restricts the upward swing of V$_{CELL}$, leading to an imbalance between the LRS and HRS margins. Nevertheless, the DRC increases the sensing margin in both states, as shown in Fig. 13. Notably, sensing LRS is more challenging than sensing HRS in the conventional CSA because of its smaller margin and larger delay. Therefore, by addressing the LRS sensing issue, the proposed sensing technique significantly enhances overall sensing performance, especially for RRAM devices with larger variations. The robustness of the proposed sensing scheme under RRAM variations is investigated next.
Fig. 9. Architecture of 64kb RRAM.
Fig. 10. Simulated: (a) I-V characteristics; (b) distribution result of the RRAM model in LRS and HRS.
Fig. 11. (a) MOS capacitance for different transistor sizes; (b) capacitance distribution when the MOS size is 0.32 µm².
Fig. 12. Simulation results comparing the sensing margins of the proposed CSA with the conventional CSA.
Fig. 13. Comparison of the proposed CSA with the conventional CSA: sensing margins in LRS and HRS.
Fig. 14 shows the simulated current and voltage of the conventional CSA over R-ratios from 20 to 100. R-ratio degradation is emulated by adjusting the RRAM model parameters. Fig. 14(a) shows that the current margin decreases from 1 µA to 250 nA for LRS and from 1.2 µA to 660 nA for HRS as the R-ratio degrades from 100 to 20. The corresponding voltage margins are shown in Fig. 14(b). The current and voltage margins for sensing LRS are more vulnerable to R-ratio degradation than those for sensing HRS; notably, the voltage margin for sensing LRS degrades by 99%, as shown in Fig. 14(b). The degraded sensing margin also affects sensing speed: as the R-ratio decreases, the delay in sensing LRS increases significantly, whereas the delay in sensing HRS increases far less. Fig. 15 presents the simulated margins of the proposed DR-CSA and the conventional CSA at various R-ratios. DR-CSA enhances the margin in both LRS and HRS.
As shown in Fig. 15(a), DR-CSA shows no margin degradation (≈V$_{DD}$) for LRS, with a full swing in V$_{REF}$ (≈V$_{DD}$) and V$_{CELL}$ (≈V$_{SS}$), because the reduced I$_{Load}$ remains consistent as the R-ratio decreases. As a result, it improves the margin by 4× to 16×. When reading HRS, DR-CSA increases the margin by 20% to 25% compared with the conventional CSA, as shown in Fig. 15(b).
Fig. 14. Simulated results of the conventional CSA: (a) current; (b) voltage.
Fig. 15. Comparison of the proposed CSA with the conventional CSA at different R-ratios: sensing margin: (a) LRS; (b) HRS.
Fig. 16. Comparison of the proposed CSA with the conventional CSA at different R-ratios: (a) comparator delay; (b) sensing time.
Fig. 16 shows the simulated comparator delay and the overall sensing time. The comparator delay is significantly reduced owing to the increased sensing margin. Across the simulated R-ratios, the comparator delay is around 120 ps, a reduction of over 90% compared with the conventional CSA, as shown in Fig. 16(a). The proposed DR-CSA thus maintains a consistent sensing speed across various R-ratio values.
Fig. 16(b) compares the sensing times, showing that the proposed CSA reduces the sensing time by ~53% to ~90% as the R-ratio degrades from 100 to 20. In contrast, the sensing time of the conventional CSA increases significantly as the R-ratio decreases from 100 to 20, because its comparator must resolve a narrow sensing margin of only tens of millivolts, as shown in Fig. 14(b). The proposed CSA maintains a consistent average sensing time of 0.84 ns to 1.07 ns, whereas the sensing time of the conventional CSA increases from 1.95 ns to 10 ns. Furthermore, the faster sensing of the proposed CSA also improves energy efficiency. Fig. 17(a) shows the array energy consumption at 1.1 V: the proposed DRC scheme reduces the total array energy and the SA energy by up to 32% and 63%, respectively. Fig. 17(b) presents the energy breakdown of the RRAM, which shows that, in addition to the total energy reduction, the share of the sense amplifier in the total energy decreases from 8.4% to 4.3%.
Fig. 18 presents the Monte Carlo simulation results with 1000 samples, used to assess the robustness of the proposed CSA scheme. Fig. 18(a) and (b) show the threshold voltage (V$_{TH}$) statistics included in the simulation. The mean (${\mu}$) of V$_{TH}$ is 0.7 V for NMOS and -0.67 V for PMOS.
The standard deviation (${\sigma}$) is approximately 10 mV to 11 mV. Fig. 18(c) and (d) present the sensing margins for LRS and HRS. In LRS, the sensing margin has a mean (${\mu}$) of 1.05 V and a standard deviation (${\sigma}$) of 5 mV. For HRS, the sensing margin has ${\mu}$ = 0.55 V and ${\sigma}$ = 20 mV with 3${\sigma}$ variations applied. Fig. 18(e) and (f) display the comparator delay and the sensing time; the access time has ${\mu}$ = 0.86 ns and ${\sigma}$ = 7 ps.
To verify robustness against process variation, the proposed CSA is simulated with the fast (F) and slow (S) MOSFET corner models at two temperatures, 27 °C and 100 °C, as shown in Figs. 19 and 20. Fig. 19 presents the simulation waveforms at the process corners at 27 °C. The comparator margin is enhanced at all corners, although the SS corner may show a slight additional delay in the worst case because the DRC must generate the pull-down current. Nevertheless, as shown in Fig. 20, the DRC enhances the margin at all corners compared with the conventional CSA at the TT corner, at both 27 °C and 100 °C. These results demonstrate that the DRC provides robust sensing performance across process and temperature variations.
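A simple way to read the Monte Carlo margin statistics is through their lower 3σ bounds. The sketch below assumes the margins are approximately Gaussian, so that μ - 3σ is a reasonable worst-case estimate; the (μ, σ) pairs are the Fig. 18 values quoted above.

```python
# Lower 3-sigma bound of the Monte Carlo sensing margins (Fig. 18 statistics).
# Assumption: margins are approximately Gaussian.

stats = {"LRS": (1.05, 5e-3), "HRS": (0.55, 20e-3)}  # (mu, sigma) in volts


def lower_bound(mu: float, sigma: float, k: float = 3.0) -> float:
    return mu - k * sigma


for state, (mu, sigma) in stats.items():
    print(f"{state}: mu - 3 sigma = {lower_bound(mu, sigma) * 1e3:.0f} mV")
# LRS: mu - 3 sigma = 1035 mV
# HRS: mu - 3 sigma = 490 mV
```

Even the HRS bound stays near half a volt, which is consistent with the margin imbalance (LRS larger than HRS) noted for DR-CSA.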
Fig. 17. Comparison of the proposed CSA with the conventional CSA: (a) energy; (b) energy breakdown at VDD = 1.1 V.
Fig. 18. Monte Carlo simulation results: (a) Vth of NMOS; (b) Vth of PMOS; (c) margins in LRS; (d) margins in HRS; (e) comparator delay; (f) sensing time.
Fig. 19. Simulation waveforms of the proposed DR-CSA at process corners.
Fig. 20. Sensing margins of the proposed DR-CSA at process corners.
Figs. 21 and 22 compare the CSAs in the same 40-nm process with a standardized current margin at VDD = 1.1 V and 0.9 V. In Fig. 21(a), the DR-CSA achieves a sensing time of 0.84 ns, which is faster than the TS-CSA and DS-CSA by 0.8 ns and 0.96 ns, respectively. As explained in Section II, the CB-CSA performs the comparison directly without an additional latch comparator, resulting in a slightly faster sensing time than the DR-CSA (-0.16 ns). At a lower VDD (0.9 V), the DR-CSA is faster than the TS-CSA and DS-CSA by 5 ns and 8.1 ns, respectively, as shown in Fig. 21(b). Fig. 22 presents the sub-array read energy of the compared CSAs. As shown in Fig. 22(a), DR-CSA has the lowest read energy (235.16 fJ) among the compared CSAs, a reduction of 17%, 9.8%, and 3.8% compared with TS-CSA (282.85 fJ), DS-CSA (260.722 fJ), and CB-CSA (244.40 fJ), respectively. Moreover, at low VDD, DR-CSA reduces the sensing energy by more than 10%, resulting in read energy reductions of 21%, 13%, and 2.6% compared with TS-CSA, DS-CSA, and CB-CSA, respectively, as depicted in Fig. 22(b).
Fig. 23 shows the sensing-time distributions of the conventional CSA, TS-CSA, CB-CSA, DS-CSA, and the proposed DR-CSA in panels (a)-(e). To assess robustness, the normalized variance of each distribution is obtained by dividing its standard deviation by its mean (i.e., ${\sigma}$/${\mu}$) from a Monte Carlo simulation with 1000 samples. As shown in Fig. 24, the proposed CSA exhibits a normalized variance of less than 1%, indicating that it improves the robustness of sensing against variations.
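The normalized variance used for Fig. 24 is the coefficient of variation, ${\sigma}$/${\mu}$. The sketch below computes it for a hypothetical 1000-sample Gaussian population drawn with the Fig. 18 sensing-time statistics (${\mu}$ = 0.86 ns, ${\sigma}$ = 7 ps); these synthetic samples stand in for the paper's Monte Carlo data, and the Gaussian shape is an assumption.

```python
import random
import statistics


def normalized_variance(samples: list[float]) -> float:
    """sigma / mu (coefficient of variation) of a sample population."""
    return statistics.stdev(samples) / statistics.fmean(samples)


# Hypothetical Monte Carlo population with the Fig. 18 sensing-time statistics.
random.seed(0)
samples = [random.gauss(0.86e-9, 7e-12) for _ in range(1000)]

print(f"{normalized_variance(samples):.2%}")  # close to 7 ps / 0.86 ns, i.e. about 0.8%
```

With these statistics, ${\sigma}$/${\mu}$ is roughly 0.81%, which is in line with the sub-1% value reported for DR-CSA.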
Fig. 21. Sensing time at: (a) VDD = 1.1 V; (b) VDD = 0.9 V.
Fig. 22. Read energy per bit at (a) VDD = 1.1 V; (b) VDD = 0.9 V.
Fig. 23. Sensing-time distributions: (a) conventional CSA; (b) TS-CSA; (c) CB-CSA; (d) DS-CSA; (e) DR-CSA.
Fig. 24. Comparison of the normalized variance of sensing time.
Table 1 summarizes a comprehensive performance comparison between the proposed CSA and other CSAs, including simulation results that evaluate their competitive merits and key challenges. Table 1 is organized into two sections. The first section presents performance data reported in the literature, while the second presents simulation results obtained under standardized conditions, using a 2k sub-array and the same current margin in 40-nm CMOS technology. In this table, the voltage margin (V$_{MARGIN}$) differs slightly between HRS and LRS across the CSAs because of their different current loads. Although the proposed DR-CSA exhibits a margin imbalance in which the LRS margin exceeds the HRS margin, it improves the sensing time by 2.3×, 2.1×, and 1.9× compared with the conventional CSA [13], DS-CSA [14], and TS-CSA [16], respectively. Furthermore, the proposed DR-CSA reduces the read energy by 17%, 9.8%, and 3.8% compared with TS-CSA, DS-CSA, and CB-CSA [17], respectively. Table 2 compares the specifications of the proposed DR-CSA with those of other state-of-the-art SAs. Compared with other sensing methodologies, the proposed scheme uses simple operation steps to enhance the sensing margin and robustness.
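The relative figures quoted above can be recomputed from the simulated rows of Table 1. The conventional-CSA sensing time of 1.95 ns is taken from Section IV (the R-ratio = 100 end of the sweep); that this is the value behind the 2.3× figure is our assumption.

```python
# Recompute the quoted energy reductions and sensing-time speed-ups from the
# simulated Table 1 values (same array, I_read, and 40-nm CMOS).
sub_array_energy_fj = {"TS-CSA": 282.85, "DS-CSA": 260.722, "CB-CSA": 244.40}
sensing_time_ns = {"Conventional": 1.95, "DS-CSA": 1.8, "TS-CSA": 1.64}
DR_ENERGY_FJ, DR_TIME_NS = 235.16, 0.84

for name, energy in sub_array_energy_fj.items():
    print(f"energy reduction vs {name}: {1 - DR_ENERGY_FJ / energy:.1%}")
for name, time_ns in sensing_time_ns.items():
    print(f"speed-up vs {name}: {time_ns / DR_TIME_NS:.2f}x")
```

This yields energy reductions of about 16.9%, 9.8%, and 3.8% and speed-ups of about 2.32×, 2.14×, and 1.95×, consistent with the rounded values in the text.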
Table 1. Performance Comparison
| | JSSC'20 [16] (TS-CSA) | TCAS-II'15 [14] (DS-CSA) | ISSCC'15 [17] (CB-CSA) | This work (DR-CSA) |
|---|---|---|---|---|
| Process | 55 nm | 45 nm | Sub-20 nm | 40 nm |
| Supply voltage | 1 V | 1 V | 1.5 V | 1.1 V |
| I$_{read}$ | − | 15 µA | 2 µA | 1 µA |
| Sensing time | 3.16 ns (measured) | 3.4 ns (simulated) | 9.1 ns (measured) | 0.84 ns (simulated) |
| Array (sub-array) | 1 Mb, 128k (256 × 512) | 32 Mb (unknown) | 8 Mb, 0.5M (512 × 1028) | 64 kb, 2k (256 × 8) |
| *: Simulation performance with the same array, I$_{read}$, and 40-nm CMOS | | | | |
| V$_{MARGIN}$* | 500 mV (LRS), 483 mV (HRS) | 600 mV (LRS), 700 mV (HRS) | 590 mV (LRS), 550 mV (HRS) | 1050 mV (LRS), 540 mV (HRS) |
| Sensing time* | 1.64 ns | 1.8 ns | 0.64 ns | 0.84 ns |
| Read energy per bit* | 44.922 fJ (SA), 282.85 fJ (sub-array) | 22.296 fJ (SA), 260.722 fJ (sub-array) | 10.77 fJ (SA), 244.40 fJ (sub-array) | 10.17 fJ (SA), 235.16 fJ (sub-array) |
| Merit | Reference share / margin balance | Reference share / margin balance | Sensing speed / margin balance | Reference share / sensing speed |
| Drawback | Sensing speed | Sensing speed | Reference share | Margin imbalance |
*Sensing time: measured from WL turn-on until the digital output is generated.
*V$_{MARGIN}$: |V$_{CELL}$ - V$_{REF}$|.
*SA: sense-amplifier energy only; sub-array: energy including the sub-array.
Set currents: LRS 2.5 µA, REF 1.5 µA, HRS 300 nA.
Table 2. Comparison with State-of-the-Art Works
| | ISSCC'21 [21] | ISSCC'22 [22] | VLSI'22 [23] | ISSCC'23 [24] | This work |
|---|---|---|---|---|---|
| Sensing scheme | Multi-cell reference SA | Charge-recycling VSA | Boosted cross-coupled CSA | Differential charge-accumulation VSA | Dynamic-reference CSA |
| Concept | Avoiding reference collapse | Charge recycling | Boosting internal node | Using two reference cells | Adjusting reference current |
| Process | 14 nm | 18 nm | 22 nm | 22 nm | 40 nm |
| Supply voltage | 0.85 V | 0.8 V | 0.8 V | 0.8 V | 1.1 V |
| Advantages | Reference distribution | Energy efficiency from charge recycling | Sensing margin | Sensing margin & speed | Sensing margin & speed |
| Disadvantages | Multiple reference cells | Operation complexity & multiple operation steps | Margin improvement for only one state (LRS) | Two reference cells & multiple operation steps | Margin imbalance |
|
V. CONCLUSIONS
This paper proposes a novel current sense amplifier with dynamic reference (DR-CSA) for RRAM. The proposed DR-CSA enhances the sensing margin by adjusting the mirrored load current according to the data-dependent early-stage voltage change on the accessed column. It improves the reliability of RRAM sensing across a wide range of RRAM resistance variations, and comprehensive simulation results demonstrate its robustness compared with other state-of-the-art CSAs. Additionally, DR-CSA achieves faster sensing and higher energy efficiency than DS-CSA and TS-CSA. In this work, a reference-sharing structure was implemented in which one reference column is shared by multiple sense amplifiers without an additional reference array. Therefore, the proposed DR-CSA can be applied to various resistive memories that require reliable, fast, and robust sensing.
ACKNOWLEDGMENTS
This research is supported by the Ministry of Education, Singapore, under AcRF
Tier-2 (MOE-T2EP50221-0001). This work is partially supported by Programmatic grant
no. A18A6b0057 (SpOT-Lite), Singapore RIE 2020, AME domain.
References
[1] Khwa, W.-S., et al., "Emerging NVM circuit techniques and implementations for energy-efficient systems," in Beyond-CMOS Technologies for Next Generation Computer Design. Springer, 2019, pp. 85-132.
[2] Meena, J. S., et al., "Overview of emerging non-volatile memory technologies," Nanoscale Research Letters, Sep. 2014, pp. 1-33.
[3] Niu, D., et al., "Low power multi-level-cell resistive memory design with incomplete data mapping," in IEEE International Conference on Computer Design (ICCD), Nov. 2013, pp. 131-137.
[4] Chang, M.-F., et al., "A low-voltage bulk-drain-driven read scheme for sub-0.5 V 4 Mb 65 nm logic-process compatible embedded resistive RAM (ReRAM) macro," IEEE J. Solid-State Circuits, vol. 48, pp. 2250-2259, Sep. 2013.
[5] Chang, M.-F., et al., "Low VDDmin swing-sample-and-couple sense amplifier and energy-efficient self-boost-write-termination scheme for embedded ReRAM macros against resistance and switch-time variations," IEEE J. Solid-State Circuits, vol. 50, pp. 2786-2795, Sep. 2015.
[6] Lee, A., et al., "A ReRAM-based non-volatile flip-flop with self-write-termination scheme for frequent-off fast-wake-up non-volatile processors," IEEE J. Solid-State Circuits, vol. 52, pp. 2194-2207, Aug. 2017.
[7] Murali, G., Sun, X., Yu, S., and Lim, S. K., "Heterogeneous mixed-signal monolithic 3-D in-memory computing using resistive RAM," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 29, no. 2, pp. 386-396, Feb. 2021.
[8] Wei, S.-T., et al., "Trends and challenges in the circuit and macro of RRAM-based computing-in-memory systems," Chip, vol. 1, no. 1, pp. 1-11, Mar. 2022.
[9] Lu, Y., Le, V. L., and Kim, T. T.-H., "A 184-μW error-tolerant real-time hand gesture recognition system with hybrid tiny classifiers utilizing edge CNN," IEEE J. Solid-State Circuits, vol. 58, no. 2, pp. 530-542, Feb. 2023.
[10] Lu, Y., Li, Z., Chen, Y., and Kim, T. T.-H., "A 181-µW real-time 3-D hand gesture recognition system based on bi-directional convolution and computing-efficient feature clustering," in IEEE Custom Integrated Circuits Conference (CICC), Apr. 2022, pp. 1-2.
[11] Lee, H., et al., "Evidence and solution of over-RESET problem for HfO$_{x}$ based resistive memory with sub-ns switching speed and high endurance," in IEEE International Electron Devices Meeting (IEDM), Dec. 2010, pp. 17-19.
[12] Gao, B., et al., "Oxide-based RRAM switching mechanism: A new ion-transport-recombination model," in IEEE International Electron Devices Meeting (IEDM), Dec. 2008, pp. 1-4.
[13] Kim, J., et al., "A novel sensing circuit for deep submicron spin transfer torque MRAM (STT-MRAM)," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 1, pp. 181-186, Jan. 2012.
[14] Na, T., et al., "A double-sensing-margin offset-canceling dual-stage sensing circuit for resistive non-volatile memory," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 62, no. 12, pp. 1109-1113, Dec. 2015.
[15] Chang, M.-F., et al., "An offset-tolerant current-sampling-based sense amplifier for sub-100 nA-cell-current non-volatile memory," IEEE J. Solid-State Circuits, pp. 206-208, Feb. 2011.
[16] Xue, C.-X., et al., "Embedded 1-Mb ReRAM-based computing-in-memory macro with multibit input and weight for CNN-based AI edge processors," IEEE J. Solid-State Circuits, vol. 55, no. 1, pp. 203-215, Jan. 2019.
[17] Kim, C., et al., "7.4 A covalent-bonded cross-coupled current-mode sense amplifier for STT-MRAM with 1T1MTJ common source-line structure array," in IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, Feb. 2015, pp. 1-3.
[18] Chang, M.-F., et al., "A high-speed 7.2-ns read-write random access 4-Mb embedded resistive RAM (ReRAM) macro using process-variation-tolerant current-mode read schemes," IEEE J. Solid-State Circuits, vol. 48, no. 3, pp. 878-891, Mar. 2013.
[19] Lu, L., et al., "ReRAM device and circuit co-design challenges in nano-scale CMOS technology," in IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Dec. 2020, pp. 213-216.
[20] Chen, Y., Lu, L., Kim, B., and Kim, T. T.-H., "Reconfigurable 2T2R ReRAM architecture for versatile data storage and computing in memory," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 28, no. 12, pp. 2636-2649, Dec. 2020.
[21] Yang, J. A., et al., "A 14nm-FinFET 1Mb embedded 1T1R RRAM with a 0.022µm² cell size using self-adaptive delayed termination and multi-cell reference," in IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2021, pp. 336-338.
[22] Chiu, Y.-C., et al., "A 22nm 4Mb STT-MRAM data-encrypted near-memory computation macro with a 192GB/s read-and-decryption bandwidth and 25.1-55.1TOPS/W 8b MAC for AI operations," in IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2022, pp. 178-180.
[23] Shimoi, T., et al., "A 22nm 32Mb embedded STT-MRAM macro achieving 5.9ns random read access and 5.8MB/s write throughput at up to Tj of 150 °C," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Jun. 2022, pp. 134-135.
[24] Chiu, Y.-C., et al., "A 22nm 8Mb STT-MRAM near-memory-computing macro with 8b-precision and 46.4-160.1TOPS/W for edge-AI devices," in IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2023, pp. 496-498.
Byung-Kwon An received the B.S. degree from the Department of Electronics Engineering,
Kwangwoon University, Seoul, South Korea, in 2017, and the M.S. degree from Hanyang
University, Seoul, in 2020, where he worked on circuits for phase-change memory (PCM)
and ovonic threshold switch (OTS) devices. He is currently pursuing the Ph.D. degree
at Nanyang Technological University, Singapore, where he designs circuits for RRAM.
His research interests include emerging memory circuit design and in-memory computing.
Xueyong Zhang received the B.S. degree in applied physics from Nantong University,
China, in 2011, the M.E. degree in microelectronics and solid-state electronics from
Southeast University, China, in 2014, and the Ph.D. degree from Nanyang Technological
University (NTU), Singapore, in 2022. From 2014 to 2016, he was a Design Engineer
at Silergy Corporation, where he was involved in power management IC design. From
2016 to 2022, he was a Research Associate and a Research Fellow at NTU. He is currently
a Staff Engineer with Huawei International Pte Ltd., Singapore. His current research
interests include low power analog IC, neuromorphic VLSI, in-memory computing design,
and data converters.
Anh Tuan Do (Member, IEEE) received the B.S. and Ph.D. degrees from Nanyang Technological
University (NTU), Singapore, in 2007 and 2010, respectively. From 2010 to 2015, he
was a Research Fellow with VIRTUS, IC Design Centre of Excellence, NTU. In 2015, he
joined the Digital IC Design Group, Institute of Microelectronics (IME), A*STAR, Singapore.
His research interests include AI hardware, neuromorphic computing, low-power,
low-leakage, variation-tolerant digital circuits, emerging memory, SoC for edge computing,
and biomedical circuits and systems. He was a recipient of the Best Paper Award at
SOCC 2012 and the Second Prize and Best Presentation Award in the innovation contest
at the International Ph.D. Student Workshop 2007, National University. He has served
as a reviewer for several IEEE journals and conferences, including the IEEE TRANSACTIONS
ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS
II: EXPRESS BRIEFS, and the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS.
Tony Tae-Hyoung Kim (Senior Member, IEEE) received the B.S. and M.S. degrees in
electrical engineering from Korea University, Seoul, South Korea, in 1999 and 2001,
respectively, and the Ph.D. degree in electrical and computer engineering from the
University of Minnesota, Minneapolis, MN, USA, in 2009. From 2001 to 2005, he was
at Samsung Electronics where he performed research on the design of high-speed SRAM
memories, clock generators, and IO interface circuits. From 2007 to 2009 summer, he
was at the IBM T. J. Watson Research Center and Broadcom Corporation, where he performed
research on isolated NBTI/PBTI measurement circuits and SRAM mismatch measurement
test structure, and battery-backed memory design, respectively. In November 2009,
he joined Nanyang Technological University, where he is currently an Associate Professor.
He has authored or coauthored around 190 journal and conference papers and holds 17
U.S. and Korean patents. His current research interests include in-memory computing
for edge computing, emerging memory circuit design, energy-efficient circuits and
systems for the IoT and wearable devices, variation and aging tolerant circuits and
systems, and circuit techniques for 3-D ICs. He was a Technical Committee Member of
various conferences, such as the IEEE Asian Solid-State Circuits Conference (A-SSCC),
the IEEE International Symposium on Circuits and Systems (ISCAS), and the IEEE/ACM International
Symposium on Low Power Electronics and Design (ISLPED). He is an IEEE SSCS Distinguished
Lecturer. He was the Chair of the IEEE SSCS Singapore Chapter from 2015 to 2016 and is
the Chair of the IEEE CASS VSA-TC. He received the 1999 Samsung Humantec Thesis Award
(Silver Prize), the 2001 Samsung Humantec Thesis Award (Honor Prize), the 2005 ETRI
Journal Paper of the Year Award, the 2008 Samsung Humantec Thesis Award (Bronze Prize),
the 2008 IEEE DAC/ISSCC Student Design Contest Award, the 2008 AMD/CICC Student Scholarship
Award, the 2008 Departmental Research Fellowship from the University of Minnesota, the
Best Paper Award from 2011 and 2014 ISOCC, the Best Demo Award from 2016 IEEE APCCAS,
the International Low Power Design Contest Award from 2016 IEEE/ACM ISLPED, and the
Best Paper Award from 2021 ICCE-Asia. He serves as an Associate Editor of IEEE TRANSACTIONS
ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, IEEE ACCESS, Frontiers in Electronics,
and IEIE Journal of Semiconductor Technology and Science (JSTS).