
  1. (School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798)
  2. (IC-Design Department, Institute of Microelectronics (IME), A*STAR, Singapore 138634)



Resistive random-access memory (RRAM), dynamic reference current sense amplifier (DR-CSA), high resistance state (HRS), low resistance state (LRS), R-ratio (R$_{HRS}$/R$_{LRS}$), sensing margin

I. INTRODUCTION

The increasing volume of data generated and required by modern applications, including data centers, IoT devices, mobile electronics, and low-power image processors, has significantly heightened the demand for memory solutions that are high-density, cost-effective, and energy-efficient [1, 7-10]. Traditional hard disks were once widespread but proved inadequate for these requirements, which led to the broad adoption of flash memory. Nonetheless, flash memory's limited scaling and endurance, stemming from the inherent constraints of its tunneling-based write mechanism, present a substantial challenge [2,3]. In response, resistive random-access memory (RRAM) has emerged as a compelling alternative and a front-runner among non-volatile memories. With its non-volatility, large resistance ratio (R$_{HRS}$/R$_{LRS}$), high endurance, low read/write latency, compatibility with CMOS processes, and low supply voltage requirements, RRAM is well suited to the increasingly demanding memory needs of modern applications [1-6]. An RRAM device has two terminals and a three-layer metal-insulator-metal (MIM) structure. The insulator layer is responsible for the reversible, electric-field-induced resistance switching caused by conductive filaments, as depicted in Fig. 1(a) [2,11,12]. The formation of filaments results in a low resistance state (LRS), while the rupture of the filaments results in a high resistance state (HRS). Typically, an RRAM cell consists of one transistor and one RRAM device (1T1R) [2]. The transistor functions as a selector to control access to the bit cell.

Despite RRAM's numerous advantages, significant challenges remain, such as resistance variations during fabrication and resistance drift during operation. As illustrated in Fig. 1(b), resistance variations and drift can degrade the R-ratio (R$_{HRS}$/R$_{LRS}$). Additionally, offset in the sensing path affects the sensing performance: when resistance degradation limits the sensing margin, the offset consumes a larger share of it, which can lead to read errors [4-6]. Therefore, several sensing schemes have been developed for RRAM to enhance the sensing margin [14-18]. For instance, current sense amplifiers such as the two-step scheme (DS-CSA), three-step scheme (TS-CSA), and covalent-bonded scheme (CB-CSA) have been reported to enhance the sensing margin [14,16,17]. Nevertheless, the multi-step sense amplifiers (DS-CSA and TS-CSA) incur significant delays due to their multiple operational phases [14,16], and the covalent-bonded scheme (CB-CSA) demands an additional reference array to generate two distinct reference currents [17]. This work introduces a current sense amplifier that employs a dynamic reference scheme, enhancing both variation tolerance and sensing speed. The proposed sense amplifier improves the sensing margin (|V$_{CELL}$-V$_{REF}$|) by a factor of 4${\times}$ and decreases the sensing time and read power by 53% and 30%, respectively, compared to the conventional CSA scheme [13]. The rest of this paper is organized as follows. Section II reviews existing RRAM sense amplifiers and their limitations. Section III explains the proposed DR-CSA. Simulation results and comparisons are presented in Section IV, followed by conclusions in Section V.

Fig. 1. (a) 1T1R structure and equivalent circuit; (b) impact of RRAM resistance variations on the R-ratio.

II. REVIEW OF STATE-OF-THE-ART SENSE AMPLIFIERS

1. Conventional Current-mirror-based Sense Amplifier (CSA)

Fig. 2 illustrates the conventional current-mirror-based current sense amplifier (CSA) [13]. This scheme compares the cell current with a reference current using a current-mirror load. The reference current (I$_{REF}$) is set to the median of the HRS and LRS currents (i.e., I$_{REF}$ = (I$_{HRS}$ + I$_{LRS}$) / 2). When a read operation starts, V$_{CELL}$ and V$_{REF}$ develop simultaneously at the cell node and the reference node, respectively. V$_{CELL}$ is set by the difference between I$_{CELL}$ and I$_{REF}$, while V$_{REF}$ is set by I$_{REF}$. The sensing margin, i.e., the difference between V$_{CELL}$ and V$_{REF}$ seen by the comparator, can be written as follows.

|V$_{SA\_CELL}$ - V$_{SA\_REF}$| = R$_{O}$ |I$_{CELL}$ - I$_{REF}$|    (1)

Here, R$_{O}$ is the output resistance of the PMOS load. However, resistance fluctuations in R$_{CELL}$ and R$_{REF}$ reduce both the sensing margin and the sensing window. Furthermore, the sensing margin is also affected by the offset voltage of the comparator input devices, the current mirror, and the clamping devices [13,14]. Therefore, the conventional CSA is not robust enough for RRAM whose R-ratio has been reduced by resistance variations. Three alternative CSAs, namely the two-step (DS-CSA), three-step (TS-CSA), and covalent-bonded (CB-CSA) current sense amplifiers, were reported to address these challenges [14,16,17].
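As a rough numerical illustration of Eq. (1), the following sketch evaluates the midpoint reference and the resulting margin for a healthy and a degraded R-ratio. The output resistance `R_O`, the cell currents, and the function names are assumptions chosen only for illustration; offsets and clamping effects are ignored.

```python
def reference_current(i_hrs, i_lrs):
    """Midpoint reference: I_REF = (I_HRS + I_LRS) / 2."""
    return (i_hrs + i_lrs) / 2.0


def sensing_margin(i_cell, i_ref, r_o):
    """Eq. (1): |V_SA_CELL - V_SA_REF| = R_O * |I_CELL - I_REF|."""
    return r_o * abs(i_cell - i_ref)


R_O = 200e3          # assumed PMOS-load output resistance in ohms (illustrative)
I_LRS = 2.5e-6       # assumed LRS cell current in amperes
I_HRS_GOOD = 0.3e-6  # assumed HRS current with a healthy R-ratio
I_HRS_BAD = 1.5e-6   # assumed HRS current after R-ratio degradation

for label, i_hrs in (("healthy", I_HRS_GOOD), ("degraded", I_HRS_BAD)):
    i_ref = reference_current(i_hrs, I_LRS)
    margin_mv = sensing_margin(I_LRS, i_ref, R_O) * 1e3
    print(f"{label}: I_REF = {i_ref * 1e6:.2f} uA, LRS margin = {margin_mv:.0f} mV")
```

Under these assumed numbers the LRS margin shrinks from roughly 220 mV to roughly 100 mV as the HRS current moves toward the LRS current, which is the degradation mechanism described above.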

Fig. 2. Schematics of conventional current-mirror-based CSA [13].

2. Two-step CSA (DS-CSA)

Fig. 3 illustrates the two-step CSA (DS-CSA), which improves the sensing margin over the conventional CSA [14]. This scheme alters the current paths using two-step switches and develops the voltages at the comparator input nodes, V$_{CELL}$ and V$_{REF}$, independently. In the first step (SS1 = ‘high’), highlighted in red, EQ is activated for a short period. Subsequently, the currents flow in their regular directions, as shown in Fig. 3 (left). This step develops the sensing voltage (i.e., 1st (|I$_{CELL}$ - I$_{REF}$|)) at the first input node of the comparator, V$_{CELL}$. In the second step (SS2 = ‘high’), highlighted in blue, the current paths change as shown in Fig. 3 (right), generating V$_{REF}$ (i.e., 2nd (|I$_{REF}$ - I$_{CELL}$|)). The DS-CSA then compares V$_{CELL}$ and V$_{REF}$, so the sensing margin is amplified over the two steps.

Fig. 3. Schematics of DS-CSA [14].

3. Three-step CSA (TS-CSA)

Fig. 4 illustrates the three-step CSA (TS-CSA) [16]. Similar to the DS-CSA, the TS-CSA employs multiple steps involving switches and capacitive coupling to enhance the sensing margin. At the beginning of a sensing operation, distinct overdrive voltages (V$_{OV}$) are produced for the cell current (I$_{CELL}$) and the reference current (I$_{REF}$) using the current paths in load 1. These overdrive voltages (V$_{OV\_ ICELL}$, V$_{OV\_ IREF}$) are subsequently duplicated through capacitive coupling from load 1 to the current paths in load 2. As the transistors in load 2 are twice the size of those in load 1, the increased currents (i.e., 2I$_{CELL}$, 2I$_{REF}$) from load 2 produce a larger current difference at the comparator than the normal currents (i.e., I$_{CELL}$, I$_{REF}$). A detailed explanation follows. The TS-CSA performs three major operations: 1) threshold-voltage sampling, 2) overdrive-voltage sampling and coupling, and 3) current-difference amplification.

In standby mode (DSD, CHD, SW3,4 = on), load 1 and load 2 are diode-connected through the switches, and their gates and drains are set to '0' (i.e., D1, D2, D3, D4 nodes = 0 V).

In the first step (DSD, CHD = off; SW3,4 = on), the threshold voltages of the diode-connected PMOS transistors are developed in loads 1 and 2. As a result of this step, V$_{DD}$ - V$_{TH}$ is stored at and applied to their gates and drains (i.e., D1, D2, D3, D4 nodes = V$_{DD}$ - V$_{TH}$).

In the second step (CHD, SW1,3 = on), the WL and CLAMP are activated, causing both I$_{CELL}$ and I$_{REF}$ to flow through the PMOS transistors of load 1, shown in red. Due to the current-sampling behavior [15,16], sampled voltages that depend on the cell current (I$_{CELL}$) and the reference current (I$_{REF}$) are produced at the gates and drains of load 1 (i.e., D1 node = V$_{DD}$ - V$_{TH}$ - V$_{OV\_ ICELL}$ and D2 node = V$_{DD}$ - V$_{TH}$ - V$_{OV\_ IREF}$).

Simultaneously, the overdrive voltages (V$_{OV\_ ICELL}$, V$_{OV\_ IREF}$) are coupled from the D1 and D2 nodes to the G1 and G2 nodes of load 2, so that load 2 generates doubled currents (i.e., 2I$_{CELL}$, 2I$_{REF}$) through its double-width devices, shown in blue.

In the third step (SW2 = on), an increased sensing margin is obtained at the comparator nodes by amplifying the current difference (i.e., D3 node = |2I$_{REF}$ - I$_{CELL}$|, D4 node = |2I$_{CELL}$ - I$_{REF}$|) between the original currents (i.e., I$_{CELL}$, I$_{REF}$) of load 1 and the doubled currents (i.e., 2I$_{CELL}$, 2I$_{REF}$) of load 2. The two CSA types described above thus improve the sensing margin over the conventional CSA by using switches and additional steps [14,16]. However, as previously mentioned, these sense amplifiers require multiple steps to generate the voltages for comparison.
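The node expressions given for the DS-CSA and TS-CSA can be compared in a first-order sketch. The code below is only an idealization under stated assumptions: current differences are treated as signed quantities, every step is assumed to have the same gain, and the example currents (in nA) are hypothetical. The resulting 2x and 3x factors follow from these idealized expressions and are not figures quoted from [14] or [16].

```python
def conventional_diff(i_cell, i_ref):
    """Single-step comparison: the comparator sees I_CELL - I_REF."""
    return i_cell - i_ref


def ds_csa_diff(i_cell, i_ref):
    """Two-step scheme: step 1 develops (I_CELL - I_REF) at V_CELL,
    step 2 develops (I_REF - I_CELL) at V_REF; the comparator sees their difference."""
    first = i_cell - i_ref
    second = i_ref - i_cell
    return first - second


def ts_csa_diff(i_cell, i_ref):
    """Three-step scheme: D3 carries 2*I_REF - I_CELL, D4 carries 2*I_CELL - I_REF."""
    d3 = 2 * i_ref - i_cell
    d4 = 2 * i_cell - i_ref
    return d4 - d3


I_REF = 1500   # nA, hypothetical
I_CELL = 2000  # nA, hypothetical cell current above the reference

base = conventional_diff(I_CELL, I_REF)
print("conventional:", base, "nA")
print("DS-CSA:", ds_csa_diff(I_CELL, I_REF), "nA")
print("TS-CSA:", ts_csa_diff(I_CELL, I_REF), "nA")
```

In this simplified view the gain in current difference comes at the cost of the extra phases, which is the delay penalty noted above.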

Fig. 4. Schematics of TS-CSA [16].

4. Covalent-bonded CSA (CB-CSA)

The covalent-bonded CSA (CB-CSA) is another sensing scheme for improving the sensing margin, as depicted in Fig. 5 [17]. CB-CSA addresses the issues of DS-CSA and TS-CSA through structural optimization rather than multiple steps. In CB-CSA, two RRAM cells, an HRS cell and an LRS cell, serve as reference cells that are compared with the accessed RRAM cell through two latches; this arrangement is called a covalent structure. When a read operation starts, all current components (I$_{REF\_ HRS}$, I$_{CELL}$, I$_{REF\_ LRS}$) flow through the loads. Each latch compares the current of one reference cell (I$_{REF}$ = I$_{HRS}$ or I$_{LRS}$) with a part of the accessed cell current (I$_{CELL}$). While all the currents (I$_{REF\_ HRS}$, I$_{REF\_ LRS}$, and I$_{CELL}$) flow at the same time, the latch with the larger input current difference becomes dominant in the comparison. The operation of the other latch is affected by the comparison result of the dominant latch. For example, the right latch is dominant when reading an HRS state. Conversely, the left latch becomes dominant when reading an LRS state. This CSA uses unit-cell reference currents for sensing instead of a standard reference current (i.e., I$_{REF}$ = (I$_{HRS}$ + I$_{LRS}$) / 2), resulting in an improved sensing margin compared to the conventional CSA. The current sense amplifier can be shared with the current mirror in the reference column of an array [18]. However, CB-CSA requires more area due to the additional reference array. The covalent structure also makes it difficult to share a reference current among multiple current mirrors [17]. In CB-CSA, each sub-array requires two columns, one with LRS and one with HRS cells, as references. When reading N bits, the total number of columns for generating the reference voltages of the sense amplifiers becomes 2 ${\times}$ N.
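The two properties described above, the dominant latch and the 2 ${\times}$ N reference-column overhead, can be sketched as follows. This is a simplified model under assumptions: the current split of I$_{CELL}$ between the two latches is ignored, the latches are labelled by their reference cell rather than by position, and the function names and currents (in nA) are hypothetical.

```python
def dominant_latch(i_cell, i_ref_hrs, i_ref_lrs):
    """Return which latch sees the larger input current difference."""
    diff_vs_hrs = abs(i_cell - i_ref_hrs)
    diff_vs_lrs = abs(i_cell - i_ref_lrs)
    if diff_vs_lrs > diff_vs_hrs:
        return "latch with LRS reference"
    return "latch with HRS reference"


def reference_columns(n_bits):
    """Two reference columns (one LRS, one HRS) per bit read in parallel."""
    return 2 * n_bits


I_REF_HRS, I_REF_LRS = 300, 2500  # nA, hypothetical unit-cell reference currents

print(dominant_latch(300, I_REF_HRS, I_REF_LRS))   # reading an HRS cell
print(dominant_latch(2500, I_REF_HRS, I_REF_LRS))  # reading an LRS cell
print(reference_columns(32), "reference columns for a 32-bit read")
```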

Fig. 5. Schematics of CB-CSA [17].

III. PROPOSED CURRENT SENSE AMPLIFIER WITH DYNAMIC REFERENCE (DR-CSA)

This work proposes a current sense amplifier with dynamic reference (DR-CSA) that enhances the sensing margin by adjusting the reference current based on the cell state. Fig. 6(a) depicts the proposed sense amplifier. It comprises a modified conventional CSA and the proposed dynamic reference controller (DRC). The DRC is connected between the comparator input nodes and is activated by the enable signal (EN) during RRAM sensing. V$_{REF}$ and V$_{CELL}$ are precharged to V$_{DD}$/2 by the PMOS load and the equalizing PMOS transistors. Fig. 6(b) illustrates the schematic of the DRC, which comprises a capacitor, basic logic gates, and two PMOS switches. The DRC adjusts the reference current (I$_{REF}$) of the current mirror load after the sensing capacitor (C1) detects the early-stage change at V$_{CELL}$. Further details of the DR-CSA operation are provided below.

In the standby mode (EN = '0') in Fig. 6(b-top), V$_{CELL}$ and V$_{REF}$ are set to V$_{DD}$/2 by enabling the PMOS-based equalizer through EN. Additionally, P1 shorts the inverter input and output, biasing the inverter in its high-gain region. Since P3 is turned off, the DRC is decoupled from V$_{REF}$ and V$_{CELL}$. In the active mode (EN = '1'), as illustrated in Fig. 6(b-bottom), the DRC block is activated by EN. Simultaneously, P1 is turned off, and P3 is turned on to connect the DRC output to V$_{REF}$. When an RRAM cell is accessed, C1 detects the change at V$_{CELL}$, and an additional current path is formed by N1 or P2 depending on the amplified signal at 'SIG'.

Fig. 7 shows an example of sensing LRS. As the sensing operation starts, the equalized V$_{CELL}$ and V$_{REF}$ develop an initial voltage difference. Since the cell current is larger than the reference current when reading LRS (i.e., I$_{LRS}$ > I$_{REF}$), V$_{CELL}$ is pulled below V$_{DD}$/2. This slight voltage drop at V$_{CELL}$ is detected by C1 in the DRC block and, through capacitive coupling, produces a small voltage drop at the inverter input. The small change at the inverter input is amplified by the inverter and the NAND2 gate, setting 'SIG' to '0'. This turns on P2, which provides an additional pull-up current (I$_{Charge}$). I$_{REF}$ can then be written as I$_{Load}$ + I$_{Charge}$, so I$_{Load}$ decreases. The decreased I$_{Load}$ is copied to I$_{Mirror}$, lowering V$_{CELL}$ further. As a result, the voltage difference between V$_{REF}$ and V$_{CELL}$ is enhanced further through the feedback operation of the DRC.

Fig. 8 shows the sensing operation when accessing an RRAM cell in HRS. In contrast to LRS, when reading HRS (i.e., I$_{CELL}$ < I$_{REF}$), V$_{CELL}$ settles slightly above V$_{DD}$/2. This small voltage rise at V$_{CELL}$ is coupled into the DRC through C1, enabling an additional pull-down current path through P3 and N1. As a result, I$_{Load}$ increases by the amount of I$_{Discharge}$. The increased I$_{Load}$ is then copied by the current mirror and raises V$_{CELL}$ to a higher level, improving the sensing margin.
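The data-dependent behavior of the DRC in the two cases above can be summarized in a small behavioral sketch. It is an idealized model, not a circuit simulation: the function name, the fixed adjustment current `I_ADJ`, and the SIG value for the HRS case (which the text does not state explicitly) are assumptions, and currents are in nA.

```python
def drc_response(i_cell, i_ref, i_adjust):
    """Idealized DRC decision for one read.

    Returns (sig, active_device, i_load), where i_load is the load current
    that the current mirror copies after the DRC has reacted.
    """
    if i_cell > i_ref:
        # LRS read: V_CELL dips below VDD/2, SIG goes to 0 and P2 turns on.
        # I_REF = I_Load + I_Charge, so the load current shrinks.
        return 0, "P2", i_ref - i_adjust
    # HRS read: V_CELL rises above VDD/2 and N1 opens a pull-down path,
    # so the load current grows by I_Discharge. SIG = 1 is inferred here.
    return 1, "N1", i_ref + i_adjust


I_REF = 1500  # nA, hypothetical reference current
I_ADJ = 500   # nA, hypothetical magnitude of I_Charge / I_Discharge

lrs = drc_response(2500, I_REF, I_ADJ)
hrs = drc_response(300, I_REF, I_ADJ)
print("LRS read:", lrs)
print("HRS read:", hrs)
```

In both cases the adjusted I$_{Load}$ moves away from the cell current's side of the reference, which is the feedback that widens the difference between V$_{REF}$ and V$_{CELL}$.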

Fig. 6. (a) Proposed current sense amplifier; (b) schematic of dynamic reference controller (DRC).
Fig. 7. Operation of DRC for sensing margin improvement in sensing LRS.
Fig. 8. Operation of DRC for sensing margin improvement in sensing HRS.

IV. SIMULATION RESULTS AND COMPARISON

In this work, a 64 kb RRAM with the proposed sense amplifier is designed in 40-nm CMOS technology. The supply voltage is 1.1 V, and the RRAM device is modeled in Verilog-A based on the HfO$_{\mathrm{X}}$ RRAM stack in [19,20]. Fig. 9 illustrates the simplified architecture of the 64 kb RRAM used for validating the proposed DR-CSA. The reference current generator is shared with other sense amplifiers to minimize the area overhead, as shown on the right side of Fig. 9. The designed array consists of 256 ${\times}$ 256 RRAM cells organized into 32 sub-arrays with 32 sense amplifiers. Fig. 10 shows the DC I-V curve of the RRAM model and the Monte-Carlo simulation result with 1000 samples. The average resistances of HRS (blue) and LRS (red) are 950 k${\Omega}$ and 9 k${\Omega}$, respectively. I$_{REF}$, I$_{HRS}$, and I$_{LRS}$ are 1.5 ${\mathrm{\mu}}$A, 300 nA, and 2.5 ${\mathrm{\mu}}$A, respectively.

Since the coupling capacitor in the DRC is implemented by a MOSFET, the MOSFET is sized carefully to minimize both the area and the variations caused by mismatch. In our design, the area of the MOS capacitor is selected as 0.32 ${\mathrm{\mu}}$m$^{2}$. Fig. 11 presents the MOS capacitance for different sizes and shows the impact of process variations on the capacitance. The simulated mean and standard deviation of the capacitance are 5.6 fF and 104 aF, respectively. The normalized variance of the capacitor (${\Delta}$C/C) is 1.83%. Since the proposed DR-CSA adds a DRC block that uses a capacitor, there is an area overhead of approximately 31% compared to the conventional CSA. However, DR-CSA achieves better sensing performance.

Fig. 12 compares the sensing operation of the proposed CSA and the conventional CSA. Enabling the DRC increases the voltage difference between V$_{CELL}$ and V$_{REF}$, enhancing the sensing margin and speed. As shown in Fig. 12(a), when reading LRS, I$_{Load}$ decreases, which increases V$_{REF}$ and lowers V$_{CELL}$, improving the sensing margin. Conversely, when reading HRS in Fig. 12(b), I$_{Load}$ increases, which lowers V$_{REF}$ and raises V$_{CELL}$. However, in the case of HRS, the diode-connected load operates in saturation, which limits the drop at the V$_{REF}$ node to V$_{OV\_ IREF}$ [14,16]. Additionally, the limited V$_{OV\_ IREF}$ can restrict the rising swing of V$_{CELL}$, leading to an imbalance between the LRS and HRS margins. Even so, the sensing margin is increased by the DRC, as shown in Fig. 13. It is also worth highlighting that sensing LRS is more challenging than sensing HRS in the conventional CSA due to the smaller margin and larger delay. Therefore, the proposed sensing technique significantly enhances overall sensing performance by addressing the LRS sensing issue, especially for RRAM devices with larger variations. Furthermore, the robustness of the proposed sensing scheme is investigated under RRAM variations.
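The array organization and nominal R-ratio implied by the setup above can be checked with a few lines of arithmetic. The sketch assumes that each sub-array is a slice of 8 columns across all 256 rows, which matches the 2k (256 ${\times}$ 8) sub-array listed in Table 1; the variable names are illustrative.

```python
ROWS, COLS = 256, 256
SUB_ARRAYS = 32                  # one sense amplifier per sub-array

total_bits = ROWS * COLS                     # full array capacity in bits
cols_per_sub_array = COLS // SUB_ARRAYS      # assumed column-sliced sub-arrays
bits_per_sub_array = ROWS * cols_per_sub_array

R_HRS = 950e3   # ohms, average HRS resistance from the model
R_LRS = 9e3     # ohms, average LRS resistance from the model
r_ratio = R_HRS / R_LRS

print(f"array: {total_bits // 1024} kb, sub-array: {bits_per_sub_array // 1024}k "
      f"({ROWS} x {cols_per_sub_array})")
print(f"nominal R-ratio: {r_ratio:.1f}")
```

The nominal R-ratio of a little over 100 is the starting point of the R-ratio sweep used in the robustness study.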

Fig. 9. Architecture of 64 kb RRAM.
Fig. 10. Simulated: (a) I-V characteristics; (b) distribution result of the RRAM model in LRS and HRS.
Fig. 11. (a) MOS capacitance for different transistor sizes; (b) distribution when the MOS size is 0.32 ${\mathrm{\mu}}$m$^{2}$.
Fig. 12. Simulation results comparing the sensing margins of the proposed CSA with the conventional CSA.
Fig. 13. Comparison of the proposed CSA with the conventional CSA: sensing margin in LRS and HRS.

Fig. 14 shows the simulated current and voltage of the conventional CSA for R-ratios from 100 down to 20. The R-ratio degradation is simulated by adjusting the RRAM model values. Fig. 14(a) shows that the current margin decreases from 1 ${\mathrm{\mu}}$A to 250 nA for LRS and from 1.2 ${\mathrm{\mu}}$A to 660 nA for HRS as the R-ratio degrades from 100 to 20. The corresponding voltage margin is shown in Fig. 14(b). The current and voltage margins for sensing LRS are more vulnerable to R-ratio degradation than those for sensing HRS. Notably, a 99% voltage margin degradation is observed when sensing LRS, as shown in Fig. 14(b). The degradation in the sensing margin also impacts sensing speed. As the R-ratio decreases, the delay in sensing LRS increases significantly, in contrast to the delay in sensing HRS. Fig. 15 presents the simulated margins of the proposed DR-CSA and the conventional CSA at various R-ratios. DR-CSA enhances the margin of both LRS and HRS.

As shown in Fig. 15(a), the LRS margin of DR-CSA does not degrade (${\approx}$ V$_{DD}$), with a high swing in V$_{REF}$ (${\approx}$V$_{SS}$) and V$_{CELL}$ (${\approx}$V$_{DD}$), since the reduced I$_{Load}$ remains consistent as the R-ratio decreases. As a result, DR-CSA improves the LRS margin by 4${\times}$ to 16${\times}$. When reading HRS, DR-CSA increases the margin by 20% to 25% compared to the conventional CSA, as shown in Fig. 15(b).

Fig. 14. Simulated results of the conventional CSA: (a) current; (b) voltage.
Fig. 15. Comparison of the proposed CSA with the conventional CSA at different R-ratios: sensing margin: (a) LRS; (b) HRS.
Fig. 16. Comparison of the proposed CSA with the conventional CSA at different R-ratios: (a) comparator delay; (b) sensing time.

Fig. 16 shows the simulated comparator delay and the overall sensing time. The comparator delay is significantly reduced due to the increased sensing margin. Across the considered R-ratios, the simulated comparator delay is around 120 ps, a reduction of over 90% compared to the conventional CSA in Fig. 16(a). The proposed DR-CSA maintains a consistent sensing speed across the R-ratio values. Fig. 16(b) summarizes the comparison of the sensing time, demonstrating that the proposed CSA reduces the sensing time by ~53% to ~90% as the R-ratio degrades from 100 to 20. The conventional CSA, in contrast, slows down as the R-ratio decreases because it must resolve a narrow sensing margin of tens of millivolts, as shown in Fig. 14(b). The proposed CSA maintains a consistent average sensing time ranging from 0.84 ns to 1.07 ns, whereas the sensing time of the conventional CSA increases from 1.95 ns to 10 ns.

Furthermore, the fast sensing time of the proposed CSA improves energy efficiency. The energy consumption of the array at 1.1 V is illustrated in Fig. 17(a). The proposed DRC scheme reduces the total array energy and the SA energy by up to 32% and 63%, respectively. Fig. 17(b) presents the energy breakdown of the RRAM. Not only is the total energy reduced, but the share of the energy consumed by the sense amplifier also decreases from 8.4% to 4.3%.

Fig. 18 presents the Monte-Carlo simulation results with 1000 samples to assess the robustness of the proposed CSA scheme. Fig. 18(a) and (b) depict the threshold voltage (V$_{TH}$) statistics included in the simulation. The mean value (${\mu}$) of V$_{TH}$ is 0.7 V for NMOS and -0.67 V for PMOS. The standard deviation (${\sigma}$) is approximately 10 to 11 mV. In Fig. 18(c) and (d), the sensing margins for LRS and HRS are presented. In LRS, the sensing margin exhibits a mean value (${\mu}$) of 1.05 V and a standard deviation (${\sigma}$) of 5 mV. For HRS, the sensing margin has ${\mu}$ = 0.55 V and ${\sigma}$ = 20 mV when employing 3${\sigma}$ variations. Fig. 18(e) and (f) display the comparator delay and the sensing time. The access time has ${\mu}$ = 0.86 ns and ${\sigma}$ = 7 ps.

To verify robustness against process variation, the proposed CSA is also simulated with the fast (F) and slow (S) MOSFET corner models at two temperatures, 27 $^{\circ}$C and 100 $^{\circ}$C, as shown in Figs. 19 and 20. Fig. 19 presents the simulation waveforms at the corners at 27 $^{\circ}$C. The comparator margin is enhanced at all corners, although the SS corner shows a slight additional delay in the worst case because the cell current must be developed for the pull-down in the DRC. As shown in Fig. 20, the DRC improves the margin at every corner, at both 27 $^{\circ}$C and 100 $^{\circ}$C, relative to the margin of the conventional CSA at the TT corner. Consequently, these results demonstrate that the DRC keeps the sensing performance robust across process and temperature variations.
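One way to read the Monte-Carlo statistics above is through the ratio of standard deviation to mean (${\sigma}$/${\mu}$). The short sketch below computes it for the quoted margins and access time; the helper name and the dictionary layout are illustrative assumptions.

```python
def normalized_spread(sigma, mu):
    """Relative spread sigma/mu of a distribution with non-zero mean."""
    return sigma / abs(mu)


# (mean, standard deviation) pairs quoted for the 1000-sample Monte-Carlo run
mc_results = {
    "LRS sensing margin (V)": (1.05, 5e-3),
    "HRS sensing margin (V)": (0.55, 20e-3),
    "access time (s)": (0.86e-9, 7e-12),
}

spread = {name: normalized_spread(sigma, mu) for name, (mu, sigma) in mc_results.items()}
for name, value in spread.items():
    print(f"{name}: sigma/mu = {100 * value:.2f}%")
```

With these numbers the access-time spread stays below 1% of the mean, and the HRS margin shows a noticeably larger relative spread than the LRS margin.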

Fig. 17. Comparison of the proposed CSA with the conventional CSA: (a) energy; (b) energy breakdown at VDD = 1.1 V.
Fig. 18. Monte Carlo simulation results: (a) Vth of NMOS; (b) Vth of PMOS; (c) margins in LRS; (d) margins in HRS; (e) comparator delay; (f) sensing time.
Fig. 19. Simulation waveforms of the proposed DR-CSA at process corners.
Fig. 20. Sensing margins of the proposed DR-CSA at process corners.

Figs. 21 and 22 compare various CSAs in a 40-nm process, with the current margin standardized for comparison, at VDD = 1.1 V and 0.9 V. In Fig. 21(a), the DR-CSA achieves a sensing time of 0.84 ns, which is faster than the TS-CSA and DS-CSA by up to 0.8 ns and 0.96 ns, respectively. As explained in Section II, the CB-CSA performs direct comparisons without requiring an additional latch comparator, resulting in a slightly faster sensing time than the DR-CSA (by 0.16 ns). At a lower VDD (0.9 V), the DR-CSA is faster than the TS-CSA (by 5 ns) and the DS-CSA (by 8.1 ns), as shown in Fig. 21(b). Fig. 22 presents the sub-array read energy of the various CSAs. As shown in Fig. 22(a), DR-CSA exhibits the lowest read energy (235.16 fJ) among the compared CSAs, achieving reductions of 17%, 9.8%, and 3.8% compared to TS-CSA (282.85 fJ), DS-CSA (260.722 fJ), and CB-CSA (244.40 fJ), respectively. Moreover, at low VDD, DR-CSA can reduce the sensing energy by more than 10%, resulting in read energy reductions of 21%, 13%, and 2.6% compared to TS-CSA, DS-CSA, and CB-CSA, as depicted in Fig. 22(b). Fig. 23(a)-(e) presents the sensing-time distributions of the proposed CSA and the other CSAs. To assess robustness, the normalized variance is obtained by dividing the standard deviation by the mean value (i.e., ${\sigma}$/${\mu}$) based on a Monte-Carlo simulation with 1000 samples. As shown in Fig. 24, the proposed CSA exhibits a normalized variance of less than 1%. This result demonstrates that the proposed CSA tightens the sensing-time distribution and thus improves robustness.
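The read-energy reductions quoted for VDD = 1.1 V follow directly from the sub-array energies. The sketch below recomputes them from the stated values; the helper name is an illustrative assumption.

```python
def reduction_pct(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


DR_CSA_FJ = 235.16  # sub-array read energy of DR-CSA in fJ
others_fj = {"TS-CSA": 282.85, "DS-CSA": 260.722, "CB-CSA": 244.40}

reductions = {name: reduction_pct(energy, DR_CSA_FJ) for name, energy in others_fj.items()}
for name, pct in reductions.items():
    print(f"DR-CSA vs {name}: {pct:.1f}% lower read energy")
```

The computed values round to the 17%, 9.8%, and 3.8% stated in the text.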

Fig. 21. Sensing time at: (a) VDD = 1.1 V; (b) VDD = 0.9 V.
Fig. 22. Read energy per bit at (a) VDD = 1.1 V; (b) VDD = 0.9 V.
Fig. 23. Distribution results of Sensing time: (a) conventional CSA; (b) TS-CSA; (c) CB-CSA; (d) DS-CSA; (e) DR-CSA.
Fig. 24. Comparison of normalized variance for Sensing time.

Table 1 summarizes a comprehensive performance comparison between the proposed CSA and other CSAs. Table 1 is organized into two sections. The first section presents performance data reported in the literature, while the second section presents simulation results obtained under common conditions, using a 2k sub-array and the same current margin in a 40-nm CMOS technology. In this table, the voltage margin (V$_{MARGIN}$) differs slightly between HRS and LRS for each CSA due to the different current loads. Although the proposed DR-CSA shows an imbalance in sensing margin, with LRS better than HRS, it improves the sensing time by 2.3${\times}$, 2.1${\times}$, and 1.9${\times}$ compared to the conventional CSA [13], DS-CSA [14], and TS-CSA [16], respectively. Furthermore, the proposed DR-CSA reduces the read energy by 17%, 9.8%, and 3.8% compared to TS-CSA [16], DS-CSA [14], and CB-CSA [17], respectively. Table 2 provides the specifications of the proposed DR-CSA and other state-of-the-art SAs. Compared to other sensing methodologies, the proposed scheme uses simple operation steps to enhance sensing margins and robustness.
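The sensing-time improvement factors can be reproduced from the simulated sensing times. The sketch assumes that the 1.95 ns quoted earlier for the conventional CSA is its sensing time under the same nominal conditions; the other values are the simulated times listed in Table 1.

```python
# Simulated sensing times in ns; the conventional-CSA value is an assumption
# taken from the nominal R-ratio result quoted in the text.
SENSING_TIME_NS = {"Conventional CSA": 1.95, "DS-CSA": 1.8, "TS-CSA": 1.64}
DR_CSA_NS = 0.84

speedups = {name: t / DR_CSA_NS for name, t in SENSING_TIME_NS.items()}
for name, factor in speedups.items():
    print(f"DR-CSA vs {name}: {factor:.1f}x faster")
```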

Table 1. Performance comparison

| | JSSC'20 [16] | TCSAII'15 [14] | ISSCC'15 [17] | This Work |
|---|---|---|---|---|
| Process | 55 nm | 45 nm | Sub-20 nm | 40 nm |
| Supply voltage | 1 V | 1 V | 1.5 V | 1.1 V |
| Sensing time | 3.16 ns (measured) | 3.4 ns (simulated) | 9.1 ns (measured) | 0.84 ns (simulated) |
| Array (sub-array) | 1 Mb, 128k (256×512) | 32 Mb (unknown) | 8 Mb, 0.5M (512×1028) | 64 kb, 2k (256×8) |
| *: Simulation performance with the same array, Iread, and 40 nm CMOS | | | | |
| V$_{MARGIN}$* | 500 mV (LRS), 483 mV (HRS) | 600 mV (LRS), 700 mV (HRS) | 590 mV (LRS), 550 mV (HRS) | 1050 mV (LRS), 540 mV (HRS) |
| Sensing time* | 1.64 ns | 1.8 ns | 0.64 ns | 0.84 ns |
| Read energy/bit* | 44.922 fJ (SA), 282.85 fJ (sub-array) | 22.296 fJ (SA), 260.722 fJ (sub-array) | 10.77 fJ (SA), 244.40 fJ (sub-array) | 10.17 fJ (SA), 235.16 fJ (sub-array) |
| Merit | Reference share / Margin balance | Reference share / Margin balance | Sensing speed / Margin balance | Reference share / Sensing speed |
| Drawback | Sensing speed | Sensing speed | Reference share | Margin imbalance |

Iread: 15 µA, 2 µA, 1 µA (three values are listed for the four columns).

*Sensing time: measured from WL activation to generation of the digital output.

*V$_{MARGIN}$: (V$_{CELL}$ - V$_{REF}$).

*SA: sense amplifier energy only; Sub-array: energy including the sub-array.

Set current: LRS 2.5 µA, REF 1.5 µA, HRS 300 nA

Table 2. Comparison with State-of-the-Art Works

| | ISSCC'21 [21] | ISSCC'22 [22] | VLSI'22 [23] | ISSCC'23 [24] | This Work |
|---|---|---|---|---|---|
| Sensing scheme | Multi-cell reference SA | Charge recycling-VSA | Boosted cross couple-CSA | Differential charge accumulation-VSA | Dynamic reference-CSA |
| Concept | Avoiding reference collapse | Recycle charging | Boosting internal node | Using two reference cells | Adjusting reference current |
| Process | 14 nm | 18 nm | 22 nm | 22 nm | 40 nm |
| Supply voltage | 0.85 V | 0.8 V | 0.8 V | 0.8 V | 1.1 V |
| Advantages | Reference distribution | Energy efficiency from charge recycle | Sensing margin | Sensing margin & speed | Sensing margin & speed |
| Disadvantages | Multiple reference cells | Operation complexity & multiple operation steps | Improvement margin of only one state (LRS) | Two reference cells & multiple operation steps | Margin imbalance |

V. CONCLUSIONS

This paper proposes a novel current sense amplifier with dynamic reference (DR-CSA) for RRAM. The proposed DR-CSA scheme enhances the sensing margin by adjusting the load current to be copied using the data-dependent early-stage voltage change in the accessed column. The proposed DR-CSA improves the reliability of RRAM sensing across a wide range of RRAM resistance variations. Comprehensive simulation results demonstrate the robustness of the proposed DR-CSA compared to other state-of-the-art CSAs. Additionally, DR-CSA achieves faster sensing times and higher energy efficiency when compared to DS-CSA and TS-CSA. In this work, a reference-sharing structure was implemented, sharing one reference column for multiple sense amplifiers without requiring an additional reference array. Therefore, the proposed DR-CSA can be applied to various resistive memories requiring reliable and faster sensing with high robustness.

ACKNOWLEDGMENTS

This research is supported by the Ministry of Education, Singapore, under AcRF Tier-2 (MOE-T2EP50221-0001). This work is partially supported by Programmatic grant no. A18A6b0057 (SpOT-Lite), Singapore RIE 2020, AME domain.

References

[1] Khwa, W.-S., et al., “Emerging NVM circuit techniques and implementations for energy-efficient systems,” in Beyond-CMOS Technologies for Next Generation Computer Design. Springer, 2019, pp. 85-132.
[2] Meena, J. S., et al., “Overview of emerging non-volatile memory technologies,” Nanoscale Research Letters, Sep. 2014, pp. 1-33.
[3] Niu, D., et al., “Low power multi-level-cell resistive memory design with incomplete data mapping,” in IEEE International Conference on Computer Design (ICCD), Nov. 2013, pp. 131-137.
[4] Chang, M.-F., et al., “A low-voltage bulk-drain-driven read scheme for sub-0.5 V 4 Mb 65 nm logic-process compatible embedded resistive RAM (ReRAM) macro,” IEEE J. Solid-State Circuits, vol. 48, pp. 2250-2259, Sep. 2013.
[5] Chang, M.-F., et al., “Low VDDmin Swing-Sample-and-Couple Sense Amplifier and Energy-Efficient Self-Boost-Write-Termination Scheme for Embedded ReRAM Macros Against Resistance and Switch-Time Variations,” IEEE J. Solid-State Circuits, vol. 50, pp. 2786-2795, Sep. 2015.
[6] Lee, A., et al., “A ReRAM-based non-volatile flip-flop with self-write-termination scheme for frequent-off fast-wake-up non-volatile processors,” IEEE J. Solid-State Circuits, vol. 52, pp. 2194-2207, Aug. 2017.
[7] G. Murali, X. Sun, S. Yu, and S. K. Lim, “Heterogeneous Mixed-Signal Monolithic 3-D In-Memory Computing Using Resistive RAM,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, no. 2, pp. 386-396, Feb. 2021.
[8] Wei, S.-T., et al., “Trends and challenges in the circuit and macro of RRAM-based computing-in-memory systems,” Chip, vol. 1, no. 1, pp. 1-11, Mar. 2022.
[9] Y. Lu, V. L. Le, and T. T.-H. Kim, “A 184-μW error-tolerant real-time hand gesture recognition system with hybrid tiny classifiers utilizing edge CNN,” IEEE J. Solid-State Circuits, vol. 58, no. 2, pp. 530-542, Feb. 2023.
[10] Y. Lu, Z. Li, Y. Chen, and T. T.-H. Kim, “A 181µW real-time 3-D hand gesture recognition system based on bi-directional convolution and computing-efficient feature clustering,” in IEEE Custom Integrated Circuits Conference, Apr. 2022, pp. 1-2.
[11] Lee, H., et al., “Evidence and solution of over-RESET problem for HfOx based resistive memory with sub-ns switching speed and high endurance,” in IEEE International Electron Devices Meeting (IEDM), Dec. 2010, pp. 17-19.
[12] Gao, B., et al., “Oxide-based RRAM switching mechanism: A new ion-transport-recombination model,” in IEEE International Electron Devices Meeting (IEDM), Dec. 2008, pp. 1-4.
[13] Kim, J., et al., “A novel sensing circuit for deep submicron spin transfer torque MRAM (STT-MRAM),” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 1, pp. 181-186, Jan. 2012.
[14] Na, T., et al., “A double-sensing-margin offset-canceling dual-stage sensing circuit for resistive non-volatile memory,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 12, pp. 1109-1113, Dec. 2015.
[15] Chang, M.-F., et al., “An offset-tolerant current-sampling-based sense amplifier for sub-100 nA-cell-current non-volatile memory,” IEEE J. Solid-State Circuits, pp. 206-208, Feb. 2011.
[16] Xue, C.-X., et al., “Embedded 1-Mb ReRAM-based computing-in-memory macro with multibit input and weight for CNN-based AI edge processors,” IEEE J. Solid-State Circuits, vol. 55, no. 1, pp. 203-215, Jan. 2019.
[17] Kim, C., et al., “7.4 A covalent-bonded cross-coupled current-mode sense amplifier for STT-MRAM with 1T1MTJ common source-line structure array,” in IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, Feb. 2015, pp. 1-3.
[18] Schemes, R., et al., “A high-speed 7.2-ns read-write random access 4-Mb embedded resistive RAM (ReRAM) macro,” IEEE J. Solid-State Circuits, vol. 48, no. 3, pp. 878-891, Mar. 2013.
[19] L. Lu, et al., “ReRAM device and circuit co-design challenges in nano-scale CMOS technology,” in IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Dec. 2020, pp. 213-216.
[20] Y. Chen, L. Lu, B. Kim, and T. T.-H. Kim, “Reconfigurable 2T2R ReRAM architecture for versatile data storage and computing in memory,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 28, no. 12, pp. 2636-2649, Dec. 2020.
[21] Yang, J. A., et al., “A 14nm-FinFET 1Mb Embedded 1T1R RRAM with a 0.022µm$^{2}$ Cell Size Using Self-Adaptive Delayed Termination and Multi-Cell Reference,” in IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2021, pp. 336-338.
[22] Y.-C. Chiu, et al., “A 22nm 4Mb STT-MRAM Data-Encrypted Near-Memory Computation Macro with a 192GB/s Read-and-Decryption Bandwidth and 25.1-55.1TOPS/W 8b MAC for AI Operations,” in IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2022, pp. 178-180.
[23] T. Shimoi, et al., “A 22nm 32Mb Embedded STT-MRAM Macro Achieving 5.9ns Random Read Access and 5.8MB/s Write Throughput at up to Tj of 150 °C,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Jun. 2022, pp. 134-135.
[24] Y.-C. Chiu, et al., “A 22nm 8Mb STT-MRAM Near-Memory-Computing Macro with 8b-Precision and 46.4-160.1TOPS/W for Edge-AI Devices,” in IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2023, pp. 496-498.
Byung-Kwon An

Byung-Kwon An received the B.S. degree from the Department of Electronics Engineering, Kwangwoon University, Seoul, South Korea, in 2017, and the M.S. degree in circuits for PCM memories and OTS devices from Hanyang University, Seoul, in 2020. He is currently pursuing the Ph.D. degree in circuit design for RRAM memories at Nanyang Technological University, Singapore. His research interests include emerging memory circuit design and in-memory computing.

Xueyong Zhang

Xueyong Zhang received the B.S. degree in applied physics from Nantong University, China, in 2011, the M.E. degree in microelectronics and solid-state electronics from Southeast University, China, in 2014, and the Ph.D. degree from Nanyang Technological University (NTU), Singapore, in 2022. From 2014 to 2016, he was a Design Engineer at Silergy Corporation, where he was involved in power management IC design. From 2016 to 2022, he was a Research Associate and a Research Fellow at NTU. He is currently a Staff Engineer with Huawei International Pte Ltd., Singapore. His current research interests include low-power analog IC, neuromorphic VLSI, in-memory computing design, and data converters.

Anh Tuan Do (Member, IEEE)

Anh Tuan Do (Member, IEEE) received the B.S. and Ph.D. degrees from Nanyang Technological University (NTU), Singapore, in 2007 and 2010, respectively. He joined the Digital IC Design Group, Institute of Microelectronics (IME), A*STAR, Singapore, in 2015. He was a Research Fellow with VIRTUS, IC Design Centre of Excellence, NTU, from 2010 to 2015. His research interests include AI hardware, neuromorphic computing, low-power, low-leakage, variation-tolerant digital circuits, emerging memory, SoC for edge computing, and biomedical circuits and systems. He was a recipient of the best paper award from SOCC 2012 and the second prize and best presentation award in the innovation contest from the International Ph.D. Student Workshop 2007, National University. He served as a reviewer for several IEEE journals and conferences, including the TCAS I: REGULAR PAPERS, the TCAS II: EXPRESS BRIEFS, and the VLSI SYSTEMS.

Tony Tae-Hyoung Kim (Senior Member, IEEE)

Tony Tae-Hyoung Kim (Senior Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, South Korea, in 1999 and 2001, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Minnesota, Minneapolis, MN, USA, in 2009. From 2001 to 2005, he was at Samsung Electronics, where he performed research on the design of high-speed SRAM memories, clock generators, and IO interface circuits. From 2007 to 2009 summer, he was at the IBM T. J. Watson Research Center and Broadcom Corporation, where he performed research on isolated NBTI/PBTI measurement circuits and SRAM mismatch measurement test structures, and on battery-backed memory design, respectively. In November 2009, he joined Nanyang Technological University, where he is currently an Associate Professor. He is an author/coauthor of around 190 journal and conference papers and holds 17 U.S. and Korean patents. His current research interests include in-memory computing for edge computing, emerging memory circuit design, energy-efficient circuits and systems for the IoT and wearable devices, variation and aging tolerant circuits and systems, and circuit techniques for 3-D ICs. He was a Technical Committee Member of various conferences, such as the IEEE Asian Solid-State Circuits Conference (A-SSCC), the IEEE International Symposium on Circuits and Systems (ISCAS), and the IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). He is an IEEE SSCS Distinguished Lecturer. He was the Chair of the IEEE SSCS Singapore Chapter from 2015 to 2016 and is the Chair of IEEE CASS VSA-TC.
He received the 1999 Samsung Humantec Thesis Award (Silver Prize), the 2001 Samsung Humantec Thesis Award (Honor Prize), the 2005 ETRI Journal Paper of the Year Award, the 2008 Samsung Humantec Thesis Award (Bronze Prize), the 2008 IEEE DAC/ISSCC Student Design Contest Award, the 2008 AMD/CICC Student Scholarship Award, the 2008 Departmental Research Fellowship from University of Minnesota, the Best Paper Award from 2011 and 2014 ISOCC, the Best Demo Award from 2016 IEEE APCCAS, the International Low Power Design Contest Award from 2016 IEEE/ACM ISLPED, and the Best Paper Award from 2021 ICCE-Asia. He serves as an Associate Editor of IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, IEEE ACCESS, Frontiers in Electronics, and IEIE Journal of Semiconductor Technology and Science (JSTS).