Jangseok Yu1
Geonwoo Lee1
Taehui Na†
(Department of EE, Incheon National University, Incheon 22012, Korea)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index Terms
Charge saving and sharing circuit, in-memory computing, full adder, MRAM
I. INTRODUCTION
Over the past few decades, there has been a significant increase in the volume of
data being processed and stored. One of the most severe bottlenecks in conventional
von Neumann computer architectures is the limited data bandwidth between the processor
and memory [1-3]. Furthermore, data transfer between the processor and memory incurs high latency
and energy consumption, which leads to a significant degradation in system performance
and efficiency. This situation has resulted in memory bandwidth limitations, known
as the “memory wall,” and increased the data movement overhead and leakage current
[4]. In-memory computing (IMC), an idea proposed several decades ago, aims to address
these challenges by incorporating processing units directly into the memory itself
[5]. The fundamental concept revolves around preprocessing data and providing only intermediate
results to the processor [2]. Such a computer architecture not only reduces data transfer bandwidth and power
overhead but also enhances performance by executing simple logical operations within
the memory [1].
In recent years, the emergence of new non-volatile memories (NVMs), such as resistive
random access memory (RRAM), phase-change random access memory (PRAM), and spin-transfer
torque magnetic random access memory (STT-MRAM), has opened up new possibilities for
efficient implementation of IMC [6]. The resistance-based storage mechanism of these NVM devices offers unique processing
capabilities, enabling energy-efficient logical computing within the memory itself.
In this scenario, logical operations can be performed, and the results can be stored
in a non-volatile format on the memory chip [7]. Among these NVMs, STT-MRAM has garnered significant attention, with various prototype
demonstrations and early commercial products [2]. Extensive research efforts have been dedicated to improving the efficiency of STT-MRAM
at the device, circuit, and architectural levels [6, 8-10]. In this paper, we explore IMC based on STT-MRAM.
Numerous STT-MRAM-based IMC approaches have been proposed at the architectural level
[2,11]. The capability to simultaneously activate multiple word lines (WLs) within a memory
array can be leveraged to execute various arithmetic, logic, and vector operations
[12,13]. The concurrent activation of memory cells enables the AND and OR operations in a
single stage by utilizing a pre-charge sense amplifier (PCSA) [11]. Furthermore, a full adder (FA) can also be implemented by integrating a logic tree into
the PCSA [11]. However, a multi-bit FA built this way requires “n + 1” stages to perform
an n-bit operation. Although digital circuits like carry-lookahead adders (e.g., Kogge-Stone
adder (KSA), Brent-Kung adder, Sklansky adder) can significantly reduce the number
of stages, they entail significant area overhead and are unsuitable for memory arrays.
Therefore, to reduce the number of stages while keeping the overhead within a memory
array small, analog circuits are preferred over digital circuits.
In this study, we propose a high-performance multi-bit FA that incorporates a charge
saving and sharing (CSS) circuit, which operates in the analog domain [14]. Similar to the carry skip adder, the proposed design pre-computes the carry for every 4 bits to enable
parallel computation of the 4-bit sum operations [15]. The carry for every 4 bits is computed by the CSS circuit, while the 4-bit
sum operation is performed using the PCSA with an integrated logic tree [11]. As a result, the proposed method reduces
the required number of stages from “n + 1” to “n/4 + 5”, while keeping
the area overhead small.
The remainder of this paper is structured as follows: Section II provides the background
information on STT-MRAM and PCSA; Section III describes the implementation of the
state-of-the-art multi-bit FA and the proposed multi-bit FA using the CSS circuit;
Section IV presents the simulation results; and finally, Section V offers the conclusion.
II. BACKGROUND
1. STT-MRAM
Fig. 1(a) illustrates a magnetic tunnel junction (MTJ), which serves as the fundamental storage
element of STT-MRAM. The MTJ comprises a free layer, a tunnel barrier, and a pinned
layer. Commonly employed materials for the tunnel barrier include AlOx and MgO, while
the free layer is typically composed of CoFeB, Ru, CoFe, PtMn, and similar substances
[16].
Fig. 1(b) demonstrates two states, namely parallel (P) and anti-parallel (AP), which are determined
by the magnetization direction of the free layer. The MTJ can exhibit two resistance
states, attributed to the tunneling magneto-resistance (TMR) effect, depending on
whether it is in the P or AP state [17].
The P state exhibits a low resistance (RL), which corresponds to data ‘1’, whereas the AP state exhibits
a high resistance (RH), which represents data ‘0’. Fig. 1(c) depicts a single bit-cell configuration, known as 1T-1MTJ, in STT-MRAM. During a
write operation, data ‘1’ is written by passing current from the
bit-line (BL) to the source line (SL), while data ‘0’ is written by passing
current from SL to BL.
Fig. 1. (a) MTJ; (b) Two states of MTJ; (c) 1T-1MTJ bit-cell structure of STT-MRAM.
2. PCSA
The PCSA depicted in Fig. 2 enables the execution of read, AND/OR, carry, and sum operations [11]. The logic tree within the PCSA is used specifically for the FA operation. As listed
in Table 1, L0 and L1 are held high for every operation except the sum (i.e.,
FA) operation.
Fig. 2. PCSA with the addition of a logic tree [11].
Table 1. Control signals for read, AND, OR, carry, and sum operations [11]
Operation | L0 | /L1 | L1 | /L0 | L2 | L3
Read | 1 | 0 | 1 | 0 | 0 | 0
AND | 1 | 0 | 1 | 0 | 1 | 0
OR | 1 | 0 | 1 | 0 | 0 | 1
Carry | 1 | 0 | 1 | 0 | /CIN | CIN
Sum | CIN | /COUT | COUT | /CIN | CIN | /CIN
A. Read Operation [13,18]
Fig. 3(a) demonstrates the read behavior when L2 and L3 are deactivated, as indicated in Table 1. During this read operation, the selected data cell (RL or RH) is compared to the reference cell (RREF), and read by the PCSA. RREF has a resistance value between RL and RH, as depicted in Fig. 4(a). The outcome of the read operation, as read by the PCSA, is shown in Fig. 5(a).
Fig. 3. (a) Circuit for read operation; (b) Circuit for AND, OR operation [19,20].
Fig. 4. (a) Resistance distribution of RL, RH, and RREF [21,22]; (b) Resistance distribution when RL, RH, and RREF are connected in parallel [11].
Fig. 5. (a) Results of read operation according to MTJ state; (b) Result of AND operation according to MTJ 'A' and 'B' states; (c) Results of OR operation based on MTJ 'A' and 'B' states [23].
B. AND and OR Operations [1,24]
A key approach for performing bitwise logic operations in an STT-MRAM macro involves organizing
and distinguishing resistance combinations. In Fig. 2, enabling two WLs simultaneously extends the set of resistive states by connecting
two MTJs in parallel, as demonstrated in Fig. 3(b). Fig. 4(b) illustrates the resistance distributions of RL ∥ RL, RH ∥ RL, and RH ∥ RH when two MTJs are connected in parallel, along with the reference resistances that distinguish
the three resistance values. These resistance combinations are then connected to
the PCSA, and the resulting OUT indicates an AND operation when only L2 is activated
on the reference branch. Conversely, when only L3 is activated, OUT represents
an OR operation.
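The way a single reference level selects between AND and OR can be illustrated with a small numerical sketch. The resistance values below and the midpoint reference levels are hypothetical (the paper does not give numbers), and the comparison is an idealized resistance comparison rather than a model of the actual current-mode PCSA.

```python
# Hypothetical MTJ resistances in ohms; not taken from the paper.
R_L = 3000.0  # parallel (P) state, stores data '1'
R_H = 6000.0  # anti-parallel (AP) state, stores data '0'


def parallel(r1, r2):
    """Equivalent resistance of two resistors connected in parallel."""
    return r1 * r2 / (r1 + r2)


def cell_resistance(bit):
    """Resistance of an MTJ storing the given bit (1 -> R_L, 0 -> R_H)."""
    return R_L if bit == 1 else R_H


# The three possible combinations when two word lines are enabled.
R_LL = parallel(R_L, R_L)
R_LH = parallel(R_L, R_H)
R_HH = parallel(R_H, R_H)

# Assumed reference levels: midpoints between adjacent combinations.
REF_AND = (R_LL + R_LH) / 2  # only RL || RL lies below this level
REF_OR = (R_LH + R_HH) / 2   # only RH || RH lies above this level


def sense(a, b, ref):
    """Idealized sensing: output 1 when the cell pair is below the reference."""
    return 1 if parallel(cell_resistance(a), cell_resistance(b)) < ref else 0


def logic_and(a, b):
    return sense(a, b, REF_AND)


def logic_or(a, b):
    return sense(a, b, REF_OR)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", logic_and(a, b), "OR:", logic_or(a, b))
```

In this simplified picture, activating L2 or L3 corresponds to choosing `REF_AND` or `REF_OR` on the reference branch.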
III. MULTI-BIT FA
1. State-of-the-art Multi-bit FA [11]
Several papers have proposed the use of PCSA for sum operations [11-13]. The sum operation, as proposed by Wang et al. [11], can be executed by utilizing the PCSA equipped with the logic tree illustrated in
Fig. 2.
A. Carry Operation
Fig. 6(a) shows the single-bit carry operation. The carry result, denoted as COUT, is determined by the MAJ(A, B, CIN) function, where MAJ(A, B, 0) represents the AND operation (i.e., AND(A, B)) and
MAJ(A, B, 1) represents the OR operation (i.e., OR(A, B)). In the figure, the red
and blue paths correspond to the AND and OR operations, respectively.
Fig. 6. Single-bit FA using PCSA: (a) Carry operation (red path when CIN= 0 and blue path when CIN= 1); (b) Sum operation when CIN= 0 (red path when COUT= 0 and blue path when COUT= 1); (c) Sum operation when CIN= 1 (red path when COUT= 0 and blue path when COUT= 1).
B. Sum Operation
The sum result is determined by the MAJ(A, B, CIN, /COUT, /COUT), as shown in Table 1. L0, /L1, L1, and /L0 correspond to CIN, /COUT, COUT, and /CIN, respectively. In Fig. 6(b), the red path represents the case where MAJ(A, B, 0, 1, 1) becomes OR(A, B) and the
blue path represents the case where MAJ(A, B, 0, 0, 0) evaluates to zero. Fig. 6(c) shows the case where the red path of MAJ(A, B, 1, 1, 1) yields 1 and the blue path
of MAJ(A, B, 1, 0, 0) yields AND(A, B). This sum result can be achieved using the
logic tree or by reusing the AND and OR operations. Because the sum operation requires
the COUT value, it is essential to obtain it in the previous step so that the sum result can
be obtained in the next step of the calculation.
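The majority-function formulation of the carry and sum can be checked exhaustively with a short behavioral sketch. The function names `maj` and `full_adder` are illustrative only, and the code models the logic, not the PCSA circuit.

```python
from itertools import product


def maj(*bits):
    """Majority of an odd number of input bits."""
    return 1 if sum(bits) > len(bits) // 2 else 0


def full_adder(a, b, cin):
    """Single-bit FA expressed with majority functions.

    COUT = MAJ(A, B, CIN)
    SUM  = MAJ(A, B, CIN, /COUT, /COUT)
    """
    cout = maj(a, b, cin)
    not_cout = 1 - cout
    s = maj(a, b, cin, not_cout, not_cout)
    return s, cout


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        print(f"A={a} B={b} CIN={cin} -> SUM={s} COUT={cout}")
```

Because the sum depends on /COUT, the carry must be evaluated one step before the sum, which is the ordering constraint described above.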
Fig. 7(a) shows the schematic of the state-of-the-art multi-bit FA [11]. Fig. 7(b) illustrates the SAE signal for the PCSA. As shown in Fig. 7(c), the sum operation for the current bit and the carry operation
for the subsequent bit are executed concurrently. The final outcome of the sum operation,
Sn, is obtained in stage “n + 1”.
Fig. 7. (a) Schematic of multi-bit FA [11]; (b) SAE signal for the PCSA; (c) Result of multi-bit FA according to the number of stages.
2. Proposed Multi-bit FA using CSS Circuit
Fig. 8(a) shows the array structure of the proposed multi-bit FA. Closing a switch in this structure allows
inputs A and B to be read simultaneously, whereas opening it allows inputs A and
B to be read separately. Fig. 8(b) shows the schematic of the CSS circuit, which stores charge in
the capacitors and shares the charge when the switches are closed.
Fig. 8. (a) Array structure for the proposed multi-bit FA; (b) Schematic of the CSS circuit.
Fig. 9. (a) 1 stage operation; (b) 1.5 stage operation; (c) 2 stage operation; (d) Result of SA as a function of stage; (e) SAE signal.
To obtain COUT(x+3) from A(x+3)A(x+2)A(x+1)A(x) + B(x+3)B(x+2)B(x+1)B(x) + CIN, the capacitor voltages VCAP1, VCAP2, VCAP3, VCAP4, VCAP5, VCAP6, VCAP7, VCAP8, and VCAP9 are set to VCIN, VA(x), VB(x), VA(x+1), VB(x+1), VA(x+2), VB(x+2), VA(x+3), and VB(x+3), respectively. The size of each capacitor in the CSS circuit is determined by the
weight of the corresponding digit.
Taking CAP1, CAP2, and CAP3, which store CIN and the least significant bits, as the unit size (1x), the capacitors for the second bit have a size of 2x, those for the third bit 4x, and those for the fourth
bit 8x. Charge sharing occurs when all the switches are closed so that
all the capacitors settle to the same voltage. The voltage at this point is VCSS.
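The binary-weighted charge sharing can be sketched numerically under idealized assumptions: each capacitor is charged to either 0 or a supply voltage `VDD`, there are no parasitics or mismatch, and the reference is placed at `VDD / 2`. Both the `VDD` value and the reference level are assumptions made for illustration; the paper does not state them.

```python
VDD = 1.0  # assumed supply voltage (hypothetical value)

# Relative sizes of CAP1..CAP9: CIN, A(x), B(x), A(x+1), B(x+1), ...
WEIGHTS = [1, 1, 1, 2, 2, 4, 4, 8, 8]


def css_voltage(cin, a_bits, b_bits, vdd=VDD):
    """Ideal shared voltage after all switches close.

    a_bits and b_bits are 4-bit lists, least significant bit first.
    V_CSS = sum(C_i * V_i) / sum(C_i).
    """
    bits = [cin]
    for a, b in zip(a_bits, b_bits):
        bits += [a, b]
    charge = sum(w * bit * vdd for w, bit in zip(WEIGHTS, bits))
    return charge / sum(WEIGHTS)


def group_carry(cin, a_bits, b_bits, vdd=VDD):
    """Carry-out of a 4-bit group, read by comparing V_CSS with V_REF.

    With weights summing to 31, a carry occurs when A + B + CIN >= 16,
    i.e. V_CSS >= 16/31 * VDD, so VDD / 2 (between 15/31 and 16/31)
    is used here as an assumed reference.
    """
    vref = vdd / 2
    return 1 if css_voltage(cin, a_bits, b_bits, vdd) > vref else 0


if __name__ == "__main__":
    # "1111" + "0000" + CIN = 1 gives A + B + CIN = 16, the smallest carry case.
    a = [1, 1, 1, 1]
    b = [0, 0, 0, 0]
    print(css_voltage(1, a, b), group_carry(1, a, b))
```

In this ideal model the margin between the smallest carry case (16/31 of VDD) and the largest no-carry case (15/31 of VDD) is only 1/31 of VDD, which indicates why capacitor accuracy matters for the CSS circuit.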
Fig. 10. (a) FA operation in parallel by 4 bits; (b) 4-bit adder; (c) Result as per stage.
VREF represents the reference voltage used for reading the output, OUT, of the SA. The
value of COUT(x+3) can be read using the latch-type SA [25,26], as depicted in Fig. 8(b).
Fig. 9 illustrates the process of calculating COUT for every 4 bits. In Fig. 9(a), which represents stage 1, A1-A4 and B1-B4 are read using the PCSA, and the read
values, along with CIN, are stored in the capacitors of the CSS circuit. Fig. 9(b) corresponds to stage 1.5. At this stage, the switches in the CSS circuit are closed
to obtain VCSS, the shared voltage across the capacitors. Fig. 9(c) depicts the behavior during stage 2. Using the VCSS obtained in stage 1.5, COUT4 (= C4, the carry-out of the fourth bit) is obtained using the SA. At the same
time, A5-A8 and B5-B8 are read using the PCSA and stored in the CSS circuit along
with COUT4. By continuing this process, COUT is calculated iteratively for each group of 4 bits, yielding the final result shown in Fig. 9(d).
Once the COUT values for every 4 bits are obtained through the CSS circuit, the 4-bit adders depicted
in Fig. 10(a) and (b) perform the sum operation in parallel, processing 4 bits at a time. The
resulting sum values can be observed in Fig. 10(c). Notably, all the sum operations are completed within a total of only “n/4 +
5” stages.
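The two stage-count expressions, “n + 1” for the state-of-the-art multi-bit FA and “n/4 + 5” for the proposed one, can be compared directly. The sketch below simply evaluates these formulas; it assumes n is a multiple of 4, and the function names are illustrative.

```python
def stages_state_of_the_art(n):
    """Stage count of the bit-serial multi-bit FA: n + 1."""
    return n + 1


def stages_proposed(n):
    """Stage count of the CSS-based multi-bit FA: n/4 + 5 (n a multiple of 4)."""
    if n % 4 != 0:
        raise ValueError("n is assumed to be a multiple of 4")
    return n // 4 + 5


def stage_ratio(n):
    """How many times fewer stages the proposed FA needs."""
    return stages_state_of_the_art(n) / stages_proposed(n)


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, stages_state_of_the_art(n), stages_proposed(n),
              round(stage_ratio(n), 2))
```

For n = 16 the formulas give 17 and 9 stages, and for n = 64 they give 65 and 21 stages, a ratio slightly above 3.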
IV. SIMULATION
The efficiency of the proposed MRAM-based IMC platform was evaluated by Cadence Spectre
simulations with industry-compatible 28-nm model parameters.
Fig. 11 shows the read yield as a function of MTJ variation when reading STT-MRAM with PCSA.
The read yield decreases sharply as the MTJ variation increases.
The proposed CSS circuit is not tied to the PCSA and can be used with other SAs; therefore, to increase
the read yield, an offset-canceling current-sampling SA [27], a single-cap offset-cancelled SA [28], an offset-canceling single-ended SA [29], or a sensing circuit (SC) can be used as a pre-amplifier for the STT-MRAM.
Examples of SCs include the source-degeneration SC [30] and the body-voltage SC [31].
Fig. 11. Read yield based on MTJ variation.
Capacitance mismatch can affect the accuracy of the calculation results. As shown in Table 2, the result remains correct
for a capacitance mismatch of up to 8%, but it is inverted (i.e., the operation fails) once the mismatch
reaches 9% or more.
Fig. 12 shows the performance as a function of the number of bits in the adder. As n increases,
the advantage over the state-of-the-art
multi-bit FA [11] grows; in particular, when n = 64, the number of stages is reduced by more than 3 times.
As shown in Table 3, compared to the state-of-the-art multi-bit FA [11], the proposed multi-bit FA using the CSS circuit increases the area by about 2 times
and the energy by about 1.6 times. Therefore, it becomes advantageous over the state-of-the-art
multi-bit FA [11] from 16 bits onward, where the number of stages is reduced to about half.
Fig. 12. Ratio of the stage count of the state-of-the-art multi-bit FA [11] to the stage count of the proposed multi-bit FA using CSS circuit, depending on the number of bits.
The 16-bit values of A (A16-A1), B (B16-B1), and CIN are set to “1011 0111 1010 1100”, “0100 0011 0111 1001”, and “1”, respectively.
Fig. 13 shows the results of the state-of-the-art multi-bit FA [11], while the results of the proposed multi-bit FA using the CSS circuit are shown in
Fig. 14. Both circuits produce the correct result. The state-of-the-art multi-bit
FA [11] required 17 stages to perform the operation, whereas the proposed multi-bit FA using
the CSS circuit completed it in only 9 stages. Thus, by incorporating
the CSS circuit into the existing multi-bit FA, the number of required stages is
roughly halved, from 17 to 9, for a 16-bit design.
Fig. 13. 16-bit results from state-of-the-art multi bit FA [11]. “1011 0111 1010 1100” (A16-A1) + “0100 0011 0111 1001” (B16-B1) + “1” (CIN) = “0 1111 1011 0010 0110” (C16 S16-S1).
Fig. 14. 16-bit results of the proposed multi-bit FA using CSS circuit. “1011 0111 1010 1100” (A16-A1) + “0100 0011 0111 1001” (B16-B1) + “1” (CIN) = “0 1111 1011 0010 0110” (C16 S16-S1).
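The 16-bit test case can be reproduced with a purely behavioral sketch of the proposed flow: the carry of each 4-bit group is resolved first, and the 4-bit sums are then computed per group from those carries. The helper names are illustrative, and the group carry is modeled as an ideal threshold on A + B + CIN rather than as the CSS circuit itself.

```python
def to_bits(text):
    """Convert an MSB-first bit string such as '1011 0111' to an LSB-first list."""
    return [int(c) for c in reversed(text.replace(" ", ""))]


def group_carries(a, b, cin, group=4):
    """Carry into each group, followed by the final carry-out.

    Each group carry is modeled ideally: carry = 1 when the group's
    A + B + carry-in reaches 2**group.
    """
    carries = [cin]
    for i in range(0, len(a), group):
        a_val = sum(bit << k for k, bit in enumerate(a[i:i + group]))
        b_val = sum(bit << k for k, bit in enumerate(b[i:i + group]))
        total = a_val + b_val + carries[-1]
        carries.append(1 if total >= (1 << group) else 0)
    return carries


def add_by_groups(a, b, cin, group=4):
    """Sum bits computed per group; the groups are independent of each other
    once the group carries are known."""
    carries = group_carries(a, b, cin, group)
    sums = []
    for g, i in enumerate(range(0, len(a), group)):
        c = carries[g]
        for x, y in zip(a[i:i + group], b[i:i + group]):
            sums.append(x ^ y ^ c)
            c = (x & y) | (c & (x ^ y))
    return sums, carries[-1]


def format_result(sums, cout):
    """Format as 'C16 S16-S1' with the sum shown MSB first in 4-bit groups."""
    msb_first = "".join(str(bit) for bit in reversed(sums))
    nibbles = [msb_first[i:i + 4] for i in range(0, len(msb_first), 4)]
    return f"{cout} " + " ".join(nibbles)


if __name__ == "__main__":
    A = to_bits("1011 0111 1010 1100")
    B = to_bits("0100 0011 0111 1001")
    sums, cout = add_by_groups(A, B, 1)
    print(format_result(sums, cout))  # 0 1111 1011 0010 0110
```

The printed value agrees with the result reported in Figs. 13 and 14.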
Table 3 compares the performance, energy consumption, and area utilization of the three multi-bit
FAs on a 16-bit basis. The evaluation parameters include the number of stages, number
and size of PCSAs with logic trees, number and size of additional transistors, number
of memory read operations, and energy consumption. The state-of-the-art multi-bit
FA [11] demonstrates superior area efficiency and low energy consumption; however, it suffers
from a high number of stages (poor performance). Although the utilization of KSA significantly
reduces the number of stages, its large area overhead prevents it from being incorporated
into the memory array. Similarly, other digital circuits such as carry lookahead adders,
carry select adders, and carry skip adders face similar area overhead challenges,
thus preventing their inclusion in the memory array. To address this issue, it is
necessary to optimize the overhead while improving the performance by leveraging the
analog domain instead of the digital domain [34]. Compared to the state-of-the-art multi-bit FA, the proposed multi-bit FA with the
CSS circuit requires approximately half the number of stages. Additionally, it employs
far fewer additional transistors than the multi-bit FA with KSA. However, compared to the
other two multi-bit FAs, the proposed circuit requires more memory read
operations (56 versus 32). The capacitors consume 22.56 fJ, which accounts
for only 2.3% of the total energy consumption, so the increase in energy consumption
is mainly due to the increase in the number of read operations. In summary, the proposed multi-bit
FA utilizing the analog domain offers intermediate performance between the other two
FAs while effectively addressing the area overhead problem associated with the digital
domain. Nevertheless, there is still a need to reduce energy consumption.
Table 2. CSS circuit operation result due to capacitance mismatch 1)
Capacitance mismatch | 0% | 1% | 2% | 3% | 4% | 5%
Result | Pass | Pass | Pass | Pass | Pass | Pass
Capacitance mismatch | 6% | 7% | 8% | 9% | 10% | 11%
Result | Pass | Pass | Pass | Fail | Fail | Fail
1) For the worst case, “1111” (A4-A1) + “0000” (B4-B1) + “1” (CIN), the
CAP mismatch was simulated so that the capacitors storing 1 become smaller and the capacitors
storing 0 become larger.
Table 3. Comparison of 16-bit sum operation between state-of-the-art multi-bit FA, multi-bit FA using KSA, and proposed multi-bit FA using CSS circuit
Item | State-of-the-art multi-bit FA [11] | Multi-bit FA using KSA [32,33] | Proposed multi-bit FA using CSS circuit
Computing domain | Digital | Digital | Analog + Digital
Number of computing stages (performance) | 17 | 1 + tpg + 4*tAO + txor | 9
PCSA count (size 1)) | 16 (2.92 µm²) | 32 (5.84 µm²) | 32 (5.84 µm²)
Additional transistor count | 0 | 2982 | 104.5
Additional size 1) | 0 µm² | 7.16 µm² | 0.25 µm²
Total size 1) (area overhead) | 2.92 µm² | 13 µm² | 6.09 µm²
Memory read operation count | 32 | 32 | 56
Energy consumption | 598.7 fJ | 755.2 fJ | 969.3 fJ
1) The size is the pre-layout size, calculated as the sum of the width*length of the
transistors.
V. CONCLUSIONS
In this paper, we propose a multi-bit FA designed specifically for high-performance
sum operations in STT-MRAM-based IMC systems. The proposed multi-bit FA uses
the CSS circuit in the analog domain to generate COUT for every 4 bits, followed by 4-bit sum operations in the digital domain.
Our circuit architecture uses stages more efficiently, requiring
only “n/4 + 5” stages for an n-bit operation compared to the conventional “n + 1” stages. Moreover,
it significantly reduces the area overhead compared to digital-domain multi-bit
FAs, making it feasible for integration within a memory array. However,
the proposed circuit, while effectively reducing the number of stages,
requires twice the number of PCSAs and additional circuits compared to the state-of-the-art
multi-bit FA, and its energy consumption is also higher. Our
future work will therefore focus on reducing both the area overhead and the energy consumption
of the proposed circuit.
ACKNOWLEDGMENTS
This work was supported by Incheon National University Research Grant in 2022.
The EDA tool was supported by the IC Design Education Center (IDEC), Korea.
References
[1] C. Wang et al., "Computing-in-memory paradigm based on STT-MRAM with synergetic read/write-like modes," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2021, pp. 1-5.
[2] S. Jain et al., "Computing in memory with spin-transfer torque magnetic RAM," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 3, pp. 470-483, Mar. 2018.
[3] T. Na, "Ternary output binary neural network with zero-skipping for MRAM-based digital in-memory computing," IEEE Trans. Circuits Syst. II, Exp. Briefs (TCAS-II), 2023.
[4] Z. He et al., "Exploring STT-MRAM based in-memory computing paradigm with application of image edge extraction," in 2017 IEEE International Conference on Computer Design (ICCD), Nov. 2017, pp. 439-446.
[5] H. S. Stone, "A logic-in-memory computer," IEEE Trans. Comput., vol. C-19, no. 1, pp. 73-78, Jan. 1970.
[6] T. Na et al., "STT-MRAM sensing: a review," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, no. 1, pp. 12-18, Jan. 2021.
[7] M. Zabihi et al., "In-memory processing on the spintronic CRAM: From hardware design to application mapping," IEEE Trans. Comput., vol. 68, no. 8, pp. 1159-1173, Aug. 2019.
[8] D. Apalkov et al., "Spin-transfer torque magnetic random access memory (STT-MRAM)," ACM Journal on Emerging Technologies in Computing Systems (JETC), vol. 9, no. 2, pp. 1-35, May 2013.
[9] R. Bishnoi et al., "Improving write performance for STT-MRAM," IEEE Trans. Magn., vol. 52, no. 8, pp. 1-11, Aug. 2016.
[10] L. Zhang et al., "Addressing the thermal issues of STT-MRAM from compact modeling to design techniques," IEEE Trans. Nanotechnology, vol. 17, no. 2, pp. 345-352, Mar. 2018.
[11] C. Wang et al., "Design of an area-efficient computing in memory platform based on STT-MRAM," IEEE Trans. Magn., vol. 57, no. 2, pp. 1-4, Feb. 2021.
[12] G. Patrigeon et al., "Design and evaluation of a 28-nm FD-SOI STT-MRAM for ultra-low power microcontrollers," IEEE Trans. Magn., vol. 7, no. 9, pp. 4982-4987, Sep. 2019.
[13] S. Angizi et al., "Design and evaluation of a spintronic in-memory processing platform for nonvolatile data encryption," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37, no. 9, pp. 1788-1801, Sep. 2018.
[14] H. Yu et al., "An adder using charge sharing and its application in DRAMs," in Proceedings 2000 International Conference on Computer Design, Sep. 2000.
[15] V. Vijay et al., "A Review On N-Bit Ripple-Carry Adder Carry-Select Adder And Carry-Skip Adder," Journal of VLSI Circuits and Systems, vol. 4, no. 01, pp. 27-32, Mar. 2022.
[16] J.-G. Zhu et al., "Magnetic tunnel junctions," Mater. Today, vol. 9, no. 11, pp. 36-45, Nov. 2006.
[17] M. Hosomi et al., "A novel nonvolatile memory with spin torque transfer magnetization switching: Spin-RAM," in IEDM Tech. Dig., Dec. 2005, pp. 459-462.
[18] Y. Luo et al., "A variation robust inference engine based on STT-MRAM with parallel read-out," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Oct. 2020.
[19] S. Ikeda et al., "Magnetic tunnel junctions for spintronic memories and beyond," IEEE Trans. Electron Devices, vol. 54, no. 5, pp. 991-1002, May 2007.
[20] M. Zabihi et al., "Using spin-hall MTJs to build an energy-efficient in-memory computation platform," in Proc. 20th Int. Symp. Qual. Electron. Design (ISQED), Mar. 2019, pp. 52-57.
[21] E. Deng et al., "Low power magnetic full-adder based on spin transfer torque MRAM," IEEE Trans. Magn., vol. 49, no. 9, pp. 4982-4987, Sep. 2013.
[22] S. Lim et al., "Highly independent MTJ-based PUF system using diode-connected transistor and two-step postprocessing for improved response stability," IEEE Trans. Inf. Forensics Security, vol. 15, pp. 2798-2807, 2020.
[23] W. Zhao et al., "Design considerations and strategies for high-reliable STT-MRAM," Microelectron. Rel., vol. 51, no. 9, pp. 1454-1458, Sep. 2011.
[24] G. P. Devaraj et al., "Design and Analysis of Modified Pre-Charge Sensing Circuit for STT-MRAM," in 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), Mar. 2021, pp. 507-511.
[25] T. Na et al., "Comparative study of various latch-type sense amplifiers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 2, pp. 425-429, Feb. 2014.
[26] B. Wicht et al., "Yield and speed optimization of a latch-type voltage sense amplifier," IEEE J. Solid-State Circuits (JSSC), vol. 39, no. 7, pp. 1148-1158, Jul. 2004.
[27] T. Na et al., "Offset-canceling current-sampling sense amplifier for resistive nonvolatile memory in 65 nm CMOS," IEEE J. Solid-State Circuits, vol. 52, no. 2, pp. 496-504, Feb. 2017.
[28] Q. Dong et al., "A 1-Mb 28-nm 1T1MTJ STT-MRAM with single-cap offset-cancelled sense amplifier and in situ self-write-termination," IEEE J. Solid-State Circuits, vol. 54, no. 1, pp. 231-239, Jan. 2019.
[29] T. Na et al., "Offset-canceling single-ended sensing scheme with one-bit-line precharge architecture for resistive nonvolatile memory in 65-nm CMOS," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 11, pp. 2548-2555, Nov. 2019.
[30] J. Kim et al., "A novel sensing circuit for deep submicron spin transfer torque MRAM (STT-MRAM)," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 1, pp. 181-186, Jan. 2012.
[31] F. Ren et al., "A body-voltage-sensing-based short pulse reading circuit for spin-torque transfer RAMs (STT-RAMs)," in Proc. Int. Symp. Quality Electron Design (ISQED), pp. 275-282, 2012.
[32] P. Chakali et al., "Design of High Speed Kogge-Stone Based Carry Select Adder," International Journal of Emerging Science and Engineering (IJESE), vol. 1, no. 4, pp. 2319-6378, Feb. 2013.
[33] R. Anjana et al., "Implementation of Vedic multiplier using Kogge Stone adder," in IEEE Int. Conf. on Embedded Sys., Jul. 2014, pp. 28-31.
[34] T. Brächer and P. Pirro, "An analog magnon adder for all-magnonic neurons," J. Appl. Phys., vol. 124, no. 15, Oct. 2018.
Jangseok Yu received the B.S. degree in Electronics Engineering from Incheon National
University, Incheon, Republic of Korea, in 2024.
Geonwoo Lee is currently pursuing the B.S. degree in Electronics Engineering at
Incheon National University, Incheon, Republic of Korea.
Taehui Na received the B.S. and Ph.D. degrees in Electrical & Electronic Engineering
from Yonsei University, Seoul, Republic of Korea, in 2012 and 2017, respectively.
From 2017 to 2019, he was with Samsung Electronics Co., Ltd., Hwasung, Republic of
Korea, where he worked on phase-change random access memory (PRAM) and high-performance
NAND (ZNAND) core circuit designs. Since 2019, he has been a professor at Incheon
National University, Incheon, Republic of Korea. His current research interests are
focused on process-voltage-temperature variation tolerant and low-power circuit designs
for memory, microcontroller unit, and neuromorphic SoC.