Seongjae Cho$^{1}$, Sung-Tae Lee$^{2}$, Soomin Kim$^{1}$, and Hyungcheol Shin$^{3,*}$

$^{1}$ Department of Electronic and Electrical Engineering, Ewha Womans University, Seoul 03760, Korea
$^{2}$ School of Electronic and Electrical Engineering, Hongik University, Seoul 04066, Korea
$^{3}$ Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, Korea
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Index Terms
Hardware artificial intelligence, power efficiency, multiply-and-accumulate (MAC) operation, memory-based artificial intelligence chip
I. INTRODUCTION
Artificial intelligence has mainly been developed by implementing mathematical representations of the functions of neurons and synapses in biological systems through software technology [1-3]. In recent years, artificial intelligence has begun to be realized in hardware for the energy efficiency and volume reduction of the system, and artificial intelligence semiconductor chips based on integrated circuits have become visible [4-6]. C. Mead proposed the ``neuromorphic'' system as a next-generation computing technology that can maximally parallelize the serial operations of conventional digital computers [7]. However, although current neuromorphic chips effectively mimic the functions of neurons and synapses, they are still built on digital integrated circuit technology.
In order to implement a neuromorphic system in a more active sense, one more faithful to the original vision, renovation must be made at the component technology level. Component technology here means memory device technology, and various memory devices can serve as the basis of hardware-oriented artificial intelligence computing [8-11]. In the same sense, the ultimate form of artificial intelligence computing is in-memory computing. The biggest advantage of hardware-oriented artificial intelligence semiconductor chips is power efficiency, and how successful the technology is should be judged by quantitative evaluation of power efficiency. Tera operations per second per watt (TOPS/W) is a widely used metric for evaluating the operational power efficiency of digital circuit-based multiply-and-accumulate (MAC) operations [12], but it is difficult to apply to neuromorphic systems that perform event-driven operations without relying on a clock [13]. In addition, if operations are performed in a memory-based synaptic array rather than a circuit-based one, the existing indicator is even harder to use. This paper presents a method of deriving the power efficiency of artificial intelligence chips that operate on synaptic memory cells. Generality is preserved by keeping the unit of TOPS/W, while a technique that reflects the characteristics of the memory devices is presented through a purely mathematical process.
II. NUMBER OF MAC OPERATIONS
In order to draw a general conclusion, an inductive method can be chosen, explaining it through some representative scenarios. As shown in Fig. 1, a fully-connected network (FCN) can be presumed that takes a Modified National Institute of Standards and Technology (MNIST) image pattern as input, without intentional reduction in the number of pixels, and classifies the digits from 0 to 9. That is, the number of input nodes is 28 ${\times}$ 28 = 784, and the number of output nodes is 10. Also, it is assumed that the network has a single hidden layer with 200 hidden nodes on it. The total number of synapses is revealed by the simple equation Eq. (1).
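Written out with the layer sizes above (a reconstruction consistent with the synapse count of 158,800 used in the numerical examples below), Eq. (1) reads

N$_{\mathrm{syn}}$ = 784 ${\times}$ 200 + 200 ${\times}$ 10 = 156,800 + 2,000 = 158,800. (1)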
The synaptic weight, or artificial intelligence parameter in the equivalent term, is determined by the strength of the connection between neurons on two different consecutive layers making up the FCN. The connectivity between the ith neuron on the pre-layer and the jth neuron on the post-layer can be denoted w$_{ij}$. Fig. 2 shows a network constructed of two layers, in which the pre-layer has 5 nodes and the post-layer has 4 nodes. Based on the definition of weight, the positive integers i and j range over 1 ${\leq}$ i ${\leq}$ 5 and 1 ${\leq}$ j ${\leq}$ 4. Through this method of weight representation, all the weights existing in the network in Fig. 2 can be expressed in a single matrix, as shown in Fig. 3.
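For concreteness, the matrix in Fig. 3 collects the twenty weights in the form (a sketch following the row/column convention described next)

$\mathbf{W} = \begin{pmatrix} w_{11} & w_{12} & w_{13} & w_{14} \\ w_{21} & w_{22} & w_{23} & w_{24} \\ w_{31} & w_{32} & w_{33} & w_{34} \\ w_{41} & w_{42} & w_{43} & w_{44} \\ w_{51} & w_{52} & w_{53} & w_{54} \end{pmatrix}$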
As previously defined, each weight is identified by a two-digit subscript, with the index of a node on the pre-layer as the first digit and that of a node on the post-layer as the second digit. This yields a matrix whose numbers of rows and columns are the numbers of pre-layer and post-layer nodes, respectively, as clarified by Fig. 3. The first row (blue shaded) is the set of weights associated with node 1 on the pre-layer; the fourth column (orange shaded) is the set of weights associated with node 4 on the post-layer. Looking at the matrix in Fig. 3, it can be seen that the product of the numbers of nodes in two consecutive layers is the number of entries of the matrix. It indicates the number of weight values that determine the connectivity between the neurons on the pre-synaptic and post-synaptic layers. Further, the FCN in Fig. 1 can also be represented by the product of subnetwork matrices together with the input and output vectors, as shown in Fig. 4. The number of weight values corresponds to the number of multiplication operations between the two layers. In performing sum operations among the terms over which the products have been taken, the number of sum operations is always one less than the number of terms to be added, that is, the number of weighted inputs. Looking back at the examples in Fig. 2 and Fig. 3, five weighted inputs are fed into node 4 on the post-layer, and thus the number of sum operations made on them is four. Since these operations are carried out on each of the four nodes constituting the post-layer, a total of 16 sum operations are executed. These results can be generalized for an FCN with m pre-layer nodes and n post-layer nodes as Eqs. (2) and (3) below.
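In displayed form (reconstructed directly from the counting argument above),

N$_{\mathrm{mult}}$ = m ${\times}$ n, (2)

N$_{\mathrm{sum}}$ = (m - 1) ${\times}$ n. (3)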
Based on this mathematical foundation, the total number of MAC operations performed in the FCN in Fig. 1 can be obtained. Fig. 5 shows the redrawn FCN with identifications of subnetworks A and B, which can be represented by two different matrices. From Eq. (2), the total number of product operations in subnetwork A is 784 ${\times}$ 200 = 156,800. From Eq. (3), the total number of sum operations in subnetwork A is (784 - 1) ${\times}$ 200 = 156,600. Thus, the total number of MAC operations in subnetwork A is calculated to be 313,400. In the same manner, the total number of MAC operations conducted in subnetwork B is 200 ${\times}$ 10 + (200 - 1) ${\times}$ 10 = 3,990. Finally, the total number of MAC operations carried out in the exemplary FCN in Figs. 1 and 5 comes to $313,400 + 3,990 = 317,390$ operations. Here, it is assumed that all the MAC operations for inferencing are performed at the same time and all the synaptic devices are activated. Thus, this value provides the worst-case condition for calculating the power efficiency of MAC operation in the given FCN. This calculation method is also applicable to deep neural networks (DNNs), and the total number of MAC operations in a given network is obtained by adding up the numbers of MAC operations between each pair of successive layers.
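This layer-by-layer counting is mechanical enough to automate. The following minimal Python sketch (an illustration added here for clarity, not code from the original work; the function name and layer list are arbitrary) reproduces the totals for the 784 ${\times}$ 200 ${\times}$ 10 FCN:

def mac_and_synapse_counts(layer_sizes):
    """Count MAC operations and synaptic weights of a fully-connected network.

    Between a pre-layer with m nodes and a post-layer with n nodes there are
    m * n multiplications (Eq. (2)) and (m - 1) * n additions (Eq. (3)).
    """
    total_macs = 0
    total_synapses = 0
    for m, n in zip(layer_sizes, layer_sizes[1:]):
        total_macs += m * n + (m - 1) * n  # (2m - 1) * n per subnetwork, cf. Eq. (5)
        total_synapses += m * n            # one weight per connection, cf. Eq. (1)
    return total_macs, total_synapses

macs, synapses = mac_and_synapse_counts([784, 200, 10])
print(macs, synapses)             # 317390 158800
print(round(macs / synapses, 3))  # 1.999, close to the factor of 2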
If the neural network is composed of n subnetworks, or equivalently (n-1) hidden layers, the total number of MAC operations can be calculated in the same manner. Here, it can be assumed that the number of input nodes is m$_{1}$ and that of output nodes m$_{\mathrm{n+1}}$, so that the total number of conversion matrices is n. The total number of multiplications and accumulations (MAC operations) of the kth subnetwork can be calculated as Eq. (5).
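Following Eqs. (2) and (3), Eq. (5) reads

N$_{\mathrm{MAC},k}$ = m$_{k}$m$_{k+1}$ + (m$_{k}$ - 1)m$_{k+1}$ = (2m$_{k}$ - 1)m$_{k+1}$. (5)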
Thus, the total number of MAC operations over the n subnetworks and the total number of synapses in the entire artificial neural network are expressed as Eqs. (6) and (7), respectively. Plugging Eqs. (6) and (7) into Eq. (4) results in the MAC operation efficiency of a DNN having n subnetworks or (n-1) hidden layers. Since the other three terms in the denominator are not affected by the array size once the type of synaptic device is given, the MAC operation efficiency is determined by the ratio of the terms in Eqs. (6) and (7), which appear in the numerator and the denominator of Eq. (4), respectively.
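With N$_{\mathrm{MAC}}$ the total number of MAC operations, N$_{\mathrm{syn}}$ the total number of synapses, and I$_{\mathrm{on}}$, V$_{\mathrm{inf}}$, and t$_{\mathrm{inf}}$ the on-state inference current, inference voltage, and inference time of a synaptic cell (symbol names chosen here for readability), these relations can be reconstructed from the surrounding description as

${\eta}_{\mathrm{MAC}}$ = N$_{\mathrm{MAC}}$ / (N$_{\mathrm{syn}}$ ${\times}$ I$_{\mathrm{on}}$ ${\times}$ V$_{\mathrm{inf}}$ ${\times}$ t$_{\mathrm{inf}}$), (4)

N$_{\mathrm{MAC}}$ = $\sum_{k=1}^{n}$ (2m$_{k}$ - 1)m$_{k+1}$, (6)

N$_{\mathrm{syn}}$ = $\sum_{k=1}^{n}$ m$_{k}$m$_{k+1}$, (7)

N$_{\mathrm{MAC}}$/N$_{\mathrm{syn}}$ = $\sum_{k=1}^{n}$ (2m$_{k}$ - 1)m$_{k+1}$ / $\sum_{k=1}^{n}$ m$_{k}$m$_{k+1}$. (8)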
Eq. (8) implies that the MAC operation efficiency becomes less dependent on the number of hidden layers as the depth of an artificial neural network is deepened. Here, the numbers of rows and columns of the individual matrices are assumed to be comparable to a common size a. Furthermore, as the size of the individual conversion matrices increases, the ratio in Eq. (8) asymptotically approaches the factor of 2. With the numbers given in the previous example, 317,390 / 158,800 = 1.999, a value very close to 2. Therefore, the MAC operation efficiency in Eq. (4) is validated even for a DNN, having little dependence on the depth of the neural network and the number of synaptic devices. This mathematical formulation is valid only when the size of the task is not considered. If the array is too small compared with a given task (the number of operations that need to be performed at one time to achieve a specific goal), the operation efficiency of the small synapse array would be low. If the array size is excessively large compared with the workload, most of the power consumption would be dedicated to sustaining the stand-by (or low-conductivity) mode of the synaptic cells not in work. How efficiently a given task can be completed with a small amount of energy is determined by the sizes of the synaptic array and the workload, so in reality this matter becomes an optimization problem.
Fig. 1. Fully-connected network (FCN) having one hidden layer with 200 nodes performing MNIST pattern classification.
Fig. 2. Two-layer network where the pre-layer has 5 nodes and the post-layer has 4 nodes, respectively.
Fig. 3. Matrix representation of all the synaptic weights in the two-layer network given in Fig. 2.
Fig. 4. Matrix representation of the FCN in Fig. 1 demonstrating the relation between input and output vectors.
Fig. 5. Redrawn FCN in Fig. 1 with identifications of subnetworks A and B that can be represented by two different matrices.
III. POWER EFFICIENCY OF INFERENCE AND IMPLICATION FOR SYNAPSE CELL DESIGN
The power efficiency of the MAC operation for inference has been defined as Eq. (4) in this study. As previously stated, this definition is quite different from that usually adopted for general digital circuit-based MAC operation accelerators. However, for familiarity on the evaluator’s side and for metric generality, the unit of TOPS/W can be maintained. The most distinctive feature of the MAC operation efficiency presented in Eq. (4) is that the power efficiency depends primarily on the characteristics of the synaptic device. To understand what level of value Eq. (4) can provide, realistic values should be substituted for the terms constituting its denominator, and for this, several assumptions can be made as follows.
(i) Binary memory operation (0 and 1 weight)
(ii) The numbers of synapses having state 0 and state 1 at arbitrary moment are equal.
(iii) Only the inference operation is taken into account for calculating the power
efficiency.
(iv) Inference current at state 1 = 1 ${\mu}$A = 10$^{-6}$ A
(v) Inference current at state 0 is negligibly small; smaller than 1/1,000 of the value in (iv).
(vi) Inference voltage = 1 V
(vii) Inference time = 1 ${\mu}$s = 10$^{-6}$ s
The first term in the denominator of Eq. (4), the total number of synapses, has already been prepared and can be brought from Eq. (1). Plugging Eq. (1) and all the values determined in the assumptions above into Eq. (4) provides the inference power efficiency of the 784 ${\times}$ 200 ${\times}$ 10 FCN based on synaptic memory devices, given as Eq. (9).
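Taking the worst case in which every synapse draws the on-state inference current (a reconstruction consistent with the assumptions above and with the 400 TOPS/W figure derived below), Eq. (9) evaluates to

${\eta}_{\mathrm{MAC}}$ = 317,390 / (158,800 ${\times}$ 10$^{-6}$ A ${\times}$ 1 V ${\times}$ 10$^{-6}$ s) ${\approx}$ 2.0 ${\times}$ 10$^{12}$ operations/J = 2 TOPS/W. (9)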
Looking into Eqs. (4) and (9), some important implications for designing memory device-based synaptic cells can be derived. First, the MAC operation efficiency is inversely proportional to the on-state inference current. Thus, synaptic cells should be designed to have low maximum conductance. Second, it is necessary to develop synaptic devices that can lower the inference voltage. Third, synaptic devices with high read operation speed (high inference speed) should be designed. If another set of assumptions is made with inference current = 100 nA, inference voltage = 0.5 V, and inference time = 100 ns, a drastically higher efficiency of 400 TOPS/W is obtained. Since three of the four terms making up the denominator of Eq. (4) are determined by the electrical characteristics of the single synaptic memory device, the MAC operation efficiency defined in Eq. (4) can be understood as a highly practical index focused on the performance of the cell itself. The on-state inference current and the inference voltage are values that can decrease as the synaptic device is shrunk. On the other hand, although the inference time can be shortened as the device is scaled down, the inference operation of a synaptic cell is largely influenced by interconnect technology, since it is performed not on an individual cell standing separately but at the level of the whole synapse array. As a result, it can be concluded that the ensemble effect on the MAC operation efficiency from the three terms governed by the miniaturization of the synaptic cell is not so significant. According to Eq. (4), the total number of synapses and the MAC operation efficiency are inversely proportional, so the higher the integration density of the synapse array constructing the artificial intelligence semiconductor chip, the lower the efficiency. Fortunately, however, the total number of MAC operations over distinct layers in the numerator of Eq. (4) has a succinct cancelling effect with the total number of synapses in the denominator. It can be explicitly proven through the procedures in Eq. (10) that the MAC operation efficiency for inference presented in this study depends only very weakly on the scaling level of a synapse or the array integration density of synaptic devices.
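For the single-hidden-layer FCN, this cancellation can be written out explicitly; Eq. (10) can be reconstructed as

${\eta}_{\mathrm{MAC}}$ = [N$_{\mathrm{I}}$h + (N$_{\mathrm{I}}$ - 1)h + hN$_{\mathrm{O}}$ + (h - 1)N$_{\mathrm{O}}$] / [(N$_{\mathrm{I}}$h + hN$_{\mathrm{O}}$) ${\times}$ I$_{\mathrm{on}}$ ${\times}$ V$_{\mathrm{inf}}$ ${\times}$ t$_{\mathrm{inf}}$] ${\approx}$ 2 / (I$_{\mathrm{on}}$V$_{\mathrm{inf}}$t$_{\mathrm{inf}}$), (10)

where the approximation holds for N$_{\mathrm{I}}$, h, N$_{\mathrm{O}}$ ${\gg}$ 1.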
In particular, the inference time in the denominators of Eqs. (4) and (10) indicates the time required for the bitline inference operation rather than for a cell-level read operation, which makes the inference time little dependent on cell scaling. Here, N$_{\mathrm{I}}$ and N$_{\mathrm{O}}$ stand for the numbers of inputs and outputs, respectively, and h is the number of nodes in the hidden layer. Eq. (10) demonstrates that the MAC operation efficiency is independent of cell scalability and synapse array density, and depends only on the cell characteristics. The derivation of Eq. (10) assumes that the synapses show a binary operation and that all the synapses have the highest conductivity, or fully-on inference current, as the worst-case scenario. If the synaptic device is capable of multi-level operation, the MAC operation efficiency in Eq. (10) goes higher. If the synaptic device is permitted to have n different inference current levels and all the synaptic devices have equal probabilities of taking the permitted n weights, then 1/n of the total synaptic devices carry 0, 1/(n-1), 2/(n-1), …, (n-2)/(n-1), and (n-1)/(n-1) = 1 times the fully-on inference current, respectively. Applying these assumptions to Eq. (10), the head part of Eq. (10) can be simplified as Eq. (11). As a result, the MAC operation efficiency in Eqs. (10) and (11) is independent of the number of synaptic levels, by which the generality of the equations is finally validated. However, this generality holds only inside the synapse array, and in designing the entire system architecture one should be on guard against the increased complexity and power consumption of the peripheral circuits inevitably required for the multi-level operation.
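For reference, the averaging step that simplifies the head part of Eq. (10) under the equiprobable-level assumption (a reconstruction of Eq. (11)) is

(1/n) $\sum_{k=0}^{n-1}$ [k/(n-1)] ${\times}$ I$_{\mathrm{on}}$ = I$_{\mathrm{on}}$/2, (11)

so the mean inference current per synapse is half of the fully-on value regardless of n.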
IV. CONCLUSIONS
In this study, an indicator has been presented for calculating the MAC operation efficiency of a full-fledged hardware-oriented artificial intelligence semiconductor chip in which the operations are performed in an artificial neural network composed of synaptic cells based on memory devices. Although different from existing definitions, the index has been defined so that device-specific parameters have predominance, without losing familiarity and generality, by maintaining the unit of TOPS/W. The value of the indicator obtained by the newly proposed method is likely to improve with device scaling, although the dependence is not strong, and it is hardly affected by the integration size of the synaptic array. The new performance metric will serve as a highly practical guideline for designing the synaptic devices that make up hardware-oriented artificial intelligence chips and for predicting the inference power efficiency of the synapse array separately from the peripheral circuits.
ACKNOWLEDGMENTS
This work was supported by the Ministry of Science and ICT of Korea (MSIT) through
the Grants 2020-0-01294 and RS-2023-00258527.
References
F. Rosenblatt, “Perceptron Simulation Experiments,” Proc. IRE, vol. 48, no. 3, pp.
301-309, Mar. 1960.
H. D. Block, “The Perceptron: A Model for Brain Functioning. I,” Rev. Mod. Phys.,
vol. 34, no. 1, pp. 123-135, Jan. 1962.
S. K. Pal and S. Mitra, “Multilayer Perceptron, Fuzzy Sets, and Classification,” IEEE Trans. Neural Networks, vol. 3, no. 5, pp. 683-697, Sep. 1992.
M. Davies, et al., “Loihi: A Neuromorphic Manycore Processor with On-Chip Learning,” IEEE Micro, vol. 38, no. 1, pp. 82-99, Jan. 2018.
F. Akopyan, et al., “TrueNorth: Design and Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chip,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 34, no. 10, pp. 1537-1557, Oct. 2015.
N. P. Jouppi, et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit,” Proc. Annual International Symposium on Computer Architecture (ISCA), pp. 1-12, Toronto, Canada, Jun. 2017.
C. Mead, “Neuromorphic Electronic Systems,” Proc. IEEE, vol. 78, no. 10, pp. 1629-1636,
Oct. 1990.
S. Cho, “Volatile and Nonvolatile Memory Devices for Neuromorphic and Processing-in-memory
Applications,” J. Semicond. Technol. Sci., vol. 22, no. 1, pp. 30-46, Feb. 2022.
D. J. Jang, H. Ryu, H. Cha, N.-Y. Lee, Y. Kim, and M.-W. Kwon, “Synaptic Device Based
on Resistive Switching Memory using Single-Walled Carbon Nanotubes,” J. Semicond.
Technol. Sci., vol. 22, no. 5, pp. 346-352, Apr. 2022.
K. Udaya-Mohanan, S. Cho, and B.-G. Park, “Medium-Temperature-Oxidized GeO Resistive-Switching
Random-Access Memory and Its Applicability in Processing-in-Memory,” Nanoscale Res.
Lett., vol. 17, pp. 63-1-63-14, Jul. 2022.
B. Jeon, T. Jang, S. Cho, H. Shin, and W. Y. Choi, “Synapse Array with Buried Bottom
Gate Structure for Neuromorphic Systems,” Proc. Silicon Nanoelectronics Workshop (SNW),
pp. 15-16, Kyoto, Japan, Jun. 2023.
Q. Liu, et al., “A Fully Integrated Analog ReRAM Based 78.4 TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing,” Proc. IEEE International Solid-State Circuits Conference (ISSCC), pp. 500-502, San Francisco, CA, Feb. 2020.
V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “How to Evaluate Deep Neural Network
Processors: TOPS/W (alone) Considered Harmful,” IEEE Solid-State Circuits Mag., vol.
12, no. 3, pp. 28-41, Aug. 2020.
Seongjae Cho received the B.S. and the Ph.D. degrees in electrical engineering
from Seoul National University, Seoul, Korea, in 2004 and 2010, respectively. He worked
as an Exchange Researcher at the National Institute of Advanced Industrial Science
and Technology (AIST), Tsukuba, Japan, in 2009. He worked as a Postdoctoral Researcher
at Seoul National University in 2010 and at Stanford University, Palo Alto, CA, from
2010 to 2013. Also, he worked as a faculty member at the Department of Electronic
Engineering, Gachon University, from 2013 to 2023. Since 2023, he has been an Associate Professor at the Department of Electronic and Electrical Engineering, Ewha Womans University, Seoul, Korea. His current research interests include emerging
memory devices, advanced nanoscale CMOS devices, and ultra-small integration technologies.
Sung-Tae Lee received the B.S. and the Ph.D. degrees in electrical and computer
engineering from Seoul National University (SNU), Seoul, Korea, in 2016 and 2021,
respectively. He has been an Assistant Professor with the School of Electronic and
Electrical Engineering, Hongik University, since 2023. His current research interests
include neuromorphic devices and their application in advanced computing.
Soomin Kim received the B.S. degree in Electronic and Electrical Engineering from
Ewha Womans University, Seoul, Korea, in 2023. She is currently pursuing the M.S.
degree at Ewha Womans University. Her current research interests include nanoscale
CMOS devices, low-power synaptic devices, and scalable neuron circuits for neuromorphic
applications.
Hyungcheol Shin received the B.S. and M.S. degrees in electrical engineering from
Seoul National University, Seoul, Korea, in 1985 and 1987, respectively, and the Ph.D.
degree in electrical engineering from the University of California, Berkeley, in 1993.
From 1994 to 1996, he worked as a Senior Device Engineer at Motorola. From 1996 to
2003, he was with the Department of Electrical Engineering and Computer Science at
the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, as an Associate
Professor. From 2001 to 2002, he worked as a Staff Scientist at Qualcomm. Since 2003,
he has been with Seoul National University (SNU), Seoul, Korea, where he is currently
a professor in the Department of Electrical and Computer Engineering. From 2012 to
2013, he was a Director of the Inter-university Semiconductor Research Center (ISRC)
at SNU.