I. INTRODUCTION
As the number of mobile devices grows exponentially, user authentication that verifies
the identity of the person using the device becomes crucial. Among the various user
authentication methods, including traditional passcode-based methods and modern
biometrics-based methods such as fingerprint and iris recognition, face recognition (FR)
is gaining attention on mobile platforms because it requires no direct physical contact.
Thanks to recent advances in recognition algorithms based on deep neural networks, its
accuracy and robustness have also reached commercially viable levels (1). However, face
recognition-based user authentication is computationally challenging on mobile devices
because it must remain always active despite the limited battery capacity (2).
Fig. 1 illustrates the three stages of the face recognition pipeline (2,3): image
acquisition, which captures the input image from the image sensor; face detection (FD),
which detects regions of interest in the image that may contain a face; and face
recognition, which extracts features from the detected regions and matches them against
a database. The first two stages operate at all times to detect a face in the input
image, while the third recognition stage is activated only when a face is detected.
Although the recognition stage involves heavy computation, its average power consumption
is in practice less than 5% of the total power consumption due to its low activation
rate (2,4). Therefore, it is imperative to realize low-power operation in the first two
always-on stages to reduce the overall power consumption.
Fig. 1. Conventional face recognition pipeline
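To make the duty cycling concrete, the following is a minimal behavioral sketch of this
three-stage pipeline, assuming hypothetical sensor, detector, recognizer, and database
objects (none of these names come from the paper):

```python
# Behavioral sketch of the pipeline in Fig. 1: acquisition and face
# detection run on every frame (always-on), while the heavy face
# recognition stage is invoked only when a face is actually detected.
def authentication_loop(sensor, detector, recognizer, database):
    while True:
        frame = sensor.acquire_frame()           # stage 1: always on
        regions = detector.detect_faces(frame)   # stage 2: always on
        for region in regions:                   # stage 3: event-driven
            features = recognizer.extract(region)
            if database.match(features):
                return True                      # user authenticated
```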
Previous face recognition processors (2,3,5) utilized a digital signal processor (DSP)
to implement the face recognition pipeline described above. They used an array of
analog-to-digital converters (ADCs) to convert the analog sensor signals to the digital
domain immediately after image acquisition, and used the DSP to handle the complex
computations in the detection and recognition stages. Although this approach is simple
and easy to implement, it requires high-resolution ADCs to meet the accuracy requirement
of the recognition algorithm. For the most popular deep convolutional neural network
(CNN) based algorithms, such as AlexNet and GoogLeNet, the bit resolution should be at
least 10 bits to achieve a desirable recognition accuracy (6), and the prior works
(2,3,13) use a 16-bit input bit-precision to guarantee high face recognition accuracy.
As is well known from the literature (2,3,5), a high-resolution ADC is power-hungry, and
an array of high-resolution ADCs becomes a major power bottleneck, accounting for more
than half of the total power consumption (2,7).
In this paper, we present an ultra-low-power always-on face recognition processor
for mobile devices. To this end, we propose an analog-digital mixed-mode architecture
that removes the power-hungry ADCs and performs the high-resolution computation in the
analog domain. For the analog CNN processor, we propose three key building blocks: a
reconfigurable correlated double sampling (CDS) readout circuit for adaptive domain
conversion, a leakage-tolerant analog memory, and an error-tolerant current-based
weighted-sum unit.
II. ANALOG-DIGITAL MIXED-MODE ARCHITECTURE
1. Modified Face Recognition Pipeline in Mixed-mode
Fig. 2. Proposed analog-digital mixed-mode architecture
To solve the main power bottleneck due to high-resolution ADCs, we propose an
analog-digital mixed-mode architecture composed of an image sensor, an analog CNN
processor, and a DSP, as shown in Fig. 2. The ADCs used to convert the raw analog image
signals to high-resolution digital signals are now replaced by the analog CNN processor
with a ternary quantizer.
Unlike the conventional architecture, which processes all CNN layers in the digital
domain, the proposed architecture pushes part of the CNN processing into the analog
domain for low power consumption. The analog CNN processor computes the input layer of
the CNN pipelines in the analog domain with very high resolution (comparable to 32-bit
floating point) and then quantizes the results into three digital values, i.e., 0, 1,
and 2. To this end, we modified the conventional face recognition pipeline, which has
two different CNNs for face detection and face recognition, respectively, so that the
two networks share an input layer and then diverge into separate CNN layers. To obtain
the shared input layer, the face detection and face recognition networks are first
trained independently, and the input layer of the face detection network is then
replaced with the input layer of the face recognition network. After that, with the
input layer fixed, the remaining layers are re-trained for fine-tuning. The detailed
face detection and face recognition network configurations are shown in Fig. 3.
Fig. 3. Face detection & recognition network configuration
As the input layer plays the same role in both CNNs, the modified recognition pipeline
does not affect algorithmic accuracy. More specifically, the baseline FR accuracy using
32-bit floating-point (FP) inputs and weights is 97.52%. First, to perform the
analog-to-digital domain conversion and replace the power-consuming ADC, we adopt
ternary quantization at the output of the first layer, which degrades the simulated FR
accuracy to 96.9%. Then, to reduce the total power consumption of the analog-digital
mixed-mode CNN, the input and weight bit-widths of all convolutional layers are reduced
to 8-bit fixed point, except for the shared input layer, which adopts a 4-bit
fixed-point weight precision. Overall, the modified face recognition pipeline achieves
96.18% recognition accuracy on the LFW dataset (8), while the baseline face recognition
algorithm achieves 97.52% with 32-bit floating-point weights. This 1.37% accuracy loss
is acceptable considering the much lower weight resolution and the ternary quantization
after analog computation. Moreover, compared to the latest binary-weight face
recognition processor (5), which achieves 96% recognition accuracy, the 96.18% accuracy
is quite reasonable.
The unified face recognition pipeline in the proposed mixed-mode architecture is
summarized as follows. First, when the image sensor captures an image, the analog CNN
processor processes the shared layer of the face detection and face recognition CNNs.
It then quantizes the results to a ternary value, and the ternary value selects the
16-bit data from a look-up table (LUT) that are passed to the digital domain. These
16-bit data are the pre-transposed digital values of half-VDD, one-third, and two-thirds
of the maximum analog processing value, which is pre-captured by the maximum search unit
in the analog CNN processor. This maximum value covers the whole range of analog
processing outputs. Then, based on these outputs, the remaining face detection CNN
layers detect the face region of interest in the entire image, and if a face is
detected, the remaining face recognition CNN layers decide whether the identification
result is true or not by reusing these ternary values.
Fig. 4. Power reduction of proposed architecture
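The quantize-then-LUT step can be summarized with a small behavioral model. The band
boundaries below (half VDD for code 0 and MAX/2 between codes 1 and 2) follow the
quantizer description in Section III.4; the 0.6 V half-VDD value assumes the 1.2 V
analog supply:

```python
HALF_VDD = 0.6   # V, half of the 1.2 V analog supply (digital zero)

def quantize_and_lut(v_out, v_max):
    """Ternary-quantize an analog conv output and look up its 16-bit value."""
    v = max(v_out, HALF_VDD)              # analog ReLU: clamp to half VDD
    if v <= HALF_VDD:
        code = 0                          # clamped -> digital zero
    elif v <= 0.5 * v_max:
        code = 1
    else:
        code = 2
    lut = {0: HALF_VDD, 1: v_max / 3, 2: 2 * v_max / 3}
    return code, lut[code]                # code for reuse, value for the DSP
```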
The proposed analog-digital mixed-mode architecture eliminates the power consumed for
domain conversion in the conventional architecture. Fig. 4 illustrates the power
reduction of the proposed architecture compared to conventional architectures (7,9,10)
with different kinds of ADCs. The power of each image sensor is normalized to a 320×240
pixel array at a 1-fps frame rate. In the proposed architecture, the image sensor cell
array and readout circuit consume 0.624 uW and 0.517 uW, respectively, and the analog
CNN processor consumes 57.7 uW. This total power of 58.84 uW achieves at least a 57.9%
power reduction compared to the various ADC-based image sensors, a gain attributable to
the absence of the high-resolution ADC array that is the power bottleneck of the
conventional architecture.
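The quoted budget can be checked directly; the baseline figure in the last comment is
inferred from the stated 57.9% reduction, not a number given in the text:

```python
cell_array = 0.624   # uW, image sensor cell array
readout    = 0.517   # uW, CDS readout circuit (together 1.141 uW for the CIS)
analog_cnn = 57.7    # uW, analog CNN processor
total = cell_array + readout + analog_cnn
print(round(total, 2))   # 58.84 uW, matching the reported total
# A >= 57.9% reduction implies the cheapest ADC-based baseline in Fig. 4
# dissipates at least 58.84 / (1 - 0.579) ~= 139.8 uW (inferred).
```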
2. Overall Block Diagram
Fig. 5. Overall block diagram
Fig. 5 illustrates the overall block diagram of the proposed analog-digital mixed-mode
face recognition processor, which has three main components: a CMOS image sensor (CIS),
an analog CNN processor, and a digital processor. The CIS is composed of an array of
320×240 3T pixels and CDS readout circuits that pass the image pixel data to the analog
CNN processor. The analog CNN processor, which performs the shared layer of the face
detection and face recognition CNN pipelines, has five building blocks: the analog
memory, the current-based weighted-sum unit, the analog-ReLU unit, the maximum value
searcher, and the ternary quantization unit with a LUT. The image pixel data read from
the CIS row by row are buffered in the analog memory, which can hold up to 320×3 pixels.
Thereafter, the 3×3 values in the analog memory corresponding to the 3×3 kernel are
selected, and the data are broadcast to the 64 current-based weighted-sum units
corresponding to a 3×3×64 weight kernel. When the convolution with the 3×3 kernel has
been completed for the 320×3 input rows, a new CIS row is loaded and the above
convolution operations are repeated until the last row of the CIS. Finally, the output
of the current-based weighted-sum unit is converted to a digital value through the
analog ReLU and ternary quantization. At this time, the maximum value of the analog
convolution operation is pre-captured by the maximum value search unit and is later used
for the digital values in the LUT. The digital processor includes a DNN processor that
accelerates the remaining face detection and face recognition CNN layers with a cluster
of convolution cores and an aggregator, as well as a controller that controls both the
analog processor and the DNN processor.
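The row-by-row dataflow can be mimicked with a simple line-buffer model; the NumPy
sketch below is a numeric stand-in for the analog memory and the 64 weighted-sum units,
with random weights in place of the trained 3×3×64 kernel:

```python
import numpy as np

W, H, K, F = 320, 240, 3, 64          # QVGA frame, 3x3 kernel, 64 filters
kernels = np.random.randn(F, K, K)    # stand-in for the trained weights

def process_frame(image):             # image: H x W array of pixel values
    line_buffer = image[:K].copy()    # analog memory holds 320x3 pixels
    outputs = []
    for row in range(K, H + 1):
        for col in range(W - K + 1):
            window = line_buffer[:, col:col + K]                 # 3x3 window
            outputs.append((kernels * window).sum(axis=(1, 2)))  # 64 sums
        if row < H:                   # load the next CIS row, drop the oldest
            line_buffer = np.vstack([line_buffer[1:], image[row]])
    return np.array(outputs)          # inputs to ReLU/ternary quantization
```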
III. DETAILED ANALOG CIRCUITS
1. Reconfigurable CDS Readout Circuit for Adaptive Domain Conversion
The CIS pixel value is read through the CDS readout circuit and is passed to the
analog CNN processor.
Fig. 6. Reconfigurable CDS readout circuit
Although the CMOS image sensor requires a high supply voltage of 2.5 V to cover a large
dynamic range of illuminance, the analog CNN processor does not need a high supply
voltage, which would cause high power consumption. Therefore, we lowered the supply
voltage of the analog CNN processor to 1.2 V, reducing the overall analog computation
power by 52% compared to a 2.5 V supply. However, if the supply voltage of the analog
processing is lowered, the sensor output must be adapted to fit the corresponding lower
supply voltage range. Although the output value of the CIS varies with the amount of
illuminance, the readout circuit should be able to convert the CIS output into the
desired voltage range regardless of the illuminance. Therefore, the reconfigurable CDS
readout circuit shown in Fig. 6 is proposed. It consists of a sample & hold with a
switched-capacitor circuit for reading out the pixel value, and the adaptive domain
conversion is implemented through a reconfigurable voltage reference circuit. Because of
the switched-capacitor operation, when the illuminance is large, the reference voltage
should be increased to convert the pixel value into the desired output voltage range
(0.23 V ~ 0.78 V). Therefore, by using four reconfigurable voltage reference values
(0.748 V, 0.922 V, 1.292 V, and 1.84 V), the circuit can cover the full illuminance
range of 60-900 lx, as verified in simulation with a pixel sensitivity of 61,500
e-/lx·s and a 2 ms exposure time. These four reconfigurable voltage reference values are
implemented by a high-output multi-voltage reference circuit,
which is presented in (11). With this circuit, a wide range of reference voltages, from
low to high, can be generated, and the digital controller selects the desired reference
voltage according to the amount of illuminance. Finally, the proposed reconfigurable CDS
readout circuit realizes the desired analog processing under various illuminance
conditions.
Fig. 7. Leakage tolerant analog memory w/ exposure time division
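The controller's reference selection amounts to a simple illuminance-to-VREF mapping.
In the sketch below the bin edges are illustrative assumptions; the paper states only
that the four references together cover 60-900 lx:

```python
V_REFS = (0.748, 0.922, 1.292, 1.840)   # V, the four reconfigurable references

def select_vref(illuminance_lx):
    # Higher illuminance -> larger pixel swing -> a higher reference is
    # needed to land the CDS output in the 0.23-0.78 V processing range.
    if illuminance_lx < 150:
        return V_REFS[0]
    elif illuminance_lx < 350:
        return V_REFS[1]
    elif illuminance_lx < 600:
        return V_REFS[2]
    else:
        return V_REFS[3]
```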
2. Leakage Tolerant Analog Memory w/ Exposure Time Division
The analog memory is structured as 320×3 source-follower-based memory cells and holds
three rows of the pixel array for the convolution with the 3×3 weight kernel. The input
data stored in a 3×3 window of the analog memory are broadcast to the 64 current-based
weighted-sum units, each of which holds 3×3 weight values. Each row of the memory is
connected to the corresponding row of the weight kernel by the SEL signal of the
switching network, and the convolution is conducted by controlling the REN signal of the
memory cells to follow the 3×3 weight kernel sweep. Therefore, by controlling the REN
and SEL signals, convolution with any desired 3×3 weight kernel is possible.
Fig. 7(a) shows the timing diagram of the analog processing. For a single 3×3
convolution, the analog memory should hold its value for 0.9 us. Therefore, to compute
all the data with the 3×3 weight kernel, the analog memory should maintain a row of data
for 1.14 ms, and the loss of image data must be kept to a minimum during this time. As
shown in Fig. 7(b), the 55.6 fF storage capacitor shows only 1.9% and 2.3% variation at
the minimum and maximum input values, respectively, which is negligible for the analog
processing.
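A back-of-the-envelope check shows how tight the hold requirement is. The leakage bound
below is inferred from the quoted capacitor, hold time, and worst-case droop, taking
0.78 V as the top of the input range:

```python
C_STORE = 55.6e-15        # F, storage capacitor
T_HOLD  = 1.14e-3         # s, time one row must be held
V_MAX   = 0.78            # V, assumed maximum stored value
droop   = 0.023 * V_MAX   # 2.3% worst-case variation at the max input
i_leak  = C_STORE * droop / T_HOLD
print(f"{i_leak * 1e12:.2f} pA")   # ~0.87 pA of tolerable leakage (inferred)
```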
Additionally, considering the analog processing time in the above timing diagram, the
exposure time of the CIS becomes as long as 91.2 ms when a rolling-shutter-type CIS is
used for a small area, in which case the pixel value itself can saturate. Therefore, as
shown in Fig. 7(c), an exposure time-division scheme is used during the analog CNN
processing. To prevent the CIS pixels from saturating, a controllable reset is inserted
between the fixed selection and reset signals. This reset signal is controlled by the
digital controller according to the amount of light.
3. Error Tolerant Current-based Weighted-sum Unit
The value stored in the analog memory is transferred to the current-based weighted-sum
unit, which performs the convolution operation with the 3×3 weight kernel. The proposed
error-tolerant current-based weighted-sum unit is shown in Fig. 8. For current-based
multiplication, the output voltage of the analog memory is converted to the current
domain through a linear V-I converter. The converted current is duplicated using drain
regulation, and the full range of input values is converted linearly into current, as
shown in Fig. 9(a). In current-based analog processing, it is important to reduce the
main branch current to achieve low-power operation; to that end, the main current is
designed to be less than 0.5 uA over the full input range. This main current is
multiplied by the weight using a switched drain regulation (SDR) current mirror, which
consists of a PMOS SDR current mirror that operates when the sign bit is 1 and an NMOS
SDR current mirror that operates when the sign bit is 0. Because of the low main current
and the small size of the mirroring transistors, it is important to reduce the mirroring
error. By using a negative feedback loop for drain regulation and designing the
mirroring MOSFETs to have at least 100 mV of overdrive voltage, we reduced the mirroring
error to under 6%, as shown in Fig. 9(a). This is very low considering that a simple
current mirror would incur a mirroring error higher than 50%. Thanks to the
current-domain operation, the weighted currents are accumulated by simple current
summation into a passive element and converted to the voltage domain without additional
accumulation logic. As a result, low-power operation at 15.09 uW is achieved.
Fig. 8. Error tolerant current-based weighted-sum unit
Fig. 9. Measurement results of error tolerant current-based weighted-sum unit
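Functionally, one weighted-sum unit behaves like the numeric model below. The V-I gain
and load value are illustrative assumptions; in the circuit the sign selects the PMOS
or NMOS SDR mirror path rather than performing a signed multiply:

```python
I_FULL = 0.5e-6   # A, main-branch current at the full-scale input (stated)
R_LOAD = 1.0e5    # ohm, assumed passive element for I-to-V conversion

def v_to_i(v_in, v_min=0.23, v_max=0.78):
    return I_FULL * (v_in - v_min) / (v_max - v_min)   # linear V-I converter

def weighted_sum(v_inputs, weights):
    # 9 inputs (3x3 window) x signed 4-bit weights; currents simply sum
    # on one node, so no explicit accumulation logic is needed.
    i_total = sum(v_to_i(v) * w for v, w in zip(v_inputs, weights))
    return i_total * R_LOAD   # accumulated current -> output voltage
```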
Additionally, for accurate analog computation, the mismatch of the analog convolution
unit should not affect the final quantization value. Therefore, as shown in Fig. 9(b),
a Monte-Carlo simulation is conducted on the output of the analog convolution unit. It
confirms that when the output mismatch is largest, corresponding to the largest input
voltage with maximum gain, the 3-sigma range of the output is 14 mV, which is within
1/2 LSB of the ternary quantization. In addition, across the full output voltage range,
the 3-sigma value is confirmed to be 13.02 mV, also within 1/2 LSB.
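The same argument can be restated numerically. The quantizer LSB below is an assumption
(the 0.23-0.78 V range split into two decision bands); only the 14 mV 3-sigma figure
comes from the simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.014 / 3                        # V, from the 14 mV 3-sigma spread
samples = rng.normal(0.0, sigma, 100_000)
lsb = (0.78 - 0.23) / 2                  # ~0.275 V per ternary band (assumed)
print(np.quantile(np.abs(samples), 0.997) < lsb / 2)   # True: within 1/2 LSB
```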
Fig. 10. Maximum value searcher & ternary quantizer
4. Maximum Value Searcher & Ternary Quantizer
The output of the weighted-sum unit is converted to the digital domain through a simple
analog ReLU unit, consisting of a comparator and a multiplexer that require no DC bias
current for low-power operation, and a ternary quantizer, as shown in Fig. 10. The
analog ReLU unit determines its output based on half VDD, which indicates digital zero:
if the output of the weighted-sum unit is larger than half VDD, it is passed through;
otherwise, the output is driven to half VDD. After that, the output of the analog ReLU
unit is quantized by the ternary quantizer. The threshold of the quantizer is determined
by half of MAX and the ReLU-enable signal (R_EN), where MAX is the pre-determined
maximum value of the previous frame. To obtain the maximum value of the previous frame,
a maximum value searcher is integrated. The analog convolution output is stored
temporarily in Ctemp and compared with Cstore, which holds the previous maximum value;
when the comparator output is high, Cstore is updated to the current analog convolution
output. Finally, using the quantized value as the index of the LUT, the final digital
value (half VDD, 1/3 MAX, or 2/3 MAX) is generated, and the remaining CNN operations are
performed by the digital processor.
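The searcher's compare-and-update behavior reduces to a running maximum; a minimal
sketch, with Ctemp and Cstore modeled as plain variables:

```python
class MaxSearcher:
    """Behavioral model of the maximum value searcher (Fig. 10)."""
    def __init__(self):
        self.c_store = 0.0                # Cstore: running maximum

    def update(self, conv_out):           # conv_out sampled onto Ctemp
        if conv_out > self.c_store:       # comparator output high
            self.c_store = conv_out       # Cstore takes the new maximum

    def frame_done(self):
        max_val, self.c_store = self.c_store, 0.0
        return max_val                    # MAX: threshold/LUT base next frame
```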
Fig. 11. Layout photograph
IV. IMPLEMENTATION RESULTS
1. Layout Implementation Results
The proposed analog-digital mixed-mode face recognition processor is implemented in a
Samsung 65 nm CMOS logic process. Fig. 11 depicts its overall layout photograph. The
proposed processor occupies a 3.6 mm × 4.4 mm area, in which the 3T-based 320×240 CIS,
the analog CNN processor, and the digital processor are integrated on a single chip. The
processor operates in three different supply voltage domains for imaging, analog
processing, and digital processing. The operating clock frequency of the analog domain
is set to 20 MHz, while the operating frequency of the digital domain ranges from 50 MHz
to 200 MHz. The maximum frame rate of the implemented processor is 5 fps. At a 1-fps
frame rate, the analog domain consumes 58.8 uW for imaging and analog convolution, and
the digital domain consumes 146.2 uW at a 0.77 V supply voltage and 50 MHz.
2. Simulation Results
Fig. 12(a) shows the face detection and face recognition accuracy of the proposed
mixed-mode face recognition processor. We used the UMD dataset (12) and the LFW dataset
(8) for measuring face detection accuracy and face recognition accuracy, respectively.
For face detection, the final decision is made based on the binarized result of whether
a face exists in the image. With the modified deep CNN model proposed for mixed-mode
processing, we achieved a face detection accuracy of 97.7% and a face recognition
accuracy of 96.18%, which are only 1.61% and 1.37% worse than the baseline 32-bit
floating-point implementation.
Fig. 12. Simulation results of proposed processor
Fig. 12(b) shows the power reduction of the proposed processor. By removing the overhead
of high-resolution ADCs, the proposed mixed-mode processor dissipates only 64 uW in
always-on operation mode (58.8 uW in the analog domain and 4.98 uW in the digital
domain), which is 33.3% lower than the state-of-the-art face recognition processor (2).
Overall, the proposed processor dissipates 0.205 mW for the entire face recognition,
which is 66.9% lower than (2). Fig. 12(c) presents the detailed power breakdown for
always-on operation and overall system operation. In always-on operation, the analog CNN
processor, CMOS image sensor, and digital processor consume 57.7 uW, 1.141 uW, and 4.98
uW, respectively; the analog CNN processor accounts for the largest share of power,
90.1%. In overall system operation, however, since the face recognition network is much
deeper than the face detection network, the number of computations processed by the
digital processor increases, and the digital domain consumes 0.146 mW, which is 71.2% of
the overall power consumption. Table 1 summarizes the comparison of the proposed
processor with the latest face recognition processors. References (2) and (3) support
both face detection and face recognition, but (13) supports only face recognition.
Unlike the previous works, which utilized
Table 1. Comparison table
|
JSSC’17 [3]
|
JSSC’18 [2]
|
ISCAS’19
[13]
|
This Work
|
Technology
|
40 nm
|
65 nm
|
65 nm
|
65 nm
|
Area
|
5.86 mm2 *
|
27.09 mm2 Ɨ
|
5.99 mm2 *
|
15.84 mm2 Ɨ
|
FDAlgorithm
|
Haar-like
|
Haar-like
|
-
|
CNN
|
FR Algorithm
|
PCA+SVM
|
CNN
|
CNN
|
CNN
|
Framerate
|
5.5 fps
|
1 fps
|
1 fps
|
5 fps
|
Always-on Power Consumption
|
-
|
96 uW
|
-
|
64 uW
|
FR Accuracy
|
81% @ 32class in LFW
|
97% @ whole LFW
|
95.4% @ whole LFW
|
96.18 @ whole LFW
|
Resolution
|
HD
|
QVGA
|
-
|
QVGA
|
Overall System Power Consumption
|
23 mW
|
0.62 mW
|
0.26 mW
|
0.205 mW
@1 fps
|
* Only FR core includes Ɨ Both CIS & FR core includes
traditional feature-based face detection or recognition, the proposed processor adopts a
deep CNN algorithm for both face detection and face recognition to obtain high
recognition accuracy. Despite the CNN's large amount of computation, the proposed
processor consumes the least power by reducing the always-on power with low-power analog
circuits and by reducing the weight bit-precision in the digital computation. Moreover,
even though the proposed system integrates the CMOS image sensor and the face detection
process to implement the whole face recognition system, its overall power consumption is
17.4% lower than (13).
V. CONCLUSIONS
Despite recent DNN-based advances in face recognition technology, most previous face
recognition processors (2,3,5) focused only on reducing power consumption in the
event-driven face recognition stage rather than in the always-on image acquisition and
face detection stages. However, considering the relative frequency of always-on and
event-driven operation, reducing the power of the always-on operation is key to reducing
the overall system power consumption. Unlike the conventional face recognition processor
architecture composed of an image sensor, high-resolution ADCs, and a digital signal
processor, the proposed processor introduces a new mixed-mode architecture that removes
the power-hungry ADCs and employs an analog signal processor to compute the first layer
of the convolutional neural networks shared by face detection and face recognition. The
proposed architecture achieves at least a 57.9% power reduction compared to various
types of ADC-based image sensors. At the circuit level, we propose the reconfigurable
CDS readout circuit and the exposure time-division scheme to integrate the image sensor
and the analog signal processor without losing input data under various illumination
conditions. Moreover, we propose the error-tolerant current-based weighted-sum unit to
compute the shared input layer of both the face detection and face recognition CNNs with
only 15.09 uW of power consumption.
As a result, implemented in a 65 nm process, the proposed mixed-mode face recognition
processor consumes a total of 0.205 mW in overall system operation and 64 uW in
always-on operation, which are 66.9% and 33.3% lower than the state-of-the-art design,
respectively.
ACKNOWLEDGMENTS
This research was supported in part by the MSIT (Ministry of Science and ICT), Korea,
under the ITRC (Information Technology Research Center) support program (IITP-2020-0-01847)
supervised by the IITP (Institute of Information & Communications Technology Planning
& Evaluation), and Samsung Electronics.
REFERENCES
(1) Fernandez E., Jimenez D., November 2016, Face Recognition for Authentication on
Mobile Devices, Image and Vision Computing, Vol. 55, pp. 31-33
(2) Bong K., Choi S., Kim C., Han D., Yoo H., January 2018, A Low-Power Convolutional
Neural Network Face Recognition Processor and a CIS Integrated with Always-on Face
Detector, IEEE J. Solid-State Circuits, Vol. 53, pp. 115-123
(3) Jeon D., Dong Q., Kim Y., Wang X., Chen S., Yu H., Blaauw D., Sylvester D., June
2017, A 23-mW Face Recognition Processor with Mostly Read 5T Memory in 40-nm CMOS, IEEE
J. Solid-State Circuits, Vol. 52, pp. 1628-1642
(4) Rusci M., Rossi D., Farella E., October 2017, A Sub-mW IoT-Endnode for Always-On
Visual Monitoring and Smart Triggering, IEEE Internet of Things Journal, Vol. 4, No. 5
(5) Kang S., Lee J., Kim C., Yoo H., October 2018, B-Face: 0.2 mW CNN-Based Face
Recognition Processor with Face Alignment for Mobile User Identification, IEEE Symposium
on VLSI Circuits
(6) Judd P., Albericio J., Hetherington T., Aamodt T. M., Moshovos A., October 2016,
Stripes: Bit-serial Deep Neural Network Computing, in Proc. 49th Annu. IEEE/ACM Int.
Symp. Microarchitecture (MICRO), Taipei, Taiwan, pp. 1-12
(7) Shin M., Kim J., Kim M., Jo Y., Kwon O., June 2012, A 1.92-Megapixel CMOS Image
Sensor with Column-Parallel Low-Power and Area-Efficient SA-ADCs, IEEE Transactions on
Electron Devices, Vol. 59, pp. 1693-1700
(8) Miller E., Huang G., Roychowdhury A., Li H., Hua G., 2016, Labeled Faces in the
Wild: A Survey, Springer, pp. 189-248
(9) Kim D., Song M., September 2012, An Enhanced Dynamic-Range CMOS Image Sensor Using
a Digital Logarithmic Single-Slope ADC, IEEE Transactions on Circuits and Systems II:
Express Briefs, Vol. 59, pp. 653-657
(10) Kim D., Song M., September 2012, An Enhanced Dynamic-Range CMOS Image Sensor Using
a Digital Logarithmic Single-Slope ADC, IEEE Transactions on Circuits and Systems II:
Express Briefs, Vol. 59, pp. 653-657
(11) Lee I., Sylvester D., Blaauw D., January 2017, A Subthreshold Voltage Reference
with Scalable Output Voltage for Low-Power IoT Systems, IEEE J. Solid-State Circuits,
Vol. 52, pp. 1443-1449
(12) Bansal A., Nanduri A., Castillo C., Ranjan R., Chellappa R., November 2016,
UMDFaces: An Annotated Face Dataset for Training Deep Networks, arXiv
(13) Kim S., Lee J., Kang S., Lee J., Yoo H.-J., 2019, A 15.2 TOPS/W CNN Accelerator
with Similar Feature Skipping for Face Recognition in Mobile Devices, in Proc. IEEE Int.
Symp. Circuits Syst. (ISCAS), Sapporo, Japan, pp. 1-5
Author
Ji-Hoon Kim received the B.S. degree in electrical engineering from Kyung-Hee University,
Suwon, South Korea, in 2017 and the M.S. degree in electrical engineering from the
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea,
in 2019, where he is currently pursuing the Ph.D. degree.
His current research interests span various aspects of hardware system design, including
low-power deep learning and intelligent vision SoC design with memory architecture,
hardware accelerators for computing systems, embedded system development with FPGAs,
computer architecture, and hardware/software co-design.
Changhyeon Kim received the B.S. (2014), M.S. (2016), and Ph.D. (2020) degrees in electrical
engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon,
South Korea.
His current research interests include low power SoC design, especially focused on
parallel processor for artificial intelligence and machine learning algorithms.
Kwantae Kim received the B.S. and M.S. degrees in electrical engineering from the
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea,
in 2015 and 2017, respectively, where he is currently pursuing the Ph.D. degree in
electrical engineering.
From 2015 to 2017, he was with Healthrian R&D Center, Daejeon, where he designed
bio-potential readout ICs for mobile healthcare solutions.
He is also a Visiting Student with the Institute of Neuroinformatics, University
of Zurich and ETH Zürich, Zürich, Switzerland.
His research interests include designing low-power bio-impedance sensors and low-power
neuromorphic audio sensors.
Mr. Kim received the Un Chong-Kwan Scholarship Award from KAIST for his achievement
of excellence in entrance examination in 2015.
He was a recipient of the Silver Prizes in the 25th HumanTech Paper Award from Samsung
Electronics, Suwon, South Korea, in 2019.
Juhyoung Lee received the B.S. degree in electrical engineering from the Korea Advanced
Institute of Science and Technology (KAIST), Daejeon, Korea, in 2017, and the M.S.
degree in electrical engineering from KAIST in 2019, where he is currently pursuing the
Ph.D. degree.
He is a student member of IEEE.
His current research interests include energy-efficient multicore architectures/accelerator
ASICs/systems especially focused on artificial intelligence including deep reinforcement
learning and computer vision, energy-efficient processing-in-memory accelerator, and
deep learning algorithm for efficient processing.
Hoi-Jun Yoo received the bachelor’s degree from the Electronics Department, Seoul
National University, Seoul, South Korea, in 1983, and the M.S. and Ph.D. degrees in
electrical engineering from the Korea Advanced Institute of Science and Technology
(KAIST), Daejeon, South Korea, in 1985 and 1988, respectively.
He was the VCSEL pioneer at Bell Communications Research, Red Bank, NJ, USA, and
the Manager of the DRAM Design Group, Hyundai Electronics Inc., Ichon, South Korea,
during the era of 1M DRAM up to 256M SDRAM.
From 2003 to 2005, he served as the full-time Advisor to the Minister of the Korean
Ministry of Information and Communication for SoC and next-generation computing.
He is currently an ICT Endowed Chair Professor with the School of Electrical Engineering
and the Director of the System Design Innovation and Application Research Center (SDIA),
KAIST.
He has published more than 400 articles and wrote or edited five books: DRAM Design
(1997, Hongneung), High Performance DRAM (1999 Hongneung), Low Power NoC for High
Performance SoC Design (2008, CRC), Mobile 3D Graphics SoC (2010, Wiley), and Biomedical
CMOS ICs (Co-editing with Chris Van Hoof, 2010, Springer), and co-written chapters
in numerous books.
His current research interests include bio-inspired artificial intelligence (AI)
chip design and multicore AI SoC design, including DNN accelerators, wearable healthcare
systems, network-on-chip, and high-speed low-power memory.
Dr. Yoo is an Executive Committee Member of the Symposium on VLSI and a Steering
Committee Member of the Asian Solid-State Circuits Conference (A-SSCC), of which he
was nominated as the Steering Committee Chair from 2020 to 2025.
He received the Order of Service Merit from the Korean Government in 2011 for his
contributions to the Korean memory industry, the Scientist/Engineer of the month Award
from the Ministry of Education, Science and Technology of Korea in 2010, the Kyung-Am
Scholar Award in 2014, the Electronic Industrial Association of Korea Award for his
contributions to the DRAM technology in 1994, the Hynix Development Award in 1995,
the Korea Semiconductor Industry Association Award in 2002, the Best Research of KAIST
Award in 2007, the Excellent Scholar of KAIST Award in 2011, and the Best Scholar
of KAIST Award in 2019.
He was a co-recipient of the ASP-DAC Design Award in 2001, the A-SSCC Outstanding
Design Awards in 2005, 2006, 2007, 2010, 2011, and 2014, the International Solid-State
Circuits Conference (ISSCC)/DAC Student Design Contest Awards in 2007, 2008, 2010,
and 2011, the ISSCC Demonstration Session Recognition in 2016, 2017, and 2019, and
the Best Paper Award of the IEEE International Conference on Artificial Intelligence
Circuits and Systems in 2019.
He was a TPC Chair of the ISSCC 2015, a Plenary Speaker of the ISSCC 2019 entitled
Intelligence on Silicon: From Deep Neural Network Accelerators to Brain-Mimicking
AI-SoCs, and the Chair of the Technology Direction (TD) Subcommittee of the ISSCC
2013.
He has served as an Executive Committee Member of the ISSCC, the IEEE SSCS Distinguished
Lecturer from 2010 to 2011, and the TPC Chair of the International Symposium on Wearable
Computers (ISWC) 2010 and the A-SSCC 2008.
He was the Editor-in-Chief (EIC) of the Journal of Semiconductor Technology and Science
(JSTS) published by the Korean Institute of Electronics and Information Engineers
from 2015 to 2019 and a Guest Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC)
and the T-BioCAS.
He is also an Associate Editor of the IEEE JSSC and the IEEE SOLID-STATE CIRCUITS
LETTERS (SSCL).
Joo-Young Kim received the B.S., M.S., and Ph.D. degrees in electrical engineering from
the Korea Advanced Institute of Science and Technology (KAIST), in 2005, 2007, and 2010,
respectively.
He has been an assistant professor in the School of Electrical Engineering at KAIST
since September 2019.
His research interests span various aspects of hardware design including VLSI design,
computer architecture, FPGA, domain specific accelerators, hardware/software co-design,
and agile hardware development.
Before joining KAIST, Joo-Young was a Senior Hardware Engineering Lead at Microsoft
Azure working on hardware acceleration for its hyper-scale big data analytics platform
named Azure Data Lake.
Before that, he was one of the initial members of Catapult project at Microsoft Research,
where he deployed a fabric of FPGAs in datacenters to accelerate critical cloud services
such as machine learning, data storage, and networking.
Joo-Young is a recipient of the 2016 IEEE Micro Top Picks Award, the 2014 IEEE Micro
Top Picks Award, the 2010 DAC/ISSCC Student Design Contest Award, the 2008 DAC/ISSCC
Student Design Contest Award, and the 2006 A-SSCC Student Design Contest Award.
He serves as Associate Editor for the IEEE Transactions on Circuits and Systems I:
Regular Papers (2020-2021).