
School of EE, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, Korea



Keywords: Always-on, CMOS image sensor (CIS), mixed-mode, system-on-chip (SoC)

I. INTRODUCTION

As the number of mobile devices grows exponentially, user authentication that verifies the identity of the person using a device becomes crucial. Among the various user authentication methods, including traditional passcode-based methods and modern biometrics-based methods such as fingerprint and iris recognition, face recognition (FR) is attracting attention on the mobile platform because it requires no direct physical contact. Thanks to recent advances in recognition algorithms using deep neural networks, its accuracy and robustness are also commercially viable (1). However, face-recognition-based user authentication is computationally challenging on mobile devices because it must remain always active despite the limited battery capacity (2).

Fig. 1 illustrates the three stages of the face recognition pipeline (2,3): image acquisition, which acquires the input image from the image sensor; face detection (FD), which detects candidate regions in the image that may contain a face; and face recognition, which extracts features from the detected regions and matches them against a database. The first two stages operate at all times to detect a face in the input image, while the third recognition stage is activated only when a face is detected. Although the recognition stage involves heavy computation, it contributes less than 5% of the total power consumption in practice because it is invoked infrequently (2,4). Therefore, it is imperative to realize low-power operation in the first two always-on stages to reduce the overall power consumption.
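For a rough sense of why the always-on stages dominate, consider a simple duty-cycle model; this is only a sketch in Python, where the power levels are of the order reported later in this paper and the 2% face-occurrence duty cycle is a hypothetical figure:

```python
# Back-of-the-envelope average-power model of the Fig. 1 pipeline.
# Power levels are of the order reported later in this paper; the
# face-occurrence duty cycle is a hypothetical illustration.
P_ALWAYS_ON = 64e-6      # W: image acquisition + face detection, always active
P_RECOGNITION = 141e-6   # W: extra power while the recognition stage runs
FACE_DUTY = 0.02         # fraction of time a detected face triggers recognition

p_avg = P_ALWAYS_ON + FACE_DUTY * P_RECOGNITION
fr_share = FACE_DUTY * P_RECOGNITION / p_avg
print(f"average power: {p_avg * 1e6:.1f} uW, recognition share: {fr_share:.1%}")
# -> ~66.8 uW average; recognition contributes ~4%, below the 5% noted above
```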

Fig. 1. Conventional face recognition pipeline


Previous face recognition processors (2,3,5) utilized a digital signal processor (DSP) to implement the above face recognition pipeline. They used an array of analog-to-digital converters (ADCs) to convert the analog sensor signals to the digital domain immediately after image acquisition and used the DSP to handle the complex computations of the detection and recognition stages. Although this approach is simple and easy to implement, it requires high-resolution ADCs to meet the accuracy requirement of the recognition algorithm. For popular deep convolutional neural network (CNN) based algorithms such as AlexNet and GoogLeNet, the bit resolution should be at least 10 bits to achieve a desirable recognition accuracy (6), and the prior works (2,3,13) use 16-bit input precision to guarantee high face recognition accuracy. As is well known from the previous literature (2,3,5), high-resolution ADCs are power-hungry, and an array of them becomes the major power bottleneck, accounting for more than half of the total power consumption (2,7).

In this paper, we present an ultra-low-power always-on face recognition processor for mobile devices. To this end, we propose an analog-digital mixed-mode architecture that removes the power-hungry ADCs and performs the high-resolution computation in the analog domain. Within the analog CNN processor, we propose three key building blocks: a reconfigurable correlated double sampling (CDS) readout circuit for adaptive domain conversion, a leakage-tolerant analog memory, and an error-tolerant current-based weighted-sum unit.

II. ANALOG-DIGITAL MIXED-MODE ARCHITECTURE

1. Modified Face Recognition Pipeline in Mixed-mode

Fig. 2. Proposed analog-digital mixed-mode architecture

To solve the main power bottleneck caused by the high-resolution ADCs, we propose an analog-digital mixed-mode architecture composed of an image sensor, an analog CNN processor, and a DSP, as shown in Fig. 2. The ADCs that converted the raw analog image signals into high-resolution digital signals are replaced by the analog CNN processor with a ternary quantizer. Unlike the conventional architecture, which processes the entire CNN pipeline in the digital domain, the proposed architecture pushes part of the CNN processing into the analog domain for low power consumption. The analog CNN processor computes the input layer of the CNN pipelines in the analog domain with very high resolution (effectively beyond 32-bit floating point) and then quantizes the results into three digital values, i.e., 0, 1, 2. To this end, we modified the conventional face recognition pipeline, which has two different CNNs for face detection and face recognition, so that the two CNNs share an input layer and then diverge into separate layers. To adopt the shared input layer, the face detection and face recognition networks are first trained independently, and the input layer of the face detection network is replaced with the input layer of the face recognition network. After that, while the input layer is kept fixed, the rest of the layers are re-trained for fine-tuning. The detailed face detection and face recognition network configuration is shown in Fig. 3.
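As an illustration of this two-step procedure, a minimal PyTorch-style sketch could look as follows; the model structure and the `input_layer` attribute name are hypothetical:

```python
import torch.nn as nn

def build_shared_input_layer(fd_net: nn.Module, fr_net: nn.Module):
    """Both networks are assumed to be fully trained independently already."""
    # Replace the FD input layer with the trained FR input layer.
    fd_net.input_layer = fr_net.input_layer
    # Freeze the now-shared input layer so fine-tuning leaves it fixed.
    for p in fr_net.input_layer.parameters():
        p.requires_grad = False
    # The remaining FD and FR layers are then re-trained (fine-tuned) as usual.
    return fd_net, fr_net
```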

Fig. 3. Face detection & recognition network configuration


As the role of the input layer is identical in both CNNs, the modified recognition pipeline does not affect algorithmic accuracy. More specifically, the baseline FR accuracy with 32-bit floating-point (FP) inputs and weights is 97.52%. First, to perform the analog-to-digital domain conversion and replace the power-consuming ADC, we adopt ternary quantization at the output of the first layer; in that case, the simulated FR accuracy degrades to 96.9%. Then, to reduce the total power consumption of the analog-digital mixed-mode CNN, the input and weight bit-widths of all convolutional layers are reduced to 8-bit fixed point, except for the shared input layer, which uses 4-bit fixed-point weights. Overall, the modified face recognition pipeline achieves 96.18% recognition accuracy on the LFW dataset (8), while the baseline face recognition algorithm achieves 97.52% with 32-bit floating-point weights. This 1.37% accuracy loss is promising considering that we used much lower weight resolution and applied ternary quantization after the analog computation. Moreover, compared with the latest binary-weight face recognition processor (5), which achieves 96% recognition accuracy, the accuracy of 96.18% is quite reasonable.

Fig. 4. Power reduction of proposed architecture

The unified face recognition pipeline in the proposed mixed-mode architecture is summarized as follows. First, when the image sensor captures an image, the analog CNN processor processes the shared input layer of the face detection and face recognition CNNs. It then quantizes the results to a ternary value, and the ternary value selects the 16-bit data in a look-up table (LUT) that go into the digital domain. These 16-bit data are the pre-transposed digital values of half VDD and of one-third and two-thirds of the maximum analog processing value, which is pre-captured by the maximum search unit in the analog CNN processor. This maximum value covers the whole range of analog processing outputs. Then, based on these outputs, the remaining face detection CNN layers detect the face region of interest in the entire image, and, if a face is detected, the remaining face recognition CNN layers decide whether the identification result is true or not by reusing these ternary values.
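A behavioral sketch of this quantize-then-LUT step is shown below; the threshold placement relative to half VDD and the linear 16-bit encoding are our reading of the text, not a verified specification:

```python
def ternary_quantize(v_relu, vdd, vmax):
    """Map the analog ReLU output to a ternary code in {0, 1, 2}.
    Half VDD represents digital zero; MAX is the pre-captured maximum
    analog processing value (threshold placement is an assumption)."""
    zero = vdd / 2
    if v_relu <= zero:
        return 0
    return 1 if (v_relu - zero) < (vmax - zero) / 2 else 2

def lut_lookup(code, vdd, vmax, bits=16):
    """The ternary code selects the 16-bit word sent to the digital domain:
    half VDD, 1/3 MAX, or 2/3 MAX (linear encoding is an assumption)."""
    lut = (vdd / 2, vmax / 3, 2 * vmax / 3)
    return int(round(lut[code] / vmax * (2**bits - 1)))
```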

The proposed analog-digital mixed-mode architecture eliminates the power consumed for domain conversion in the conventional architecture. Fig. 4 illustrates the power reduction of the proposed architecture compared to conventional architectures (7,9,10) with different kinds of ADCs. The power of each image sensor is normalized to a 320×240 pixel array and a 1-fps frame rate. In the proposed architecture, the image sensor cell array and readout circuit consume 0.624 uW and 0.517 uW, respectively, and the analog CNN processor consumes 57.7 uW. This total power of 58.84 uW achieves at least a 57.9% power reduction compared to the various types of ADC-based image sensors; this gain comes from the absence of the high-resolution ADC array, which is the power bottleneck of the conventional architecture.
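The arithmetic behind these figures checks out; since the text does not list the baseline ADC-based sensor powers, the sketch below only derives the implied minimum baseline:

```python
# Reported component powers of the proposed always-on analog path (uW).
cell_array, readout, analog_cnn = 0.624, 0.517, 57.7
total = cell_array + readout + analog_cnn
print(f"total: {total:.2f} uW")               # 58.84 uW, as stated

# "At least 57.9% reduction" implies the weakest ADC-based baseline,
# normalized to 320x240 at 1 fps, consumes at least total / (1 - 0.579).
baseline_min = total / (1 - 0.579)
print(f"implied baseline: >= {baseline_min:.1f} uW")   # ~139.8 uW
```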

2. Overall Block Diagram

Fig. 5. Overall block diagram

Fig. 5 illustrates the overall block diagram of the proposed analog-digital mixed-mode face recognition processor, which has three main components: a CMOS image sensor (CIS), an analog CNN processor, and a digital processor. The CIS is composed of an array of 320×240 3T pixels and CDS readout circuits that pass the image pixel data to the analog CNN processor. The analog CNN processor, which performs the shared input layer of the face detection and face recognition CNN pipelines, has five building blocks: an analog memory, current-based weighted-sum units, an analog-ReLU unit, a maximum value searcher, and a ternary quantization unit with a LUT. The image pixel data, read from the CIS row by row, are buffered in the analog memory, which can hold up to 320×3 pixels. Thereafter, the 3×3 values in the analog memory corresponding to the 3×3 kernel are selected, and the data are broadcast to the 64 current-based weighted-sum units corresponding to a 3×3×64 weight kernel. When the convolution with the 3×3 kernel is completed for the 320×3 input rows, a new CIS row is loaded, and the convolution is repeated until the last row of the CIS. Finally, the output of each current-based weighted-sum unit is converted to a digital value through the analog ReLU and ternary quantization. At this time, the maximum value of the analog convolution outputs is pre-captured by the maximum value searcher and later used as the digital value in the LUT. The digital processor includes a DNN processor that accelerates the rest of the face detection and face recognition CNN pipelines with a cluster of convolution cores and an aggregator, as well as a controller that controls both the analog processor and the DNN processor.
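Functionally, this dataflow can be summarized by the short NumPy sketch below; the valid-padding raster order is an assumption, and the analog memory is modeled as a 3-row line buffer:

```python
import numpy as np

def shared_layer(image, kernels):
    """image: (240, 320) pixel array; kernels: (64, 3, 3) weight kernels."""
    H, W = image.shape
    out = np.zeros((kernels.shape[0], H - 2, W - 2))
    rows = image[:3].copy()                # analog memory: 3 rows x 320 columns
    for r in range(H - 2):
        for c in range(W - 2):
            window = rows[:, c:c + 3]      # 3x3 window selected via SEL/REN
            # broadcast the window to all 64 weighted-sum units at once
            out[:, r, c] = (kernels * window).sum(axis=(1, 2))
        if r + 3 < H:                      # load a new CIS row, drop the oldest
            rows = np.vstack([rows[1:], image[r + 3]])
    return out
```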

III. DETAILED ANALOG CIRCUITS

1. Reconfigurable CDS Readout Circuit for Adaptive Domain Conversion

The CIS pixel value is read through the CDS readout circuit and is passed to the analog CNN processor.

Fig. 6. Reconfigurable CDS readout circuit


Although the CMOS image sensor requires a high supply voltage of 2.5 V to cover a large dynamic range of illuminance, the analog CNN processor does not need a high supply voltage, which causes high power consumption. Therefore, we lowered the supply voltage of the analog CNN processor to 1.2 V and reduced the power consumption of the overall analog computation by 52% compared to a 2.5 V supply. However, if the supply voltage of the analog processing is lowered, the sensor output must be converted to fit into the correspondingly lower supply voltage range. Although the output value of the CIS varies with the amount of illuminance, the readout circuit should be able to convert the CIS output into the desired voltage range regardless of the illuminance. Therefore, a reconfigurable CDS readout circuit is proposed, as shown in Fig. 6. It consists of a sample-and-hold with a switched-capacitor circuit for reading out the pixel value, and the adaptive domain conversion is implemented through a reconfigurable voltage reference circuit. Because of the switched-capacitor operation, when the amount of illuminance is large, the voltage reference should be increased to convert the pixel value into the desired output voltage range (0.23 V ~ 0.78 V). By using four reconfigurable voltage reference values (0.748 V, 0.922 V, 1.292 V, and 1.84 V), the circuit covers the full illuminance range of 60-900 lx in simulations with a pixel sensitivity of 61,500 e-/lx·s and a 2 ms exposure time. These four reference values are generated by the high-output multi-voltage reference circuit presented in (11), which can generate a wide range of voltage references from low to high, and the digital controller selects the desired reference voltage according to the amount of illuminance. Finally, with the proposed reconfigurable CDS readout circuit, the desired analog processing is realized over a wide range of illuminance.
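A behavioral sketch of this adaptive reference selection is given below; the four reference values come from the text, while the illuminance break points are hypothetical, since the text only states the covered 60-900 lx range:

```python
VREFS = (0.748, 0.922, 1.292, 1.84)   # V, the four reconfigurable references
BREAKPOINTS = (150, 320, 600)         # lx, illustrative break points only

def select_vref(illuminance_lx):
    """Higher illuminance means a larger pixel swing, so a higher
    reference is chosen to keep the CDS output within 0.23 V ~ 0.78 V."""
    for vref, limit in zip(VREFS, BREAKPOINTS):
        if illuminance_lx < limit:
            return vref
    return VREFS[-1]
```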

2. Leakage Tolerant Analog Memory w/Exposure Time Division

Fig. 7. Leakage tolerant analog memory w/ exposure time division

The analog memory is structured as 320×3 source-follower-based memory cells and holds three rows of the pixel array for the convolution with the 3×3 weight kernel. The input data stored in a 3×3 window of the analog memory are broadcast to the 64 current-based weighted-sum units, each of which holds 3×3 weight values. Each row of the memory is connected to the corresponding row of the weight kernel by the SEL signal of the switching network, and the convolution is conducted by controlling the REN signal of the memory cells in step with the 3×3 weight kernel sweep. Therefore, by controlling the REN and SEL signals, convolution with any desired 3×3 weight kernel is possible, as sketched below.
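A minimal sketch of that control sequence, with the signal encoding assumed, is:

```python
def kernel_sweep(n_cols=320, k=3):
    """Yield (SEL, REN) pairs for one buffered row set. SEL statically maps
    memory rows to kernel rows; REN enables the k columns under the
    current 3x3 window as it sweeps across the row."""
    sel = (0, 1, 2)                    # memory row -> kernel row mapping
    for c in range(n_cols - k + 1):
        ren = tuple(range(c, c + k))   # columns enabled for this window
        yield sel, ren
```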

Fig. 7(a) shows the timing diagram of the analog processing. For a single 3×3 convolution, the analog memory should hold its value for 0.9 us. Therefore, to process the whole 320×3 data with the 3×3 weight kernel, the analog memory should maintain a row of data for 1.14 ms, and the loss of image data during this time should be kept to a minimum. As shown in Fig. 7(b), the 55.6 fF storage capacitor exhibits only 1.9% and 2.3% droop at the minimum and maximum input values, respectively, which is negligible for the analog processing.
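These droop numbers imply a sub-femtoampere leakage at the storage node; a quick check, assuming the percentages are relative to the stored voltage and taking the 0.23 V and 0.78 V ends of the signal range:

```python
C = 55.6e-15       # F, storage capacitor
T_HOLD = 1.14e-3   # s, row hold time
for v_stored, droop in ((0.23, 0.019), (0.78, 0.023)):
    dv = droop * v_stored              # assumed: droop relative to stored value
    i_leak = C * dv / T_HOLD           # I = C * dV / dt
    print(f"V={v_stored} V: dV={dv * 1e3:.1f} mV, leak={i_leak * 1e15:.2f} fA")
# -> roughly 0.2-0.9 fA of leakage, i.e., negligible for the analog processing
```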

Additionally, considering the analog processing time in the above timing diagram, the exposure time of the CIS becomes as long as 91.2 ms when a rolling-shutter CIS is used for a small area, and the pixel value itself can saturate. Therefore, as shown in Fig. 7(c), an exposure time-division scheme is used during the analog CNN processing. To prevent the CIS pixels from saturating, a controllable reset is inserted between the fixed selection and reset signals. This reset signal is controlled by the digital controller according to the amount of light.

3. Error Tolerant Current-based Weighted-sum Unit

Fig. 8. Error tolerant current-based weighted-sum unit

Fig. 9. Measurement results of error tolerant current-based weighted-sum unit

The value in the analog memory is transferred to the current-based weighted-sum unit, which performs the convolution with the 3×3 weight kernel. The proposed error-tolerant current-based weighted-sum unit is shown in Fig. 8. For current-based multiplication, the output voltage of the analog memory is converted to the current domain through a linear V-I converter. The converted current is duplicated using drain regulation, and the full range of input values is converted linearly into current, as shown in Fig. 9(a). In current-based analog processing, it is important to reduce the main branch current to achieve low-power operation; to that end, the main current is designed to be less than 0.5 uA over the full input range. This main current is multiplied by the weight using a switched drain regulation (SDR) current mirror. The overall SDR current mirror consists of a PMOS SDR current mirror, which operates when the sign bit is 1, and an NMOS SDR current mirror, which operates when the sign bit is 0. Because of the low main current and the small size of the mirroring transistors, it is important to reduce the mirroring error. By using a negative feedback loop for drain regulation and designing the mirroring MOSFETs to have at least 100 mV of overdrive voltage, we reduced the mirroring error to under 6%, as shown in Fig. 9(a); this is very low considering that a simple current mirror incurs more than 50% mirroring error. Thanks to the advantages of the current domain, the weight-multiplied currents are converted back to the voltage domain by simple current accumulation and a passive element, without additional accumulation logic. Finally, low-power operation at 15.09 uW is achieved.
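A behavioral model of one weighted-sum unit clarifies the signal chain (voltage to current, signed mirror gain, current summation, then back to voltage); the transconductance, zero level, and load value below are illustrative assumptions:

```python
GM = 0.64e-6      # A/V, illustrative: keeps the main branch below 0.5 uA
R_LOAD = 2.0e5    # ohm, illustrative passive element for the I-to-V conversion

def weighted_sum(v_inputs, weights, v_zero=0.6):
    """v_inputs, weights: nine values for one 3x3 window. Positive weights
    model the PMOS SDR mirror (sign = 1), negative weights the NMOS
    mirror (sign = 0); all currents simply sum on the output node."""
    i_out = 0.0
    for v, w in zip(v_inputs, weights):
        i_main = GM * v                # linear V-I conversion
        i_out += w * i_main            # mirror gain |w|, sign chosen by branch
    return v_zero + i_out * R_LOAD     # accumulation back into voltage domain
```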

Additionally, for accurate analog computation, the mismatch of the analog convolution unit must not affect the final quantization value. Therefore, as shown in Fig. 9(b), a Monte-Carlo simulation was conducted on the output of the analog convolution unit. It confirms that when the output mismatch is largest, corresponding to the largest input voltage with maximum gain, the 3-sigma range of the output is 14 mV, which is within 1/2 LSB of the ternary quantization. In addition, over the full output voltage range, the 3-sigma value is 13.02 mV, also within 1/2 LSB.

Fig. 10. Maximum value searcher & ternary quantizer


4. Maximum Value Searcher & Ternary Quantizer

The output of the weighted-sum unit is converted to the digital domain through a simple analog ReLU unit, consisting of a comparator and a multiplexer that require no DC bias current for low-power operation, followed by a ternary quantizer, as shown in Fig. 10. The analog ReLU unit determines its output relative to half VDD, which represents digital zero: if the output of the weighted-sum unit is larger than half VDD, it is passed through; otherwise, the output is driven to half VDD. After that, the output of the analog ReLU unit is quantized by the ternary quantizer. The threshold of the quantizer is determined by half of MAX and the ReLU-enable signal (R_EN), where MAX is the pre-determined maximum value from the previous frame. To obtain this maximum value, a maximum value searcher is integrated: the analog convolution output is stored temporarily on Ctemp and compared with Cstore, which holds the previous maximum; when the comparator output is high, Cstore is updated to the current convolution output. Finally, the quantized values index the LUT, the final digital value (half VDD, 1/3 MAX, or 2/3 MAX) is generated, and the remaining CNN operations are performed by the digital processor.
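Behaviorally, the maximum value searcher is a compare-and-update loop over one frame's convolution outputs, with Ctemp and Cstore modeled here as plain variables:

```python
def track_frame_max(conv_outputs):
    """Cstore keeps the running maximum of the analog convolution outputs;
    it becomes MAX for the next frame's quantizer threshold and LUT."""
    c_store = 0.0
    for v in conv_outputs:     # each analog convolution result
        c_temp = v             # sample the result onto Ctemp
        if c_temp > c_store:   # comparator output high
            c_store = c_temp   # update Cstore
    return c_store             # MAX used in the next frame
```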

Fig. 11. Layout photograph


IV. IMPLEMENTATION RESULTS

1. Layout Implementation Results

The proposed analog-digital mixed-mode face recognition processor is implemented in a Samsung 65 nm CMOS logic process. Fig. 11 depicts the overall layout photograph. The processor occupies a 3.6 mm × 4.4 mm area in which the 3T-based 320×240 CIS, the analog CNN processor, and the digital processor are integrated on a single chip. The processor operates in three different supply voltage domains for imaging, analog processing, and digital processing. The operating clock frequency of the analog domain is set to 20 MHz, while that of the digital domain ranges from 50 MHz to 200 MHz. The maximum frame rate of the implemented processor is 5 fps. At a 1-fps frame rate, the analog domain consumes 58.8 uW for imaging and the analog convolution operation, and the digital domain consumes 146.2 uW at a 0.77 V supply voltage and 50 MHz.

2. Simulation Results

Fig. 12. Simulation results of proposed processor

Fig. 12(a) shows the face detection and face recognition accuracy of the proposed mixed-mode face recognition processor. We used the UMD dataset (12) and the LFW dataset (8) to measure the face detection accuracy and the face recognition accuracy, respectively. For face detection, the final decision is a binarized result indicating whether a face exists in the image or not. With the modified deep CNN model proposed for mixed-mode processing, we achieved 97.7% face detection accuracy and 96.18% face recognition accuracy, which are only 1.61% and 1.37% worse than the baseline 32-bit floating-point implementation.

Fig. 12(b) shows the power reduction of the proposed processor. By removing the overhead of the high-resolution ADCs, the proposed mixed-mode processor dissipates only 64 uW in always-on operation mode (58.8 uW in the analog domain and 4.98 uW in the digital domain), which is 33.3% lower than the state-of-the-art face recognition processor (2). Overall, the proposed processor dissipates 0.205 mW for the entire face recognition, which is 66.9% lower than (2). Fig. 12(c) shows the detailed power breakdown in always-on operation and in overall system operation. In always-on operation, the analog CNN processor, the CMOS image sensor, and the digital processor consume 57.7 uW, 1.141 uW, and 4.98 uW, respectively, so the analog CNN processor accounts for the largest portion, 90.1%. In overall system operation, however, because the face recognition network is much deeper than the face detection network, the amount of computation handled by the digital processor increases, and the digital domain consumes 0.146 mW, which is 71.2% of the overall power consumption. Table 1 compares the proposed processor with the latest face recognition processors. References (2) and (3) support both face detection and face recognition, whereas (13) supports only face recognition.
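The quoted breakdown and total figures are self-consistent, as a quick check shows:

```python
# Always-on mode breakdown (uW), from Fig. 12(c).
analog_cnn, cis, digital = 57.7, 1.141, 4.98
always_on = analog_cnn + cis + digital
print(f"always-on: {always_on:.1f} uW, analog: {analog_cnn / always_on:.0%}")
# -> ~63.8 uW (reported as 64 uW); the analog CNN processor is ~90% of it

# Overall system operation (mW): digital domain share of the total.
total, digital_mw = 0.205, 0.146
print(f"digital share: {digital_mw / total:.1%}")   # ~71.2%, as stated
```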

Table 1. Comparison table

|                              | JSSC'17 [3]        | JSSC'18 [2]     | ISCAS'19 [13]     | This Work          |
|------------------------------|--------------------|-----------------|-------------------|--------------------|
| Technology                   | 40 nm              | 65 nm           | 65 nm             | 65 nm              |
| Area                         | 5.86 mm2 *         | 27.09 mm2 Ɨ     | 5.99 mm2 *        | 15.84 mm2 Ɨ        |
| FD Algorithm                 | Haar-like          | Haar-like       | -                 | CNN                |
| FR Algorithm                 | PCA+SVM            | CNN             | CNN               | CNN                |
| Frame rate                   | 5.5 fps            | 1 fps           | 1 fps             | 5 fps              |
| Always-on Power Consumption  | -                  | 96 uW           | -                 | 64 uW              |
| FR Accuracy                  | 81% @ 32-class LFW | 97% @ whole LFW | 95.4% @ whole LFW | 96.18% @ whole LFW |
| Resolution                   | HD                 | QVGA            | -                 | QVGA               |
| Overall System Power @ 1 fps | 23 mW              | 0.62 mW         | 0.26 mW           | 0.205 mW           |

* Only FR core included. Ɨ Both CIS & FR core included.

Unlike the previous works, which utilized traditional feature-based face detection or recognition, the proposed processor adopts a deep CNN algorithm for both face detection and face recognition to obtain high recognition accuracy. Despite the CNN's large amount of computation, the proposed processor consumes the least power by reducing the always-on power with low-power analog circuits and by reducing the weight bit-precision of the digital computation. Moreover, even though the proposed system integrates the CMOS image sensor and the face detection process to implement the whole face recognition system, its overall power consumption is 17.4% lower than (13).

V. CONCLUSIONS

Despite recent DNN-based advances in face recognition technology, most previous face recognition processors (2,3,5) focused only on reducing the power of the event-driven face recognition rather than the always-on image acquisition and face detection. However, considering the relative frequency of the always-on and event-driven operations, reducing the always-on power is key to reducing the overall system power consumption. Unlike the conventional face recognition processor architecture composed of an image sensor, high-resolution ADCs, and a digital signal processor, the proposed processor introduces a new mixed-mode architecture that removes the power-hungry ADCs and adds an analog signal processor that computes the first layer of the convolutional neural networks shared by face detection and face recognition. The proposed architecture achieves at least a 57.9% power reduction compared to the various types of ADC-based image sensors.

More specifically, at the circuit level, we propose a reconfigurable CDS readout circuit and an exposure time-division scheme to integrate the image sensor and the analog signal processor without losing input data under various illumination conditions. Moreover, we propose an error-tolerant current-based weighted-sum unit that computes the shared input layer of both the face detection and face recognition CNNs with only 15.09 uW of power consumption.

As a result, in a 65 nm process, we demonstrate that our mixed-mode face recognition processor consumes a total of 0.205 mW in overall system operation and 64 uW in always-on operation, which are 66.9% and 33.3% less than the state-of-the-art design.

ACKNOWLEDGMENTS

This research was supported in part by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-0-01847) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation), and Samsung Electronics.

REFERENCES

1. Fernandez E., Jimenea D., November 2016, Face Recognition for Authentication on Mobile Devices, Image and Vision Computing, Vol. 55, pp. 31-33.
2. Bong K., Choi S., Kim C., Han D., Yoo H., January 2018, A Low-Power Convolutional Neural Network Face Recognition Processor and a CIS Integrated with Always-on Face Detector, IEEE J. Solid-State Circuits, Vol. 53, pp. 115-123.
3. Jeon D., Dong Q., Kim Y., Wang X., Chen S., Yu H., Blaauw D., Sylvester D., June 2017, A 23-mW Face Recognition Processor with Mostly Read 5T Memory in 40-nm CMOS, IEEE J. Solid-State Circuits, Vol. 52, pp. 1628-1642.
4. Rusci M., Rossi D., Farella E., October 2017, A Sub-mW IoT-Endnode for Always-On Visual Monitoring and Smart Triggering, IEEE Internet of Things Journal, Vol. 4, No. 5.
5. Kang S., Lee J., Kim C., Yoo H., October 2018, B-Face: 0.2 mW CNN-Based Face Recognition Processor with Face Alignment for Mobile User Identification, IEEE Symposium on VLSI Circuits.
6. Judd P., Albericio J., Hetherington T., Aamodt T. M., Moshovos A., October 2016, Stripes: Bit-serial Deep Neural Network Computing, in Proc. 49th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO), Taipei, Taiwan, pp. 1-12.
7. Shin M., Kim J., Kim M., Jo Y., Kwon O., June 2012, A 1.92-Megapixel CMOS Image Sensor with Column-Parallel Low-Power and Area-Efficient SA-ADCs, IEEE Transactions on Electron Devices, Vol. 59, pp. 1693-1700.
8. Miller E., Huang G., Roychowdhury A., Li H., Hua G., 2016, Labeled Faces in the Wild: A Survey, Springer, pp. 189-248.
9. Kim D., Song M., September 2012, An Enhanced Dynamic-Range CMOS Image Sensor Using a Digital Logarithmic Single-Slope ADC, IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 59, pp. 653-657.
10. Kim D., Song M., September 2012, An Enhanced Dynamic-Range CMOS Image Sensor Using a Digital Logarithmic Single-Slope ADC, IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 59, pp. 653-657.
11. Lee I., Sylvester D., Blaauw D., January 2017, A Subthreshold Voltage Reference with Scalable Output Voltage for Low-Power IoT Systems, IEEE J. Solid-State Circuits, Vol. 52, pp. 1443-1449.
12. Bansal A., Nanduri A., Castillo C., Ranjan R., Chellappa R., November 2016, UMDFaces: An Annotated Face Dataset for Training Deep Networks, arXiv.
13. Kim S., Lee J., Kang S., Lee J., Yoo H.-J., 2019, A 15.2 TOPS/W CNN Accelerator with Similar Feature Skipping for Face Recognition in Mobile Devices, in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Sapporo, Japan, pp. 1-5.

Author

Ji-Hoon Kim

Ji-Hoon Kim received the B.S. degree in electrical engineering from Kyung-Hee University, Suwon, South Korea, in 2017 and the M.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2019, where he is currently pursuing the Ph.D. degree.

His current research interests span various aspects of hardware system design, including low-power deep learning and intelligent vision SoC design with memory architecture, hardware accelerators for computing systems, embedded system development with FPGAs, computer architecture, and hardware/software co-design.

Changhyeon Kim

Changhyeon Kim received the B.S. (2014), M.S. (2016), and Ph.D. (2020) degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea.

His current research interests include low power SoC design, especially focused on parallel processor for artificial intelligence and machine learning algorithms.

Kwantae Kim

Kwantae Kim received the B.S. and M.S. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2015 and 2017, respectively, where he is currently pursuing the Ph.D. degree in electrical engineering.

From 2015 to 2017, he was with the Healthrian R&D Center, Daejeon, where he designed bio-potential readout ICs for mobile healthcare solutions.

He is also a Visiting Student with the Institute of Neuroinformatics, University of Zurich and ETH Zürich, Zürich, Switzerland.

His research interests include designing low-power bio-impedance sensors and low-power neuromorphic audio sensors.

Mr. Kim received the Un Chong-Kwan Scholarship Award from KAIST for his achievement of excellence in entrance examination in 2015.

He was a recipient of the Silver Prizes in the 25th HumanTech Paper Award from Samsung Electronics, Suwon, South Korea, in 2019.

Juhyoung Lee

Juhyoung Lee received the B.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2017, and the M.S. degree in electrical engineering from KAIST in 2019, where he is currently pursuing the Ph.D. degree.

He is a student member of IEEE.

His current research interests include energy-efficient multicore architectures/accelerator ASICs/systems especially focused on artificial intelligence including deep reinforcement learning and computer vision, energy-efficient processing-in-memory accelerator, and deep learning algorithm for efficient processing.

Hoi-Jun Yoo

Hoi-Jun Yoo received the bachelor's degree from the Electronics Department, Seoul National University, Seoul, South Korea, in 1983, and the M.S. and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 1985 and 1988, respectively.

He was the VCSEL pioneer at Bell Communications Research, Red Bank, NJ, USA, and the Manager of the DRAM Design Group, Hyundai Electronics Inc., Ichon, South Korea, during the era of 1M DRAM up to 256M SDRAM.

From 2003 to 2005, he served as the full-time Advisor to the Minister of the Korean Ministry of Information and Communication for SoC and next-generation computing.

He is currently an ICT Endowed Chair Professor with the School of Electrical Engineering and the Director of the System Design Innovation and Application Research Center (SDIA), KAIST.

He has published more than 400 articles and wrote or edited five books: DRAM Design (1997, Hongneung), High Performance DRAM (1999 Hongneung), Low Power NoC for High Performance SoC Design (2008, CRC), Mobile 3D Graphics SoC (2010, Wiley), and Biomedical CMOS ICs (Co-editing with Chris Van Hoof, 2010, Springer), and co-written chapters in numerous books.

His current research interests include bio-inspired artificial intelligence (AI) chip design and multicore AI SoC design, including DNN accelerators, wearable healthcare systems, network-on-chip, and high-speed low-power memory.

Dr. Yoo is an Executive Committee Member of the Symposium on VLSI and a Steering Committee Member of the Asian Solid-State Circuits Conference (A-SSCC), of which he was nominated as the Steering Committee Chair from 2020 to 2025.

He received the Order of Service Merit from the Korean Government in 2011 for his contributions to the Korean memory industry, the Scientist/Engineer of the month Award from the Ministry of Education, Science and Technology of Korea in 2010, the Kyung-Am Scholar Award in 2014, the Electronic Industrial Association of Korea Award for his contributions to the DRAM technology in 1994, the Hynix Development Award in 1995, the Korea Semiconductor Industry Association Award in 2002, the Best Research of KAIST Award in 2007, the Excellent Scholar of KAIST Award in 2011, and the Best Scholar of KAIST Award in 2019.

He was a co-recipient of the ASP-DAC Design Award in 2001, the A-SSCC Outstanding Design Awards in 2005, 2006, 2007, 2010, 2011, and 2014, the International Solid-State Circuits Conference (ISSCC)/DAC Student Design Contest Awards in 2007, 2008, 2010, and 2011, the ISSCC Demonstration Session Recognition in 2016, 2017, and 2019, and the Best Paper Award of the IEEE International Conference on Artificial Intelligence Circuits and Systems in 2019.

He was a TPC Chair of the ISSCC 2015, a Plenary Speaker of the ISSCC 2019 entitled Intelligence on Silicon: From Deep Neural Network Accelerators to Brain-Mimicking AI-SoCs, and the Chair of the Technology Direction (TD) Subcommittee of the ISSCC 2013.

He has served as an Executive Committee Member of the ISSCC, the IEEE SSCS Distinguished Lecturer from 2010 to 2011, and the TPC Chair of the International Symposium on Wearable Computers (ISWC) 2010 and the A-SSCC 2008.

He was the Editor-in-Chief (EIC) of the Journal of Semiconductor Technology and Science (JSTS) published by the Korean Institute of Electronics and Information Engineers from 2015 to 2019 and a Guest Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC) and the T-BioCAS.

He is also an Associate Editor of the IEEE JSSC and the IEEE SOLID-STATE CIRCUITS LETTERS (SSCL).

Joo-Young Kim

Joo-Young Kim received the B.S., M.S., and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST) in 2005, 2007, and 2010, respectively.

He has been an assistant professor in the School of Electrical Engineering at KAIST since September 2019.

His research interests span various aspects of hardware design including VLSI design, computer architecture, FPGA, domain specific accelerators, hardware/software co-design, and agile hardware development.

Before joining KAIST, Joo-Young was a Senior Hardware Engineering Lead at Microsoft Azure, working on hardware acceleration for its hyper-scale big data analytics platform named Azure Data Lake.

Before that, he was one of the initial members of Catapult project at Microsoft Research, where he deployed a fabric of FPGAs in datacenters to accelerate critical cloud services such as machine learning, data storage, and networking.

Joo-Young is a recipient of the 2016 IEEE Micro Top Picks Award, the 2014 IEEE Micro Top Picks Award, the 2010 DAC/ISSCC Student Design Contest Award, the 2008 DAC/ISSCC Student Design Contest Award, and the 2006 A-SSCC Student Design Contest Award.

He serves as Associate Editor for the IEEE Transactions on Circuits and Systems I: Regular Papers (2020-2021).