Jiwoong Choi¹
Dayoung Chun¹
Hyuk-Jae Lee¹
Hyun Kim²
¹ Inter-university Semiconductor Research Center (ISRC), Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, Korea
² Research Center for Electrical and Information Technology, Department of Electrical and Information Engineering, Seoul National University of Science and Technology, Seoul 01811, Korea (hyunkim@seoultech.ac.kr)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Object detection, Embedded system, Deep learning, Autonomous driving, NVIDIA Jetson AGX Xavier
1. Introduction
Recently, deep neural network (DNN)-based object detection [1] with camera sensors has achieved better detection accuracy than humans, making it increasingly important as the object detection component of autonomous vehicles [2,3]. For autonomous vehicles, the real-time detection speed of object detectors is essential for reducing latency while maintaining high detection accuracy, so that the control system can respond quickly [4]. In addition, reducing power consumption is essential for autonomous vehicles that operate on battery power. Autonomous vehicles therefore typically operate on embedded systems, whose limited hardware resources make real-time detection difficult even with a relatively fast and highly efficient DNN-based one-stage detector [5].
To overcome this limitation, various previous studies have proposed lightweight DNN-based object detectors that achieve real-time detection speeds on embedded platforms, together with corresponding lightweight, low-power implementation techniques [5-12]. These algorithms focus on significantly improving the detection speed by reducing the computing cost, thereby allowing DNN algorithms to run on embedded platforms. However, they suffer an accuracy loss compared with conventional object detection algorithms.
Given that improved accuracy is essential for the practical deployment of these lightweight
algorithms in autonomous driving, various techniques have been actively studied to
enhance the accuracy of lightweight networks [13-15]. Choi $\textit{et al.}$ [13] proposed a model for predicting the localization uncertainty in a lightweight network and used the predicted uncertainty in post-processing to improve the accuracy significantly. However, the increased computing cost of this post-processing reduces the overall detection speed of the model. Zhang $\textit{et al.}$ [14] and Xiao $\textit{et al.}$ [15] enhanced the accuracy by adding layers to lightweight networks. Unfortunately, these methods also increase the computing cost and decrease the detection speed.
To enhance the detection speed on embedded platforms, this study proposes a parallel processing scheme that runs CNN operations on the GPU and non-maximum suppression (NMS) operations on the CPU, thereby hiding the NMS-processing time within the GPU-processing time while maintaining accuracy. In general, the one-stage object detectors widely used for autonomous driving process the input as square images in both the training and inference steps [16]. Because the camera sensors employed in recent autonomous-driving applications are wide-angle cameras [17], a preprocessing step is required to convert their inputs to square images, and this conversion distorts the original input image. Although CNNs can be trained to recognize objects in distorted ($\textit{i.e.}$, square) images [16], their accuracy is significantly degraded for inputs with the same ratio as the original ($\textit{i.e.}$, wide-angle) image. To address these problems, this study proposes a new data augmentation technique that considers multiple images and various image ratios in the training step, enabling the model to cope robustly with various input sizes and ratios without any detection-speed penalty in the inference phase. Furthermore, in the inference phase, the input image is resized to the ratio of the autonomous-driving ($\textit{i.e.}$, wide-angle) camera, improving the detection speed and further increasing the accuracy on autonomous-driving embedded systems.
By applying all the proposed methods, the mean average precision (mAP) is improved by 1.14 percentage points (pp) on the Berkeley DeepDrive (BDD) dataset and 1.34 pp on the KITTI dataset. The detection speed is also improved by 22.54% on BDD and 24.67% on KITTI compared with the baseline algorithm, enabling faster and more accurate detection.
2. Proposed Acceleration Methods
2.1 Non-maximum Suppression Hiding
To enhance the detection speed of object detectors on embedded platforms, this study proposes a parallel processing technique that runs the convolution operations on the GPU and the NMS operations on the CPU, thereby hiding the NMS-processing time within the convolution-processing time. Figs. 1(a) and (b) show the detection process of the conventional algorithm and the process after applying the proposed technique, respectively. As shown in Fig. 1(a), the baseline algorithm does not process the next input image until all operations on the current input image are complete. In autonomous-driving applications, where images arrive as a stream, this processing structure is inefficient in terms of hardware utilization. Accordingly, the proposed method employs a pipeline structure to address this issue. As shown in Fig. 1(b), using multi-threaded processing, the NMS calculation of frame T is hidden by the GPU calculation of frame T+1. That is, while the GPU performs the convolution operations of frame T+1, the CPU simultaneously post-processes the GPU output of frame T stored in the buffer. The processing time of each task is synchronized so that the CPU starts at the right timing relative to the GPU operation. This is possible because there is no data dependency between the GPU inference of frame T+1 and the CPU post-processing of frame T. Therefore, the proposed method is highly efficient in terms of hardware utilization and enables a continuous stream of input images to be processed efficiently.
Fig. 1. Examples of the detection process for the conventional algorithm and after applying the proposed technique.
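As an illustration, the following Python sketch shows one possible organization of the proposed pipeline with two threads and an intermediate buffer. It is a minimal sketch, not the authors' implementation: run_cnn_gpu and run_nms_cpu are hypothetical stubs standing in for the GPU inference and CPU post-processing stages.

import threading
import queue

def run_cnn_gpu(frame):
    # Hypothetical stub for the GPU convolution stage (e.g., a TensorRT
    # or darknet forward pass); returns raw network predictions.
    return frame

def run_nms_cpu(raw_preds):
    # Hypothetical stub for the CPU post-processing stage (confidence
    # calculation followed by NMS); returns final detections.
    return raw_preds

frame_queue = queue.Queue(maxsize=2)    # streaming input frames
gpu_out_queue = queue.Queue(maxsize=2)  # buffer between GPU and CPU stages

def gpu_worker():
    # Runs the convolution of frame T+1 while the CPU thread is still
    # post-processing frame T; the buffer decouples the two stages.
    while True:
        frame = frame_queue.get()
        if frame is None:               # end-of-stream sentinel
            gpu_out_queue.put(None)
            return
        gpu_out_queue.put(run_cnn_gpu(frame))

def cpu_worker(results):
    # NMS runs concurrently with the next frame's convolution, so its
    # latency is hidden within the GPU processing time.
    while True:
        raw = gpu_out_queue.get()
        if raw is None:
            return
        results.append(run_nms_cpu(raw))

results = []
threads = [threading.Thread(target=gpu_worker),
           threading.Thread(target=cpu_worker, args=(results,))]
for t in threads:
    t.start()
for frame_id in range(5):               # five dummy frames
    frame_queue.put(frame_id)
frame_queue.put(None)
for t in threads:
    t.join()

In this structure, the blocking queue provides both the buffer and the synchronization between the two stages, so the CPU thread starts at the right timing without explicit timing control.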
2.2 Proposed Data Augmentation in Training Step
Among previous augmentation studies, the representative methods Mixup [18] and RICAP [19] enable a one-stage detector to learn various features from square input images, but they decrease the detection accuracy for non-square image ratios. In addition, in a fully convolutional network (FCN), the typical structure of recent object detectors, the amount of computation of the entire network is determined by the input image size. The total computational cost of the FCN is therefore sensitive to the input image size: the computational cost decreases as the input image shrinks, increasing the detection speed, but this also decreases the accuracy. To address this problem, this subsection proposes a new data augmentation technique for autonomous-driving embedded systems that enhances the detection accuracy without compromising the detection speed.
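As a point of reference, a single convolution layer over an $H{\times}W$ feature map with a $k{\times}k$ kernel, $C_{in}$ input channels, and $C_{out}$ output channels costs approximately $H \cdot W \cdot k^{2} \cdot C_{in} \cdot C_{out}$ multiply-accumulate operations; because every layer of an FCN scales with $H{\times}W$, the total cost is roughly proportional to the number of input pixels.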
Fig. 2 shows the proposed data augmentation method using a single image. The example on the left side of Fig. 2 shows data augmentation that maintains the ratio of the original image. In the first step, the original image is cropped randomly. The cropped image is then inserted into a square training plate while maintaining its ratio, which prevents the shapes of the objects in the original image from becoming distorted. In the final step, conventional data augmentation schemes, such as flips and saturation, hue, and exposure changes, are applied [20,21]. The example on the right side of Fig. 2 shows data augmentation without maintaining the ratio of the original image. In the first step, the image is cropped randomly, resized into a square, and inserted into a square training plate. Because the original ratio is not maintained, this method produces an affine-transform effect on the objects. In the final step, the aforementioned conventional augmentation techniques are again applied to produce the final training image. These two augmentation techniques are used during the training phase to enable the CNN to learn various objects, ratios, and features.
Fig. 2. Example of the proposed data augmentation using a single image.
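A minimal NumPy sketch of the two single-image variants is given below. The 512${\times}$512 plate size, the crop ranges, and the gray fill value are illustrative assumptions, and the bounding-box adjustment and the conventional flip/color augmentations are omitted for brevity.

import numpy as np

PLATE = 512  # square training plate side (assumed, matching the 512x512 input)

def random_crop(img, rng):
    # Randomly crop a sub-region covering at least half of each side.
    h, w = img.shape[:2]
    ch = int(rng.integers(h // 2, h + 1))
    cw = int(rng.integers(w // 2, w + 1))
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    return img[y:y + ch, x:x + cw]

def nn_resize(img, out_h, out_w):
    # Nearest-neighbor resize (a crude stand-in for proper interpolation).
    h, w = img.shape[:2]
    return img[np.arange(out_h) * h // out_h][:, np.arange(out_w) * w // out_w]

def to_plate_keep_ratio(img, rng):
    # Left example in Fig. 2: insert the crop into the plate while
    # preserving its aspect ratio, so object shapes are not distorted.
    h, w = img.shape[:2]
    s = min(PLATE / h, PLATE / w)
    nh, nw = int(h * s), int(w * s)
    plate = np.full((PLATE, PLATE, 3), 127, dtype=np.uint8)  # assumed gray fill
    y = int(rng.integers(0, PLATE - nh + 1))
    x = int(rng.integers(0, PLATE - nw + 1))
    plate[y:y + nh, x:x + nw] = nn_resize(img, nh, nw)
    return plate

def to_plate_ignore_ratio(img):
    # Right example in Fig. 2: resize the crop directly to the square
    # plate, which applies an affine-transform-like distortion to objects.
    return nn_resize(img, PLATE, PLATE)

rng = np.random.default_rng(0)
src = rng.integers(0, 256, (720, 1280, 3), dtype=np.uint8)  # dummy wide frame
plate_a = to_plate_keep_ratio(random_crop(src, rng), rng)
plate_b = to_plate_ignore_ratio(random_crop(src, rng))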
Moreover, using multiple images rather than a single image can greatly increase the diversity of the data and prevent overfitting of a CNN with deep layers [19]. Fig. 3 shows the proposed data augmentation technique using two images. In Fig. 3, the augmentation processes that maintain and do not maintain the original ratio are the same as in Fig. 2. When two images are used, the square training plate is divided into two areas. Each preprocessed input image ($\textit{i.e.}$, with or without its ratio maintained) is inserted independently into one of the divided areas. The plate is divided across its width, so that each area remains horizontally long, because the input ratio of the autonomous-driving camera is wide. Thus, four training images can be generated from two input images, expanding the training data. The left side of Fig. 4 shows an example of forming a training plate using two images. The boundary line on the left side of Fig. 4 is not fixed at the center; it can move in the movable direction ($\textit{i.e.}$, up and down), further increasing the number of possible training samples.
Fig. 3. Example of the proposed data augmentation using two images.
Fig. 4. Example of partitioning a training plate according to various images.
A training image can also be produced using four images to expand the diversity of the data beyond two images. As shown in the right example of Fig. 4, images with or without the maintained ratio are inserted into each of the four areas partitioned by the boundary position and lines, producing new training data. The boundary position can move within a specific range in the movable directions ($\textit{i.e.}$, left, right, up, and down), producing more diverse data. With this proposed augmentation, new global features, which refer to the formation of a new image by combining multiple patches from multiple images [19], can be produced to prevent overfitting. Thus, the accuracy is improved significantly.
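The multi-image plates can be sketched in the same style. In this sketch, the ${\pm}$25% boundary jitter range is an assumed value, prepare_patch is a plain-resize stand-in for either single-image preprocessing variant above, and label handling is again omitted.

import numpy as np

PLATE = 512
rng = np.random.default_rng(0)

def prepare_patch(img, out_h, out_w):
    # Stand-in for the single-image preprocessing above: here a plain
    # nearest-neighbor resize; in the actual scheme each patch is
    # randomly cropped and keeps or ignores its original ratio.
    h, w = img.shape[:2]
    return img[np.arange(out_h) * h // out_h][:, np.arange(out_w) * w // out_w]

def two_image_plate(img_a, img_b):
    # Fig. 4 (left): a horizontal boundary splits the plate into two
    # horizontally long regions matching the wide camera ratio; the
    # boundary is jittered around the center (assumed +/-25% range).
    cut = PLATE // 2 + int(rng.integers(-PLATE // 4, PLATE // 4 + 1))
    plate = np.empty((PLATE, PLATE, 3), dtype=np.uint8)
    plate[:cut] = prepare_patch(img_a, cut, PLATE)
    plate[cut:] = prepare_patch(img_b, PLATE - cut, PLATE)
    return plate

def four_image_plate(imgs):
    # Fig. 4 (right): the plate is partitioned into four regions around
    # a jittered boundary point; each region receives its own image.
    cy = PLATE // 2 + int(rng.integers(-PLATE // 4, PLATE // 4 + 1))
    cx = PLATE // 2 + int(rng.integers(-PLATE // 4, PLATE // 4 + 1))
    plate = np.empty((PLATE, PLATE, 3), dtype=np.uint8)
    regions = [(slice(0, cy), slice(0, cx)),
               (slice(0, cy), slice(cx, PLATE)),
               (slice(cy, PLATE), slice(0, cx)),
               (slice(cy, PLATE), slice(cx, PLATE))]
    for img, (ry, rx) in zip(imgs, regions):
        plate[ry, rx] = prepare_patch(img, ry.stop - ry.start, rx.stop - rx.start)
    return plate

imgs = [rng.integers(0, 256, (720, 1280, 3), dtype=np.uint8) for _ in range(4)]
plate2 = two_image_plate(imgs[0], imgs[1])
plate4 = four_image_plate(imgs)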
2.3 Proposed Image Resize in Inference Step
This subsection proposes a preprocessing scheme that enhances the detection speed by eliminating unnecessary operations in object detectors for autonomous-driving embedded systems. In actual autonomous driving, Full HD ($\textit{i.e.}$, 1920${\times}$1080) is generally used as the input resolution [22]. However, because this Full HD size causes severe power consumption and speed degradation, it is common to automatically resize the high-resolution image and process it at a lower resolution. Fig. 5 shows the difference between the image resizing of conventional object detectors and that of the proposed technique in the inference phase. Most previous methods [16] resize the image to a square ratio ($\textit{i.e.}$, Conventional in Fig. 5), whereas YOLO-based object detectors [6,13] maintain the actual input ratio when resizing to enhance the accuracy. However, as shown in the Baseline image in Fig. 5, YOLO-based object detectors still execute operations on a square-sized input by filling the margin areas of the image with fixed pixel values ($\textit{i.e.}$, a letterbox). In other words, a square input that maintains the input image ratio is generated and processed by the network. Although this maintains the input ratio for high accuracy, it brings no reduction in computational cost because the total input still has the same square image size.
Fig. 5. Examples of the image resize method in the inference phase on the conventional object detector, baseline algorithm, and proposed scheme.
The proposed inference preprocessing technique compensates for this inefficient structure by removing the unnecessary upper and lower letterbox areas of the baseline method, thereby computing convolutions only on meaningful pixels and greatly reducing the computational cost. Furthermore, because the algorithm trained with the proposed data augmentation technique robustly detects diverse input ratios, resizing the image while maintaining the original ratio in the inference phase reduces the computational cost while significantly enhancing the detection accuracy. In other words, exploiting the reduced computation, the image can be scaled up again, considering the trade-off between the computing cost and detection accuracy, thereby improving the accuracy.
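The following sketch contrasts the baseline letterbox preprocessing with the proposed ratio-preserving resize; nn_resize is a crude nearest-neighbor stand-in for proper interpolation, and the 672${\times}$384 target is the BDD setting derived in Section 3.3.

import numpy as np

def nn_resize(img, out_h, out_w):
    # Nearest-neighbor resize (a crude stand-in for proper interpolation).
    h, w = img.shape[:2]
    return img[np.arange(out_h) * h // out_h][:, np.arange(out_w) * w // out_w]

def baseline_letterbox(img, side=512):
    # Baseline (YOLO-style): resize keeping the ratio, then pad the top
    # and bottom with constant pixels to reach a square input; the
    # padded rows still cost convolution operations.
    h, w = img.shape[:2]
    s = side / max(h, w)
    nh, nw = int(h * s), int(w * s)
    canvas = np.full((side, side, 3), 127, dtype=np.uint8)
    top = (side - nh) // 2
    canvas[top:top + nh, :nw] = nn_resize(img, nh, nw)
    return canvas

def proposed_resize(img, net_w, net_h):
    # Proposed: resize directly to a wide input matching the camera
    # ratio, so only meaningful pixels are processed and the letterbox
    # rows are eliminated.
    return nn_resize(img, net_h, net_w)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # dummy 1280x720 frame
square = baseline_letterbox(frame)                 # 512x512 with letterbox rows
wide = proposed_resize(frame, 672, 384)            # 384x672, no letterbox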
3. Experimental Results
3.1 Experimental Environment
The superiority of the proposed methods is assessed through experiments on the BDD [23] and KITTI [24] datasets, which are widely used in autonomous-driving research. The same datasets, baseline algorithm ($\textit{i.e.}$, tiny Gaussian YOLOv3), open-source code, and experimental settings as in [13] are used for a fair comparison.
3.2 Accuracy Evaluation
Table 1 lists the mAP and the number of floating-point operations (FLOPs) of the baseline algorithm [13] and of the algorithms applying the proposed methods on the BDD and KITTI datasets. The input resolution is set to 512${\times}$512, as in [13], for a fair accuracy comparison, except for the proposed image resize method. It is noteworthy that BDD has more classes than KITTI as well as higher data diversity. In addition, because BDD applies a strict IoU threshold ($\textit{i.e.}$, IoU>0.75) to all classes in the evaluation, it shows a lower mAP than KITTI [24]. When the proposed data augmentation is applied (+Proposed augmentation in Table 1), the mAP improves by 0.98 pp and 0.96 pp over the baseline algorithm on the BDD and KITTI datasets, respectively. Note that because the proposed data augmentation scheme is applied only in the training phase, the computing cost of the inference phase does not increase. Finally, applying the proposed resize technique, which maintains the original image ratio rather than resizing to a square (+Proposed resize in Table 1), improves the mAP by 1.14 pp on BDD and 1.34 pp on KITTI with respect to the baseline. It is noteworthy that the proposed resize technique, which removes the unnecessary letterbox operations, makes the input image smaller than the square size ($\textit{i.e.}$, 512${\times}$512). Accordingly, considering the trade-off between accuracy and computing cost, the input can be scaled up again while maintaining the original image ratio ($\textit{i.e.}$, to 672${\times}$384 on BDD and 768${\times}$256 on KITTI; the reason for these values is described in Section 3.3), thereby simultaneously improving the accuracy and reducing the computing cost ($\textit{i.e.}$, FLOPs).
Table 1. Accuracy and FLOPs comparison.
Method | mAP (%) | Diff. | FLOPs (${\times}$10$^{9}$) | Input size
BDD test set
Baseline [13] | 8.56 | - | 8.27 | 512×512
+ Proposed augmentation | 9.54 | +0.98 | 8.27 | 512×512
+ Proposed resize | 9.70 | +1.14 | 8.14 | 672×384
KITTI validation set
Baseline [13] | 68.69 | - | 8.26 | 512×512
+ Proposed augmentation | 69.65 | +0.96 | 8.26 | 512×512
+ Proposed resize | 70.03 | +1.34 | 6.19 | 768×256
Table 2 lists the accuracy of the existing data augmentation studies [18,19] and of the proposed technique. For the square image size, Mixup [18], RICAP [19], and the proposed method resize the image into a square using the conventional method shown in Fig. 5 in the inference phase. In contrast, when maintaining the original ratio, they resize the image using the proposed scheme shown in Fig. 5. For square images, both previous techniques, Mixup [18] and RICAP [19], improve the accuracy over the baseline on the BDD and KITTI datasets; however, for input images that maintain the original ratio, their accuracy drops sharply. On KITTI, whose images are wider than Full HD, the previous techniques degrade the accuracy significantly compared with the baseline. Even in this case, the proposed method improves the mAP by 0.25 pp on BDD and 1.85 pp on KITTI compared with the baseline. In other words, the proposed augmentation method improves the accuracy for both square and original-ratio input images. Moreover, the proposed method with the original ratio maintained even outperforms Mixup [18] and RICAP [19] with square images, while the FLOPs of the algorithm applying the proposed method ($\textit{i.e.}$, 8.14${\times}$10$^{9}$ on BDD and 6.19${\times}$10$^{9}$ on KITTI) are smaller than those of Mixup [18] and RICAP [19] ($\textit{i.e.}$, 8.27${\times}$10$^{9}$ on BDD and 8.26${\times}$10$^{9}$ on KITTI), which use the square image size.
Table 2. Accuracy comparison with previous augmentation studies according to image resize.
Method | mAP (%), Square | Input size (Square) | mAP (%), Resize | Input size (Resize)
BDD test set
Baseline [13] | 8.56 | 512×512 | 9.45 | 672×384
+ Mixup [18] | 8.87 | 512×512 | 7.79 | 672×384
+ RICAP [19] | 9.59 | 512×512 | 8.78 | 672×384
+ Proposed Aug. | 9.54 | 512×512 | 9.70 | 672×384
KITTI validation set
Baseline [13] | 68.69 | 512×512 | 68.18 | 768×256
+ Mixup [18] | 69.59 | 512×512 | 38.88 | 768×256
+ RICAP [19] | 69.66 | 512×512 | 52.44 | 768×256
+ Proposed Aug. | 69.65 | 512×512 | 70.03 | 768×256
3.3 Detection Speed Evaluation
Table 3 shows the inference, post-processing, and total processing times of the baseline ($\textit{i.e.}$, tiny Gaussian YOLOv3 [13]) and of the algorithm applying the proposed schemes on the NVIDIA Jetson AGX Xavier [25] for the BDD and KITTI datasets. The image size for the proposed resize technique is set to match the FLOPs of the baseline as closely as possible: 672${\times}$384 for BDD and 768${\times}$256 for KITTI, while the remaining methods use 512${\times}$512. Specifically, the original BDD image is 1280${\times}$720 and the original KITTI image is 1242${\times}$375, giving horizontal-to-vertical ratios of 1.78:1 and 3.31:1, respectively. To match the number of pixels ($\textit{i.e.}$, 262,144) of the baseline square image ($\textit{i.e.}$, 512${\times}$512) while matching the ratio of the original image, the input size for the proposed resize technique is set to 672${\times}$384 on BDD and 768${\times}$256 on KITTI.
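As a quick sanity check, under the assumption that FCN cost scales linearly with pixel count (Section 2.2), the chosen input sizes can be verified against the FLOPs in Table 1. Note that 672, 384, 768, and 256 are all multiples of 32, the usual YOLO network stride, and that 768${\times}$256 rounds the 3.31:1 KITTI ratio to 3:1.

budget = 512 * 512                      # 262,144 pixels in the baseline input
for name, (w, h), base_flops in [("BDD", (672, 384), 8.27e9),
                                 ("KITTI", (768, 256), 8.26e9)]:
    pixels = w * h                      # pixels processed after the resize
    est = base_flops * pixels / budget  # linear-scaling estimate
    print(f"{name}: {pixels} px, ~{est / 1e9:.3f}e9 FLOPs")
# BDD:   258048 px, ~8.141e9 FLOPs (Table 1: 8.14x10^9)
# KITTI: 196608 px, ~6.195e9 FLOPs (Table 1: 6.19x10^9)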
The proposed techniques show large performance improvements in the default power mode, which is typically used on the NVIDIA Jetson AGX Xavier. The proposed NMS hiding (+Prop. NMS hiding in Table 3) reduces the total processing time by 7.64 ms on the BDD dataset compared with the baseline. The proposed preprocessing technique (+Prop. Resize in Table 3) reduces the total processing time further, to 7.90 ms (22.54%) below the baseline overall. On the KITTI dataset, applying all the proposed techniques reduces the total processing time by 6.57 ms (24.67%) with respect to the baseline. Finally, applying the proposed schemes to the baseline algorithm improves the accuracy while achieving detection speeds of 36.83 fps (=1000 ms/27.15 ms) on the BDD dataset and 49.85 fps (=1000 ms/20.06 ms) on the KITTI dataset, enabling real-time detection that supports faster autonomous driving than the baseline algorithm.
Table 3. Processing times in the default mode of the NVIDIA Jetson AGX Xavier.
Time (ms) | Inference (GPU) | Confidence calculation (CPU) | NMS (CPU) | Total
BDD test set
Baseline [13] | 27.05 | 0.36 | 7.64 | 35.05
+ Prop. NMS hiding | 27.05 | 0.36 | - | 27.41
+ Prop. Resize | 26.81 | 0.34 | - | 27.15
KITTI validation set
Baseline [13] | 26.31 | 0.16 | 0.16 | 26.63
+ Prop. NMS hiding | 26.31 | 0.16 | - | 26.47
+ Prop. Resize | 19.91 | 0.15 | - | 20.06
4. Conclusion
This study has proposed methods that enhance the detection speed of an object detector while significantly improving its accuracy on the NVIDIA Jetson AGX Xavier, an embedded platform for autonomous driving. To improve the detection speed, this study has proposed a parallel processing scheme for the convolution and post-processing operations. This study has also proposed new data augmentation and image resize techniques. Applying the proposed methods to the baseline achieves outstanding gains in accuracy and detection speed, enabling accurate, real-time detection on autonomous-driving embedded platforms.
ACKNOWLEDGMENTS
This work was supported in part by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01304, Development of Self-learnable Mobile Recursive Neural Network Processor Technology) and in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant NRF-2019R1A6A1A03032119.
REFERENCES
[1] Choi J., Elezi I., Lee H.-J., Farabet C., Alvarez J. M., Oct. 2021, Active Learning for Deep Object Detection via Probabilistic Modeling, in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), pp. 10264-10273.
[2] Ravindran R., et al., Mar. 2021, Multi-Object Detection and Tracking, Based on DNN, for Autonomous Vehicles: A Review, IEEE Sens. J., Vol. 21, No. 5, pp. 5668-5677.
[3] Zhao X., et al., May 2020, Fusion of 3D LIDAR and Camera Data for Object Detection in Autonomous Vehicle Applications, IEEE Sens. J., Vol. 20, No. 9, pp. 4901-4913.
[4] Choi J., Chun D., Kim H., Lee H.-J., Oct. 2019, Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving, in Proc. IEEE Int. Conf. Comput. Vis. (ICCV).
[5] Wong A., Shafiee M. J., Li F., Chwyl B., May 2018, Tiny SSD: A Tiny Single-Shot Detection Deep Convolutional Neural Network for Real-Time Embedded Object Detection, in Proc. 15th Conf. on Comput. Robot Vision (CRV), pp. 95-101.
[6] Redmon J., Farhadi A., 2018, YOLOv3: An Incremental Improvement, arXiv preprint, arXiv:1804.02767.
[7] Nguyen D. T., Nguyen T. N., Kim H., Lee H.-J., 2019, A High-Throughput and Power-Efficient FPGA Implementation of YOLO CNN for Object Detection, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. 27, No. 8, pp. 1861-1873.
[8] Nguyen D. T., Hung N. H., Kim H., Lee H.-J., 2020, An Approximate Memory Architecture for Energy Saving in Deep Learning Applications, IEEE Trans. Circuits Syst. I, Reg. Papers, Vol. 67, No. 5, pp. 1588-1601.
[9] Nguyen D. T., Kim H., Lee H.-J., Chang I. J., May 2018, An Approximate Memory Architecture for a Reduction of Refresh Power Consumption in Deep Learning Applications, in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), pp. 1-5.
[10] Kang D., Kang D., Kang J., Yoo S., Ha S., Mar. 2018, Joint Optimization of Speed, Accuracy, and Energy for Embedded Image Recognition Systems, in Proc. Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 715-720.
[11] Sandler M., Howard A., Zhu M., Zhmoginov A., Chen L., Jun. 2018, MobileNetV2: Inverted Residuals and Linear Bottlenecks, in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4510-4520.
[12] Nguyen X. T., Nguyen T. N., Lee H.-J., Kim H., Dec. 2020, An Accurate Weight Binarization Scheme for CNN Object Detectors with Two Scaling Factors, IEIE Transactions on Smart Processing & Computing, Vol. 9, No. 6, pp. 497-503.
[13] Choi J., Chun D., Lee H.-J., Kim H., Aug. 2020, Uncertainty-based Object Detector for Autonomous Driving Embedded Platforms, in Proc. IEEE Int. Conf. Artificial Intelligence Circuits Syst. (AICAS), pp. 16-20.
[14] Zhang Y., Shen Y., Zhang J., Apr. 2019, An Improved Tiny-YOLOv3 Pedestrian Detection Algorithm, Int. J. Light Electron Opt., Vol. 183, pp. 17-23.
[15] Xiao D., et al., Jul. 2019, A Target Detection Model Based on Improved Tiny-YOLOv3 under the Environment of Mining Truck, IEEE Access, Vol. 7.
[16] Zhao Q., et al., Jan. 2019, M2Det: A Single-Shot Object Detector Based on Multi-Level Feature Pyramid Network, in Proc. AAAI Conf. Artif. Intell. (AAAI), pp. 9259-9266.
[17] SEKONIX Corp., Feb. 2020, SF332X-10X Family Preliminary Datasheet, [Online]. Available: http://sekolab.com/products/camera/
[18] Zhang H., et al., Apr. 2018, mixup: Beyond Empirical Risk Minimization, in Proc. Int. Conf. Learn. Represent. (ICLR), pp. 1-13.
[19] Takahashi R., Matsubara T., Uehara K., 2020, Data Augmentation Using Random Image Cropping and Patching for Deep CNNs, IEEE Trans. Circuits Syst. Video Technol., Vol. 30, No. 9, pp. 2917-2931.
[20] Krizhevsky A., Sutskever I., Hinton G. E., Dec. 2012, ImageNet Classification with Deep Convolutional Neural Networks, in Proc. Adv. Neural Inf. Process. Syst., pp. 1097-1105.
[21] He K., Zhang X., Ren S., Sun J., Jun. 2016, Deep Residual Learning for Image Recognition, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 770-778.
[22] Hemmati M., Biglari-Abhari M., Niar S., 2019, Adaptive Vehicle Detection for Real-time Autonomous Driving System, in Proc. Des. Autom. and Test in Eur. Conf. & Exhib. (DATE).
[23] Yu F., et al., Jun. 2020, BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR).
[24] Geiger A., Lenz P., Urtasun R., Jun. 2012, Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 3354-3361.
[25] NVIDIA Corp., Dec. 17, 2018, NVIDIA Xavier Documentation.
Author
Jiwoong Choi received his B.S. degree in electrical and electronics engineering from Chung-Ang University, Seoul, South Korea, in 2015, and his M.S. and Ph.D. degrees in electrical and computer engineering from Seoul National University, Seoul, South Korea, in 2017 and 2021, respectively. He is currently a Deep Learning Research Engineer at NVIDIA, Santa Clara, CA, USA.
Dayoung Chun received her B.S. degree in electronics engineering from Sogang University, Seoul, Korea, in 2018. She is working toward an integrated M.S. and Ph.D. degree in electrical and computer engineering at Seoul National University, Seoul. Her research interests include the algorithms and architectures of deep learning, and GPU architecture for computer vision.
Hyuk-Jae Lee received his B.S. and M.S. degrees in electronics engineering from Seoul National University, South Korea, in 1987 and 1989, respectively, and his Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, in 1996. From 1996 to 1998, he was with the faculty of the Department of Computer Science, Louisiana Tech University, Ruston, LA. From 1998 to 2001, he was with the Server and Workstation Chipset Division, Intel Corporation, Hillsboro, OR, as a Senior Component Design Engineer. In 2001, he joined the School of Electrical Engineering and Computer Science, Seoul National University, South Korea, where he is currently a Professor. He is a founder of Mamurian Design, Inc., a fabless SoC design house for multimedia applications. His research interests are in the areas of computer architecture and SoC design for multimedia applications.
Hyun Kim received his B.S., M.S., and Ph.D. degrees in electrical engineering and computer science from Seoul National University, Seoul, Korea, in 2009, 2011, and 2015, respectively. From 2015 to 2018, he was with the BK21 Creative Research Engineer Development for IT, Seoul National University, Seoul, Korea, as a BK Assistant Professor. In 2018, he joined the Department of Electrical and Information Engineering, Seoul National University of Science and Technology, Seoul, Korea, where he is currently working as an Assistant Professor. His research interests are in the areas of algorithms, computer architecture, memory, and SoC design for low-complexity multimedia applications and deep neural networks.