Jeon Byeong-Uk¹ and Chung Kyungyong²
¹ (Department of Computer Science, Kyonggi University, Suwon, Korea; jebuk97@kyonggi.ac.kr)
² (Division of AI Computer Science and Engineering, Kyonggi University, Suwon, Korea; dragonhci@gmail.com)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
YOLO, SlowFast network, Action recognition, Autonomous driving
1. Introduction
Recently, the use of computer vision technology for autonomous driving and road traffic
safety has increased. Intelligence technology enabling the flexible handling of diverse
driving environments is essential for guaranteeing driving performance. Video sensors,
such as cameras mounted on vehicles, must be able to interpret information for accurate
judgment even when errors occur or data are difficult to acquire. Therefore, studies
on deep learning models that detect and analyze objects through images or videos are
being conducted [1,2]. Technologies such as object recognition, detection, and tracking are being developed
through sensor fusion in response to weather changes, internal/external problems
of the vehicle, and external changes of the recognition target [3]. On the other hand, most current autonomous driving technologies respond
only according to the location of objects, using the results of object detection alone.
This causes a lack of ability to cope with unexpected situations, such as sudden
lane changes by vehicles ahead or the recognition of accident vehicles. Therefore, an
autonomous driving module can cope with more situations by adding behavior prediction
of objects. Nevertheless, in road transportation, recognition and action must be carried
out quickly. Accordingly, the processing speed of a model is essential so that it can
provide real-time processing. Deep learning models are limited in that real-time
processing is difficult because of their large amount of computation. The appearance of
algorithms such as R-CNN [4] and YOLO [5] has made it possible to perform object detection in real time. On the
other hand, algorithms that also detect the actions of objects are still too slow to
run in real time. Therefore, a model capable of improved detection speed while
maintaining action classification AUC is needed to improve autonomous driving
performance by predicting the actions of objects.
Han et al. [6] proposed driving intention recognition and lane change prediction on the highway.
The proposed framework predicts the lane change intention and lane change action on
the highway through external sensor data. Driving patterns and characteristics
are recognized through an LSTM model and classified
into three classes: LCL (Lane Change Left), LCR (Lane Change Right), and LK (Lane Keeping).
NGSIM data are used to achieve this. The NGSIM data [7] are vehicle trajectory data collected through videos shot from tall buildings.
The trajectories of the vehicles to be classified were derived from the NGSIM
data, and action classification was performed on them. Because the input data were
not collected from a sensor mounted on the vehicle being driven, the approach has the
limitation that it cannot be used where there is no external sensor or where
communication is not possible.
Therefore, this study proposes the Dynamic Framerate SlowFast network. The proposed
method analyzes the similarity between consecutive unit frames of the input video.
If the similarity exceeds a certain threshold, the unit frame is not entered into the
SlowFast network, and the existing output of the model is reused. This reduces the
number of image frames input to the model itself, enabling
a decrease in computing performance requirements while maintaining the AUC of object
behavior classification.
2. Related Work
2.1 Action Recognition Models
Deep learning-based computer vision technology is being studied to classify the actions
of objects. Algorithms such as the CNN-based R-CNN [4] and YOLO [5] make it possible to classify object types at high speed. On the other hand,
for the classification of object actions, two-stream neural networks, such
as the SlowFast network [8], have been achieving high performance. Fig. 1 presents the structure of a two-stream neural network. A two-stream neural network
achieves a high AUC in object behavior classification because it identifies different
types of characteristics through the temporal and spatial streams and utilizes them for
behavior classification. Nevertheless, two-stream-based object action classification
models are slow because they run two neural networks simultaneously.
Biparva et al. [9] predicted the lane changes of vehicles and compared the performance results of
two-stream networks. Their approach provides robust prediction of lane changes for
nearby vehicles, demonstrating high accuracy in the temporal domain ranging from 1-2
seconds. The video used is traffic video that records the situation in front of a
vehicle. Two-Stream Convolutional Networks, Two-Stream Inflated 3D Convolutional
Networks (I3D) [10], Spatiotemporal Multiplier Networks [11], and SlowFast networks [8] were used for performance evaluation. Based on the model comparison results, the
SlowFast network showed the highest AUC over the observation horizon. However, the
SlowFast network incurs excessive overhead, which causes a GPU memory problem.
Therefore, the authors finally classified the lane changes of vehicles through the
Spatiotemporal Multiplier Networks. Based on the performance evaluation results, they
confirmed that NLC (No Lane Change) achieved excellent classification performance. On
the other hand, LLC (Left Lane Change) and RLC (Right Lane Change) were limited because
they showed a precision of 60-70% and a recall of 60-70%. In
addition, the data used for testing required two seconds of observation, even though
the image had its vehicle object section cropped in advance. This slows the processing
performed by the existing system used for pre-processing, such as object recognition
and cropping. Therefore, this study developed a plan to reduce the required model
performance while maintaining the AUC of the SlowFast network.
Fig. 1. Structure of Two-Stream neural network.
2.2 Dynamic Framerate
A deep learning model can improve the AUC of object detection and classification, but
the required computing performance grows because of the characteristics of deep
learning. In addition, there has been an increase in cases where various deep learning
models are used in conjunction for AUC improvement. Therefore, an efficient data
processing technique capable of reducing the required performance of a deep learning
model is necessary. Videos shot on the road in diverse environments differ
in their dynamic level, so unnecessary frames are inputted when the same content
is repeated continuously. The aim here is to improve performance by applying
a frame skip method and reducing the data entered into the model. Park et
al. [12] proposed faster object detection using a frame skip method. Whether to input
a frame into YOLO or keep the result of the previous frame is determined by temporal
subtraction and ORB feature matching between two consecutive frames. First, the temporal
subtraction of two adjacent frames is performed. The temporal subtraction mask provides
no output if the two adjacent frames are identical; in this case, the
frame is not entered into the YOLO model, and the output of the YOLO
model from the previous frame is used instead. Through this process, both the object
detection speed and the AUC were enhanced. The limitation is that the method is
unsuitable for road traffic data because a frame is skipped only
when the two adjacent frames are identical. In road traffic data, completely identical
frames are rare due to vibration. In addition, a frame will not
be skipped when only the illuminance value changes, as in tunnels. Therefore,
instead of simply comparing illuminance values, it was necessary to develop a
method that determines how similar the context of the images is between frames. Hence,
this study used a measuring technique that focuses more on the structural differences
of images than on changes in illuminance values.
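As a rough sketch of the skip test in [12] (the ORB matching stage is omitted, and the threshold values here are illustrative assumptions, not the authors' settings), the temporal-subtraction decision can be expressed as follows:

```python
import cv2

def should_skip(prev_frame, curr_frame, pixel_thresh=25, change_ratio=0.001):
    """Temporal-subtraction test: skip when almost no pixels changed."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, curr_gray)      # temporal subtraction
    changed = (diff > pixel_thresh).mean()        # fraction of changed pixels
    return changed < change_ratio                 # near-identical -> reuse output
```

Because this test reacts to any per-pixel intensity change, camera vibration or a change in illuminance alone is enough to prevent skipping, which is exactly the limitation discussed above.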
Image similarity measuring techniques include techniques using the differences between
pixels and techniques comparing the structural differences of pixels. NRMSE (Normalized
Root Mean Square Error) [13] calculates the difference at the same location and then normalizes it. On the other
hand, it is difficult to calculate the similarity between images accurately because
the technique does not account for the structure of image pixels. An image similarity
measurement performed through a histogram makes a comparison through color distribution.
Nevertheless, the structural characteristics of images and the differences in pixel
positions are not considered. SSIM (Structural Similarity Index Measure) [14] is an image similarity technique that considers brightness, contrast, and structure
to resolve the problems of NRMSE. The SSIM prioritizes structure over brightness and
contrast when computing similarity.
Fig. 2 shows the similarity scores derived through various image similarity measurement
techniques for images that have undergone color tone and brightness changes. SSIM,
which remains high when only the color tone or brightness changes and the image
structure does not, is suitable for this study.
Fig. 2. Various types of image similarity measuring techniques.
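The three measures can be compared on a pair of frames with off-the-shelf implementations. A minimal sketch, assuming 8-bit BGR frames of equal size and using OpenCV's correlation method as one common histogram comparison:

```python
import cv2
from skimage.metrics import structural_similarity, normalized_root_mse

def similarity_scores(img_a, img_b):
    """Compare two frames with the three measures discussed above."""
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    ssim = structural_similarity(gray_a, gray_b, data_range=255)  # structure-aware
    nrmse = normalized_root_mse(gray_a, gray_b)   # pixel-wise error, 0 = identical
    hist_a = cv2.calcHist([gray_a], [0], None, [256], [0, 256])
    hist_b = cv2.calcHist([gray_b], [0], None, [256], [0, 256])
    hist = cv2.compareHist(hist_a, hist_b, cv2.HISTCMP_CORREL)    # distribution only
    return ssim, nrmse, hist
```

Running this on a frame pair that differs only in brightness would show the behavior of Fig. 2: the pixel-wise and histogram measures shift with illuminance, while SSIM tracks the unchanged structure.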
3. Dynamic Framerate SlowFast Network for Improving Autonomous Driving Performance
The Dynamic Framerate SlowFast network proposed in this study is performed in three
stages. Fig. 3 shows an overview of the Dynamic Framerate SlowFast network for improving autonomous
driving performance, illustrating the process of the proposed model when the similarity
is high. The initial stage is the process of collecting and pre-processing road traffic
data. Video data of the situation in front of the vehicle, collected from a dashboard
camera mounted on the vehicle or from a road information collection system, were
utilized. Objects were recognized and tracked through the YOLO model, and only the
closest object was cropped and entered as data. In the second stage, the similarity
between unit frames was analyzed. The input of the involved unit frame was skipped when
the similarity between the current and next unit frames exceeded a certain level [15]. SSIM was used as the similarity analysis technique to focus more on structural
changes to images than on changes in the surrounding environment. Finally, the
framerate-adjusted video was entered into the SlowFast network. This process improves
the speed while maintaining the AUC of object action detection.
Fig. 3. Overview of Dynamic Framerate SlowFast Network for Improving Autonomous Driving Performance.
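As a compact illustration of this three-stage flow, the sketch below wires the stages together. Here crop_fn, similarity_fn, and slowfast_fn are hypothetical stand-ins for the components detailed in Sections 3.1-3.3, and the unit size and threshold follow the values reported later (Sections 3.3 and 4.2):

```python
def classify_video(video_frames, crop_fn, similarity_fn, slowfast_fn,
                   unit=10, threshold=0.75):
    """Stage 1: crop objects; stage 2: similarity check; stage 3: SlowFast."""
    prev_clip, prev_result = None, None
    for i in range(0, len(video_frames), unit):
        clip = [crop_fn(f) for f in video_frames[i:i + unit]]    # stage 1
        if prev_clip is not None and similarity_fn(prev_clip, clip) > threshold:
            result = prev_result                                 # stage 2: skip
        else:
            result = slowfast_fn(clip)                           # stage 3
        prev_clip, prev_result = clip, result
        yield result
```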
3.1 Data Collection and Preprocessing
This study collected the dataset through the PREVENTION (PREdiction of VEhicles iNTentIONs)
dataset [16]. This dataset was constructed to meet the need to predict the driver's intention
and the future trajectory of the vehicle. It includes videos recorded with a front-facing
camera while the vehicle is driven, along with vehicle trajectory information,
category information, and lane information as labels. The events include cut-in, cut-out,
left/right lane change, and risky maneuver. Each video has a resolution of 1920 ${\times}$
920 and a framerate of 10 frames per second. The dataset consists of 11 videos recorded
on five mutually different days, and the videos are 377 minutes long in total. In
this study, only the data related to lane changes were used. Table 1 lists the number of lane change events per recording and the average number of
frames and duration per lane change event included in the dataset.
The road-driving images of the original PREVENTION dataset are unsuitable to enter
directly into the SlowFast network because they contain too much surrounding information.
Accordingly, an image dataset in which surrounding information
is not excessive can be constructed by detecting only the vehicle objects whose behavior
is to be classified through the YOLO model and cropping them [17]. Hence, a lane change image dataset was configured by cropping only the vehicle objects
in each image frame, resizing them to 256${\times}$256, and inputting them into the SlowFast
network. The lane change image dataset was built by cropping only objects moving across
the lane, using the bounding-box information of the objects. The data for training and
validating the SlowFast network were constructed by cropping the images using the
lane-change point of the label and the bounding-box information of the object. The
cropped images were flipped horizontally for use as data of the opposite class,
solving the data imbalance and shortage. In addition, vehicles were cropped from images
in which no lane change occurred to derive images of the general situation outside lane
changes. Among the original PREVENTION dataset, the images from Days 1 through 4 were
configured as learning data, and the remaining Day 5 images were configured as
validation data. After extraction, the lane change image dataset
consisted of 668 images for each of the left lane change, right lane change,
and no lane change labels, for a total of 2,004 images. For the performance evaluation,
1,500 images were configured as learning data, and the remaining 504
images were configured as validation data. Table 2 lists the configuration of the lane change image dataset for training and validating
the SlowFast network.
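A minimal sketch of the horizontal-flip balancing step described above, assuming hypothetical integer label ids (the paper does not specify its label encoding): mirroring a clip turns a left lane change into a right lane change, so the label is swapped while No Lane Change is kept.

```python
import cv2

# Hypothetical integer label ids; the paper's exact encoding is not given.
LLC, RLC, NLC = 0, 1, 2
FLIPPED_LABEL = {LLC: RLC, RLC: LLC, NLC: NLC}

def augment_with_flip(frames, label):
    """Mirror a cropped clip so a left lane change becomes a right one."""
    flipped = [cv2.flip(f, 1) for f in frames]   # flipCode=1: horizontal flip
    return flipped, FLIPPED_LABEL[label]
```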
This study detected objects in the driving images and cropped them through a YOLOv5
model pre-trained on the MS COCO (Microsoft Common Objects in Context) dataset [18]. The MS COCO dataset is an image dataset for object detection, segmentation, key-point
detection, and captioning. The YOLOv5 model detects objects from input images and is
very fast compared to existing object detection algorithms. The YOLOv5 model
pre-trained on the MS COCO dataset achieves a mean average precision
(mAP) of approximately 45 or higher in the performance evaluation on that dataset.
Through this pre-trained YOLOv5 model, only the vehicle objects whose behavior is to be
classified can be detected quickly and accurately in the original images. In
this case, the Region of Interest (RoI) was adjusted to include some surrounding information.
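A sketch of this detect-and-crop step using the public Ultralytics torch.hub interface for a COCO-pretrained YOLOv5 model; the largest-box heuristic for selecting the closest vehicle and the 10% RoI margin are illustrative assumptions, not the paper's exact procedure:

```python
import cv2
import torch

# COCO-pretrained YOLOv5 via the public Ultralytics torch.hub interface.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

def crop_nearest_vehicle(frame_bgr, margin=0.1):
    """Detect vehicles, crop the largest (assumed closest) box, resize to 256x256."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    det = model(rgb).xyxy[0]                    # rows: (x1, y1, x2, y2, conf, cls)
    cars = det[det[:, 5] == 2]                  # COCO class id 2 = 'car'
    if len(cars) == 0:
        return None
    areas = (cars[:, 2] - cars[:, 0]) * (cars[:, 3] - cars[:, 1])
    x1, y1, x2, y2 = cars[areas.argmax(), :4].int().tolist()
    dx, dy = int((x2 - x1) * margin), int((y2 - y1) * margin)  # widen the RoI
    h, w = frame_bgr.shape[:2]
    crop = frame_bgr[max(0, y1 - dy):min(h, y2 + dy),
                     max(0, x1 - dx):min(w, x2 + dx)]
    return cv2.resize(crop, (256, 256))
```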
Table 1. Composition of Lane Change Events Included in Dataset.

Record #                    |   1 |   2 |   3 |   4 |   5
Left Lane Change            |  22 |  36 |  46 | 139 | 170
Right Lane Change           |  51 |  48 |  47 | 175 | 178
Mean Frames per Lane Change | 40.6078
Mean Time per Lane Change   | 3.76 seconds
Table 2. Composition of Cropped Video Data.

                             | Left Lane Change | Right Lane Change | No Lane Change
Cropped                      | 263              | 405               | 668
Cropped & Horizontal Flipped | 405              | 263               | -
Total                        | 668              | 668               | 668
Grand Total                  | 2,004
3.2 Similarity Estimation
In the pre-existing SlowFast network, the framerate is fixed. In this study,
however, the similarity between frames within each unit frame of the video was analyzed,
and the similarity between unit frames was assessed before model input. This process
improves on the following limitation: in road traffic data, unnecessary frames are
entered because the same content is repeated continuously in most cases. When the
similarity between two unit frames exceeds the threshold, the similarity is acknowledged,
and the input to the model is skipped. SSIM (Structural Similarity
Index Measure) [14] is used for the similarity analysis.
The SSIM divides an image into N${\times}$N windows and applies Eq. (1) to each window.

$$\mathrm{SSIM}(x,y)=\frac{(2\mu_{x}\mu_{y}+c_{1})(2\sigma_{xy}+c_{2})}{(\mu_{x}^{2}+\mu_{y}^{2}+c_{1})(\sigma_{x}^{2}+\sigma_{y}^{2}+c_{2})}\qquad(1)$$

In Eq. (1), ${\mu}$ represents the average, ${\sigma}$ represents the dispersion, and
$\sigma_{xy}$ is the covariance of the two windows $x$ and $y$. $c_{1}$ and $c_{2}$ are
positive constants that prevent the denominator from being 0. Unlike other image
similarity analysis techniques, the SSIM focuses on structural differences. Therefore,
the similarity responds more to changes in the objects than to changes in the external
environment. This makes it suitable for deriving similarity from road driving data
containing various illuminance changes resulting from sunlight and tunnel entrances
during driving. Fig. 5 shows the similarity between the frames of the first-day video.
In each plot, the x-axis represents the frame index at the comparison point, and the
y-axis is the similarity score derived by comparing the image of that frame with the
image of the previous frame through a similarity measurement technique. Fig. 5(a) shows the SSIM scores; Fig. 5(b) shows the NRMSE scores; Fig. 5(c) shows the similarity scores measured through a histogram. The NRMSE
and histogram were so insensitive to image changes that their similarity scores remained
constantly high. Therefore, it is appropriate to measure the similarity through SSIM
before deciding to skip a frame.
Fig. 5. Plots that derive the similarity between frames of the first-day video.
3.3 Dynamic Framerate SlowFast Network
In the Dynamic Framerate SlowFast network proposed in this study, the video of
the initial unit frame is always entered into the SlowFast network. From the second
unit frame onward, once the probability of one class exceeds a certain level, the
corresponding class is drawn as the action classification result. When an action
classification result has been drawn, the similarity between the
next unit frame video and the previous unit frame video is analyzed. If the similarity
is below the set threshold, the unit frame is entered into the SlowFast
network, and the actions of the objects are classified. On the other hand, if the
similarity exceeds the set threshold, the unit frame is skipped and not entered
into the SlowFast network; in that case, the model output reuses the action
classification results from the previous unit frame. Through this process, a frame is
entered into the SlowFast network only when at least minimal movement exists. Fig. 6 shows the pseudo-code of the unit-frame skipping process:
1. Confirm whether the probability of one class exceeds a certain level after action classification.
2. Confirm whether the similarity exceeds a certain level after similarity analysis.
3. Skip the unit frame when both 1 and 2 apply.
Fig. 6. Process of skipping a unit frame.
The algorithm receives input in units of 10 frames. The UnitFrame variable refers to the
unit frame being entered. The SlowFast method receives unit frame inputs, and
CheckSimilarity() computes the similarity between two unit frames. The similarity
threshold is stored in the threshold variable, which serves as a hyperparameter. Fig. 7 shows the final output results of the model proposed in this study.
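A minimal Python sketch of this skipping loop, assuming 8-bit grayscale unit frames and a slowfast callable that returns a NumPy vector of per-class probabilities; the 0.75 similarity threshold follows Table 4, while the confidence level prob_level is an illustrative assumption:

```python
import numpy as np
from skimage.metrics import structural_similarity

def check_similarity(unit_a, unit_b):
    """Mean SSIM over corresponding frames of two 10-frame unit clips."""
    return np.mean([structural_similarity(a, b, data_range=255)
                    for a, b in zip(unit_a, unit_b)])

def dynamic_framerate_slowfast(unit_frames, slowfast, threshold=0.75,
                               prob_level=0.5):
    """Yield a class-probability vector per unit frame, skipping similar units."""
    prev_unit, prev_probs = None, None
    for unit in unit_frames:                   # each unit is 10 grayscale frames
        confident = prev_probs is not None and prev_probs.max() > prob_level
        if confident and check_similarity(prev_unit, unit) > threshold:
            probs = prev_probs                 # skip: reuse the previous output
        else:
            probs = slowfast(unit)             # run the SlowFast network
        prev_unit, prev_probs = unit, probs
        yield probs
```

Note that the first unit frame is always classified, since no previous result exists, which matches the behavior described above.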
Fig. 7. Final output results of the proposed model.
In the video, information is displayed per object. First, the type of object is displayed,
together with the recognition order of the object (how early the object was recognized).
Second, the distance between the camera and the object is displayed; here, a higher
number means a shorter distance. Lastly, the current action of the object, drawn through
the SlowFast network, is displayed.
4. Experiments
4.1 Similarity Estimation Method
The software environment of the proposed model used Ubuntu 18.04,
CUDA 11.2, Python 3.9.7, and the PyTorch deep learning framework. The hardware
consisted of an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40 GHz, an NVIDIA RTX 3090, and 24
GB RAM. For the dataset, the learning data consisted of the images from Days 1, 2, 3,
and 4, and the validation data was composed of the remaining Day 5 images. When
converted into a ratio of video length, the ratio of the
learning data to the validation data was approximately 5:1. The lane change
image data derived through this process comprised 2,004 images. For the performance
evaluation, 1,500 images were configured as learning data, and the remaining 504 images
were configured as validation data. The performance evaluation used the AUC of the
proposed model and compared it with the existing model. The AUC is the area under the
ROC (Receiver Operating Characteristic) curve [19,20]. A value closer to 1 indicates that the sensitivity and specificity are close to
1, indicating that the model performance is superior.
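For reference, this metric can be computed with scikit-learn's roc_auc_score using one-vs-rest averaging over the three lane-change classes; the label ids and probability values below are made-up illustrations, not the paper's data:

```python
from sklearn.metrics import roc_auc_score

# Made-up labels and per-class probabilities for illustration only.
y_true = [0, 2, 1, 2, 0, 1]                   # assumed ids: LLC=0, RLC=1, NLC=2
y_prob = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.6, 0.2],
          [0.2, 0.2, 0.6], [0.5, 0.3, 0.2], [0.3, 0.5, 0.2]]
auc = roc_auc_score(y_true, y_prob, multi_class='ovr')  # one-vs-rest AUC
print(f"AUC: {auc:.4f}")
```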
Table 3 shows the performance evaluation results per technique used for similarity measurement.
The results were acquired by comparing the AUC and the percentage of skipped frames
across the similarity analysis techniques under the same threshold. Based on the AUC
comparison results, the histogram method showed an AUC of 0.5923, and
NRMSE showed an AUC of 0.5782. The SSIM used in this study showed an AUC of 0.7126, which
is higher than those of the histogram and NRMSE. In addition, when the same threshold
was applied, the histogram and NRMSE skipped an excessive proportion of frames (87.6%
and 92.1%) because they are insensitive to image changes, whereas SSIM skipped a more
moderate 46.6%. Through this, it was confirmed that SSIM is the similarity
measuring technique that enables skipping frames while maintaining the AUC in the
road environment.
Table 3. Performance Evaluation Results per Similarity Measuring Technique.

Similarity Measuring Technique | AUC    | Skipped Frames
Histogram                      | 0.5923 | 87.6%
NRMSE                          | 0.5782 | 92.1%
SSIM                           | 0.7126 | 46.6%
|
4.2 Dynamic Framerate SlowFast Network
The performance was evaluated by comparing the level of speed improvement and the
level of AUC maintenance when frames were skipped using the SSIM.
Table 4 lists the results acquired by comparing the speed and AUC between the cases where
the proposed frame skip technique was and was not applied.
In addition, it shows the AUC changes made when the similarity threshold was adjusted.
First, based on the AUC comparison results per similarity threshold, the highest AUC
was confirmed when the threshold was 0.75. Through this process, 0.75, whose AUC
reduction is small relative to the speed increase, was the most suitable threshold. In
addition, the FPS (Frames Per Second) was drawn to analyze the video analysis speed.
When the proposed frame skip technique was applied, the video analysis speed increased
approximately 1.5-fold over the pre-existing system, to 0.7912 FPS
on Video 5. In addition, the AUC was 0.7126, a decrease of only about 0.04 compared
to the pre-existing AUC of 0.7531. Through the proposed similarity measuring technique,
the frame skip technique can maintain its AUC while reducing its analysis time.
Table 4. Performance Evaluation Results According to Use of the Frame Skip Technique.

Threshold     | AUC    | FPS
No Frame Skip | 0.7531 | 0.5285
0.85          | 0.7017 | 0.6764
0.65          | 0.6940 | 0.8113
0.75 (Ours)   | 0.7126 | 0.7912
5. Conclusion
The Dynamic Framerate SlowFast network was proposed for improving autonomous driving
performance. The proposed model reduces the number of frames entered into the model
by analyzing the similarity between unit frames. Through this process, it was
possible to draw results more promptly while maintaining the object action detection
AUC of the SlowFast model. This makes it possible to derive objects and their action
information through computer vision in diverse road traffic situations and to use that
information in real time. By considering the actions of objects, traffic situation
recognition can be improved. Based on the performance
evaluation results, the proposed Dynamic Framerate SlowFast network showed a video
processing speed of 0.7912 FPS, which was much faster than the 0.5285 FPS of the
pre-existing SlowFast network. In addition, the Dynamic Framerate SlowFast network
showed an AUC of 0.7126, only a small decrease compared to the 0.7531 of the
pre-existing SlowFast network. Thus, the model proposed in this study maintains
the AUC while improving the processing speed. Therefore, the proposed model can provide
more efficient vehicle object action recognition than the pre-existing model in road
traffic situations.
The SlowFast network has excellent human action classification performance,
and the dynamic framerate method proposed in this study reduces computing performance
requirements while maintaining predictive performance. However, the absolute accuracy
of action classification for vehicles could not be said to be high. Accordingly, it may
be necessary to apply a different model. In future studies, we plan to explore ways to
improve performance by considering model replacement.
ACKNOWLEDGMENTS
This work was supported by Kyonggi University Research Grant 2022.
REFERENCES
[1] H. Yoo and K. Chung, "Classification of Multi-Frame Human Motion Using CNN-based Skeleton Extraction," Intelligent Automation & Soft Computing, Vol. 34, No. 1, pp. 1-13, Apr. 2022.
[2] H. Yoo and K. Chung, "Deep Learning-based Evolutionary Recommendation Model for Heterogeneous Big Data Integration," KSII Transactions on Internet and Information Systems, Vol. 14, No. 9, pp. 3730-3744, Sep. 2020.
[3] J. Sang, Z. Wu, P. Guo, H. Hu, H. Xiang, Q. Zhang, and B. Cai, "An Improved YOLOv2 for Vehicle Detection," Sensors, Vol. 18, No. 12, p. 4272, Dec. 2018.
[4] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proc. of IEEE/CVF ICCV, pp. 2961-2969, 2017.
[5] T. H. Wu, T. W. Wang, and Y. Q. Liu, "Real-time Vehicle and Distance Detection Based on Improved YOLO v5 Network," in Proc. of 2021 3rd WSAI, pp. 24-28, Jun. 2021.
[6] T. Han, J. Jing, and Ü. Özgüner, "Driving Intention Recognition and Lane Change Prediction on the Highway," in Proc. of 2019 IEEE IV, pp. 957-962, Jun. 2019.
[7] V. Punzo, M. T. Borzacchiello, and B. Ciuffo, "On the Assessment of Vehicle Trajectory Data Accuracy and Application to the Next Generation SIMulation (NGSIM) Program Data," Transportation Research Part C: Emerging Technologies, Vol. 19, No. 6, pp. 1243-1262, Mar. 2011.
[8] C. Feichtenhofer, H. Fan, J. Malik, and K. He, "SlowFast Networks for Video Recognition," in Proc. of IEEE/CVF ICCV, pp. 6202-6211, 2019.
[9] M. Biparva, D. Fernández-Llorca, R. I. Gonzalo, and J. K. Tsotsos, "Video Action Recognition for Lane-Change Classification and Prediction of Surrounding Vehicles," IEEE Transactions on Intelligent Vehicles, Vol. 7, No. 3, pp. 569-578, Apr. 2022.
[10] J. Carreira and A. Zisserman, "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset," in Proc. of IEEE CVPR, pp. 6299-6308, 2017.
[11] C. Feichtenhofer, A. Pinz, and R. P. Wildes, "Spatiotemporal Multiplier Networks for Video Action Recognition," in Proc. of IEEE CVPR, pp. 4768-4777, 2017.
[12] J. W. Park, J. Kim, and H. J. Lee, "Fast Object Detection Using a Frame Skip Method," in Proc. of 2020 IEEE ICCE-Asia, pp. 1-2, 2020.
[13] M. E. P. Reyes, J. Dorta-Palmero, J. L. Diaz, E. Aragon, and A. Taboada-Crispi, "Computer Vision-based Estimation of Respiration Signals," in Proc. of Latin American Conference on Biomedical Engineering, Vol. 75, pp. 252-261, Oct. 2019.
[14] I. Bakurov, M. Buzzelli, R. Schettini, M. Castelli, and L. Vanneschi, "Structural Similarity Index (SSIM) Revisited: A Data-driven Approach," Expert Systems with Applications, Vol. 189, No. 116087, pp. 1-19, Mar. 2022.
[15] C. Dewi, R. C. Chen, Y. T. Liu, X. Jiang, and K. D. Hartomo, "Yolo V4 for Advanced Traffic Sign Recognition with Synthetic Training Data Generated by Various GAN," IEEE Access, Vol. 9, pp. 97228-97242, Jul. 2021.
[16] R. Izquierdo, A. Quintanar, I. Parra, D. Fernández-Llorca, and M. A. Sotelo, "The PREVENTION Dataset: A Novel Benchmark for PREdiction of VEhicles iNTentIONs," in Proc. of 2019 IEEE ITSC, pp. 3114-3121, Oct. 2019.
[17] N. Darapaneni, D. Singh, S. Chandra, A. R. Paduri, N. Kilhore, S. Chopra, and S. S. Deshmukh, "Activity & Emotion Detection of Recognized Kids in CCTV Video for Day Care Using SlowFast & CNN," in Proc. of 2021 IEEE AIIoT, pp. 0268-0274, May 2021.
[18] T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, and C. L. Zitnick, "Microsoft COCO: Common Objects in Context," in Proc. of ECCV, pp. 740-755, Sep. 2014.
[19] F. S. Nahm, "Receiver Operating Characteristic Curve: Overview and Practical Use for Clinicians," Korean Journal of Anesthesiology, Vol. 75, No. 1, pp. 25-36, Nov. 2022.
[20] B. U. Jeon and K. Chung, "CutPaste-based Anomaly Detection Model Using Multi-Scale Feature Extraction in Time Series Streaming Data," KSII Transactions on Internet and Information Systems, Vol. 16, No. 8, pp. 2787-2800, Aug. 2022.
Author
Byeong-Uk Jeon received his B.S. degree from the Division of Computer Science and
Engineering, Kyonggi University, South Korea, in 2021. He is currently in the Master's
course at the Department of Computer Science, Kyonggi University, Suwon, South Korea.
He has worked as a researcher at the Data Mining Lab. at Kyonggi University. His
research interests include data mining, big data, deep learning, machine learning,
and computer vision.
Kyungyong Chung received his B.S., M.S., and Ph.D. degrees in 2000, 2002, and 2005,
respectively, from the Department of Computer Information Engineering, Inha University,
South Korea. He has worked for the software technology-leading department of the
Korea IT Industry Promotion Agency (KIPA). From 2006 to 2016, he was a professor at
the School of Computer Information Engineering, Sangji University, South Korea. Since
2017, he has been a professor in the Division of AI Computer Science and Engineering,
Kyonggi University, South Korea. He was named in 2017 as a Highly Cited Researcher
by Clarivate Analytics. His research interests include data mining and artificial intelligence.