
  1. (Department of Artificial Intelligence Convergence, Busan University of Foreign Studies, Busan, Korea; skbbq123@naver.com)
  2. (Department of Cyber Security, Duksung Women's University, Seoul, Korea; namkyun@duksung.ac.kr)



Keywords: Android, Object recognition, SSD, YOLO, Visually impaired, Real-time

1. Introduction

Object recognition is the computer-based identification of objects in images captured from real-world scenes. The term combines classification and localization: it classifies objects and determines their locations simultaneously. Object recognition technology is widely applied in the medical, autonomous-driving, and military fields, and research on it is being conducted actively. Meanwhile, visually impaired people continue to suffer considerable inconvenience in their daily lives, and accidents are frequent, so an alternative solution to these problems is needed. This paper proposes a dangerous object recognition application that addresses this problem through image recognition on a mobile phone.

Object recognition networks are generally divided into one-stage and two-stage methods. In the one-stage method, localization and classification are performed simultaneously, whereas in the two-stage method they are performed sequentially. The two-stage method is more accurate but slower because it takes one more step. Joseph Redmon et al. proposed the you only look once (YOLO) algorithm as a one-stage object recognition method [1], and Wei Liu et al. studied the single shot multibox detector (SSD) algorithm [2]. For the two-stage method, Shaoqing Ren et al. researched the faster region-based convolutional neural network (Faster R-CNN) algorithm [3]. The Faster R-CNN method is unsuitable for real-time object detection because, as a two-stage method, its speed is relatively slow. Sanika Dosi et al. applied object recognition to mobile phones [4], and Sumitra A. Jakhet et al. studied object detection applications for the visually impaired [5].

This paper proposes a system that uses a smartphone camera to recognize the objects in front of a visually impaired user and inform them in real time. A mobile-based real-time dangerous object recognition system is built on an SSD network, a one-stage detector suitable for real-time object recognition, one of the two representative approaches to image object recognition.

2. Design of a Risk Object Recognition System for the Visually Impaired

This paper presents a system for real-time risk object recognition for the visually impaired. For the visually impaired, even objects that pose no hazard to non-disabled people may be dangerous. Therefore, this study designed a mobile application that recognizes objects in everyday life and informs the visually impaired user of what lies in front of them. Mobile phones are light and easy to use, which makes them very convenient for this purpose, so a mobile application for the visually impaired can help relieve the inconveniences they face in daily life. The system proposed in this paper is based on dangerous object recognition.

The dataset is trained in advance, and the trained model is converted to a protobuf (.pb) file suitable for Android applications. Subsequently, an Android application to which the model is applied is produced. When a specific object is detected [6], the application notifies the user of what kind of object is in front of them by sounding an alarm. Among the recognized objects, the highest-priority class, i.e., object, is processed first using a priority queue. In this structure, the user is notified of recognized objects according to their priority and can identify the object in front of them from the voice output. Fig. 1 shows the overall structure of the proposed system.
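As an illustration of how such a converted .pb graph can be loaded and run on the phone, the sketch below uses the legacy TensorFlowInferenceInterface from the tensorflow-android library. The file name detect.pb and the tensor names (the standard ones in TensorFlow object detection exports) are assumptions for illustration; the paper does not list the actual names.

```java
import android.content.res.AssetManager;
import org.tensorflow.contrib.android.TensorFlowInferenceInterface;

// Sketch: running a frozen .pb detection graph on Android (legacy TF interface).
// Tensor names follow TensorFlow's object detection exports; adjust to the actual model.
public class PbDetector {
    private static final int INPUT_SIZE = 300;   // SSD300 input resolution
    private static final int MAX_RESULTS = 100;  // fixed output length of the graph

    private final TensorFlowInferenceInterface tf;

    public PbDetector(AssetManager assets) {
        // detect.pb is a hypothetical asset name for the converted model.
        tf = new TensorFlowInferenceInterface(assets, "file:///android_asset/detect.pb");
    }

    public float[] detectScores(byte[] rgbBytes) {
        // Feed one INPUT_SIZE x INPUT_SIZE RGB image as uint8.
        tf.feed("image_tensor", rgbBytes, 1, INPUT_SIZE, INPUT_SIZE, 3);
        tf.run(new String[]{"detection_scores", "detection_classes", "detection_boxes"});

        float[] scores = new float[MAX_RESULTS];
        tf.fetch("detection_scores", scores);  // confidence per detected box
        return scores;
    }
}
```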

Fig. 2 shows the structure of a one-stage detector. As mentioned earlier, unlike the two-stage method, in which region proposal and classification are performed sequentially, the one-stage method performs them simultaneously. It is therefore relatively fast, which is a strong advantage for real-time object detection [7]. Recently, as the accuracy of one-stage detectors has improved, they have reached an accuracy close to that of two-stage detectors.

Fig. 1. Overall Object Recognition System.
../../Resources/ieie/IEIESPC.2023.12.2.107/fig1.png
Fig. 2. 1-Stage Detector.
../../Resources/ieie/IEIESPC.2023.12.2.107/fig2.png

3. Object Recognition Network Training

Although various networks exist for object detection, this paper selects a one-stage detector suitable for real-time object detection. Among the one-stage methods, the representative SSD and YOLO networks are considered. Both networks have the advantage of high speed, making them suitable for real-time object detection. On the other hand, the YOLO network has difficulty distinguishing overlapping objects, whereas the SSD network is more difficult to use than YOLO. In this study, a real-time object recognition system [8] was designed with an SSD network.

3.1 SSD Network

The SSD model is a one-stage detector like YOLO. The entire network uses pretrained VGG16 as a base, to which an auxiliary network is attached. The auxiliary network is an ordinary convolutional network, connected by replacing the fully connected layers of the base with convolutional layers. The detection speed improves in this process, and the overall network is similar to a general convolutional network.

Fig. 3 presents the rough structure of the SSD network. The main idea of the SSD model is to use feature maps of various scales. Existing models used only feature maps of a single size, and with a feature map of constant scale, it can be difficult to detect objects of various sizes. The SSD model therefore performs detection by also extracting the feature maps of intermediate convolutional layers.
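To make the multi-scale idea concrete, the short sketch below tallies the default boxes of the standard SSD300 configuration described in [2]; the feature-map sizes and boxes per location are taken from that paper, and the code is purely illustrative.

```java
// Default-box count for the standard SSD300 configuration in [2].
public class SsdDefaultBoxes {
    public static void main(String[] args) {
        // {feature-map side, default boxes per location} for the six scales.
        int[][] scales = {{38, 4}, {19, 6}, {10, 6}, {5, 6}, {3, 4}, {1, 4}};
        int total = 0;
        for (int[] s : scales) {
            int boxes = s[0] * s[0] * s[1];   // side x side x boxes-per-cell
            System.out.printf("%2dx%-2d map: %5d boxes%n", s[0], s[0], boxes);
            total += boxes;
        }
        System.out.println("Total default boxes: " + total);  // 8732
    }
}
```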

Fig. 3. SSD Network.
../../Resources/ieie/IEIESPC.2023.12.2.107/fig3.png

3.2 Loss Function

The loss function of the SSD model consists of the sum of the confidence loss and the localization loss. ${\alpha}$ is a parameter that adjusts the weight between the two losses, and ${\alpha}$ = 1 is used. N is the number of matched default boxes. If N=0, the loss becomes 0. The complete formula is shown in Eq. (1).

(1)
$ L\left(x,c,l,g\right)=\frac{1}{N}\left(L_{conf}\left(x,c\right)+\alpha L_{loc}\left(x,l,g\right)\right) $

Like the Faster R-CNN model, the localization loss is computed with the smooth L1 loss over the center coordinates, width, and height of the predicted box relative to the default box. The expression is shown in Eq. (2) below.

(2)
$ L_{loc}\left(x,l,g\right)=\sum _{i\in Pos}^{N}\,\sum _{m\in \left\{cx,cy,w,h\right\}}x_{ij}^{k}smooth_{L1}\left(l_{i}^{m}-\hat{g}_{j}^{m}\right) $
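The smooth L1 term inside Eq. (2) is quadratic near zero and linear elsewhere. A minimal sketch with illustrative offset values:

```java
// Smooth L1 loss used inside Eq. (2): quadratic near zero, linear elsewhere.
public final class SmoothL1 {
    static double smoothL1(double x) {
        double a = Math.abs(x);
        return (a < 1.0) ? 0.5 * x * x : a - 0.5;
    }

    public static void main(String[] args) {
        // Example offsets between predicted box l and encoded ground truth g-hat
        // for the four terms m in {cx, cy, w, h} (values chosen for illustration).
        double[] offsets = {0.10, -0.30, 1.50, -0.05};
        double loss = 0.0;
        for (double d : offsets) loss += smoothL1(d);
        System.out.println("localization loss contribution = " + loss);
        // 0.005 + 0.045 + 1.0 + 0.00125 = 1.05125
    }
}
```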

The confidence loss is calculated as a softmax loss over the confidences of all classes. The formula is shown in Eq. (3).

(3)
$ L_{conf}\left(x,c\right)=-\sum _{i\in Pos}^{N}x_{ij}^{p}\log \left(\hat{c}_{i}^{p}\right)-\sum _{i\in Neg}\log \left(\hat{c}_{i}^{0}\right),\quad \textrm{where}\quad \hat{c}_{i}^{p}=\frac{\exp \left(c_{i}^{p}\right)}{\sum _{p}\exp \left(c_{i}^{p}\right)} $
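The following sketch evaluates the softmax confidence term of Eq. (3) for one matched default box and combines it with a localization term as in Eq. (1) with ${\alpha}$ = 1; the raw class scores are hypothetical values, not results from the paper.

```java
// Softmax confidence (Eq. (3)) combined with the total loss of Eq. (1), alpha = 1.
public final class SsdLoss {
    // Softmax over raw class scores c_i.
    static double[] softmax(double[] c) {
        double sum = 0.0;
        double[] out = new double[c.length];
        for (double v : c) sum += Math.exp(v);
        for (int p = 0; p < c.length; p++) out[p] = Math.exp(c[p]) / sum;
        return out;
    }

    public static void main(String[] args) {
        // One positive default box with raw scores for (background, person, car).
        double[] scores = {0.2, 2.0, 0.5};
        double[] cHat = softmax(scores);
        double confLoss = -Math.log(cHat[1]);   // matched class p = "person"
        double locLoss = 1.05125;               // from the smooth L1 example above
        int n = 1;                              // number of matched default boxes
        double alpha = 1.0;                     // weight between the two losses
        double total = (confLoss + alpha * locLoss) / n;   // Eq. (1)
        System.out.printf("conf=%.4f loc=%.4f total=%.4f%n", confLoss, locLoss, total);
    }
}
```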

3.3 Training SSD

As shown in Fig. 3, the entire network was constructed on the pre-trained base model. The two fully connected layers were replaced with convolutional layers, and the extra network was designed so that the size of the final output becomes 1x1. Feature maps of different scales were then obtained, and convolutional operations were applied to each of them. Finally, the resulting feature maps were merged, and the SSD network was trained with the loss function above.

4. Experimental Results

This paper applies real-time risk object recognition for the visually impaired in an Android application. Therefore, it is essential to identify the objects that can threaten the visually impaired in daily life. The most readily conceivable such objects were selected for the mobile application.

The four items in Table 1 were selected because they were judged to be the most common dangerous objects that visually impaired people encounter outdoors in daily life.

Table 1. Target Objects.

1. Person
2. Car
3. Bicycle
4. Traffic light

Table 2. Hardware Specifications.

Device | LG V30
Size | 151.7 × 75.4 × 7.3 mm
Weight | 158 g
Rear camera | 16 MP
GPS | GPS, GLONASS, GALILEO
Sensors | Fingerprint, acceleration, proximity
Battery | 3300 mAh
USB port | USB 3.1, Type-C 1.0

4.1 Hardware and Software Configuration

The hardware used in this experiment was an LG V30 smartphone; its specifications are listed in Table 2. The experimental results may vary with the camera performance of the hardware.

As shown in Table 2, the camera used for object recognition has a 16-megapixel specification. The device is light enough for the visually impaired to carry even if it is used only for object recognition, and its built-in sensors and GPS leave room for more advanced functions. During development, the phone is connected to a computer through its USB port, and the application was designed and built in Android Studio. In addition, a function was added that alarms the user when the highest-priority class in the priority queue is a specific object.
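The priority-based alarm can be sketched as follows; the Detection record, the priority table, and the announce() placeholder are hypothetical names introduced for illustration, not the application's actual code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Minimal sketch of the priority-queue notification logic (hypothetical names).
public class DetectionNotifier {

    // Lower number = higher priority; values chosen here only for illustration.
    static final Map<String, Integer> PRIORITY = Map.of(
            "person", 0, "car", 1, "bicycle", 2, "traffic light", 3);

    record Detection(String label, float score) {}

    // Pick the highest-priority detection in the current frame and announce it.
    static void notifyUser(List<Detection> frameDetections) {
        PriorityQueue<Detection> queue = new PriorityQueue<>(
                Comparator.comparingInt(
                        (Detection d) -> PRIORITY.getOrDefault(d.label(), Integer.MAX_VALUE)));
        for (Detection d : frameDetections) {
            if (PRIORITY.containsKey(d.label()) && d.score() > 0.5f) {
                queue.add(d);   // keep only the four target classes above a threshold
            }
        }
        if (!queue.isEmpty()) {
            announce(queue.peek().label());  // e.g., play the matching sound file
        }
    }

    static void announce(String label) {
        System.out.println("ALERT: " + label);  // placeholder for the sound alarm
    }
}
```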

Fig. 4 shows the code that plays a sound file only when the class recognized by the smartphone is 'person'. By varying the sound according to the object, the user can identify what kind of object it is by sound alone.
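Fig. 4 itself is not reproduced here, but its logic can be sketched with Android's MediaPlayer as below, assuming a hypothetical sound resource R.raw.person_alert (R is the app's generated resource class):

```java
import android.content.Context;
import android.media.MediaPlayer;

// Sketch of the Fig. 4 logic: play a sound only when the detected class is "person".
public class SoundAlert {
    private final Context context;

    public SoundAlert(Context context) {
        this.context = context;
    }

    public void onObjectDetected(String label, float confidence) {
        if ("person".equals(label) && confidence > 0.5f) {
            // R.raw.person_alert is a hypothetical sound resource for illustration.
            MediaPlayer player = MediaPlayer.create(context, R.raw.person_alert);
            player.setOnCompletionListener(MediaPlayer::release);  // free when done
            player.start();
        }
    }
}
```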

The model used for object recognition was trained on the COCO dataset and can recognize approximately 90 types of objects. Among them, the dangerous objects are selected, and a separate alarm function is added. The system produces sound for high-priority objects first, and the priorities can easily be adjusted in Android Studio for convenience.

Fig. 4. Add Sound Code on Android Studio.
../../Resources/ieie/IEIESPC.2023.12.2.107/fig4.png

4.2 Experimental Results

This paper proposes a notification system based on dangerous object recognition for the visually impaired. Four objects were selected as the experimental targets: person, car, bicycle, and traffic light. In the experiment, the four objects were photographed at random, and the recognition rate was calculated from the total number of trials and the number of successful recognitions and notifications. The recognition rate is calculated as in Eq. (4).

(4)
$ Recognition\,\,Rate=\frac{Number\,\,of\,\,Recognized\,\,Objects}{Total\,\,Number\,\,of\,\,Objects} $

Because this paper focuses on the visually impaired, the final conversion to sound is essential. Therefore, the number of recognized objects in Eq. (4) is the number of times the user hears the correct notification. Real-time object recognition was carried out by shooting with the designed application.
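As a worked example of Eq. (4), the snippet below computes the rate for the person case in Table 3; the count of 48 successes out of 50 trials is inferred from the reported 96%.

```java
// Worked example of Eq. (4): recognition rate as successes over total trials.
public class RecognitionRate {
    static double rate(int recognized, int total) {
        return 100.0 * recognized / total;
    }

    public static void main(String[] args) {
        // Person: 48 of 50 trials produced a correct audible notification -> 96%.
        System.out.println("Person: " + rate(48, 50) + "%");
    }
}
```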

Fig. 5 shows a screen captured while shooting with a smartphone running the object recognition system. In addition to filming in real life, additional photographs and videos were used. Object recognition was tested approximately 50 times for each of the four items. Table 3 lists the recognition rates derived through Eq. (4).

Each of the four experiments was performed 50 times; the recognition rate was 96% for person, 96% for car, 98% for bicycle, and 94% for traffic light. These results may vary depending on the degree of light reflection or the weather.

Fig. 5. Object Recognition Screen.
../../Resources/ieie/IEIESPC.2023.12.2.107/fig5.png
Table 3. Recognition Rate.

Object | Recognition rate (%)
Person | 96
Car | 96
Bicycle | 98
Traffic light | 94

5. Conclusion

This paper proposed an application for real-time dangerous object detection for the visually impaired. Of the one-stage and two-stage detector methods, the one-stage detector, which is more suitable for real-time object detection, was selected, and a model trained on an SSD network was used for real-time detection. Even an object that is trivial to non-disabled people can be dangerous to the visually impaired. Therefore, this paper proposed a system that detects four selected objects: people, cars, bicycles, and traffic lights. The system notifies the visually impaired user with a sound when the camera detects one of these objects. After the trained model was applied to the application, experiments were performed approximately 50 times for each item. Because the experiment targeted the visually impaired, filming was done with a camera in everyday environments. The recognition rate was calculated as the number of times the application produced a sound after detecting an object. The results were 96% for people, 96% for cars, 98% for bicycles, and 94% for traffic lights. Given these results, the proposed system can help ensure a safer life for the visually impaired and further improve their quality of life. In the future, more in-depth studies are needed on detecting objects under poor shooting conditions, such as light reflection and bad weather. With further research, the system could also prove useful to non-disabled people when a field of vision must be secured, such as in dark spaces.

ACKNOWLEDGMENTS

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) program (IITP-2022-2020-0-01825) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation). This research was partly supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea Government (MSIT) and a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0008703, The Competency Development Program for Industry Specialist).

REFERENCES

1. J. Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 779-788, 2016.
2. W. Liu et al., "SSD: Single Shot MultiBox Detector," Computer Vision - ECCV 2016, Lecture Notes in Computer Science, vol. 9905, Springer, Cham, 2016.
3. S. Ren et al., "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, June 2017.
4. S. Dosi et al., "Android Application for Object Recognition using Neural Networks for the Visually Impaired," 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, pp. 1-6, 2018.
5. S. A. Jakhet et al., "Object Recognition App for Visually Impaired," 2019 IEEE Pune Section International Conference (PuneCon), Pune, India, pp. 1-4, 2019.
6. W. Tarimo et al., "Real-Time Deep Learning-Based Object Detection Framework," 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia, pp. 1829-1836, 2020.
7. P. Devaki et al., "Real-Time Object Detection using Deep Learning and OpenCV," International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, vol. 8, issue 12S, October 2019.
8. I. Martinez-Alpiste et al., "Smartphone-based real-time object recognition architecture for portable and constrained systems," Journal of Real-Time Image Processing, vol. 19, pp. 103-115, 2022.

Author

Seung Bin Kim
../../Resources/ieie/IEIESPC.2023.12.2.107/au1.png

Seung Bin Kim received his bachelor's degree from Busan University of Foreign Studies, Republic of Korea, in 2021. Since 2021, he has been pursuing his master's program at the Department of Artificial Intelligence Convergence, Busan University of Foreign Studies. He has a passionate interest in Artificial Intelligence Convergence, Image Recognition, Deep Learning, and IoT.

Rodi Hartono
../../Resources/ieie/IEIESPC.2023.12.2.107/au2.png

Rodi Hartono is an electrical engineer with a strong interest and ability in Artificial Intelligence Convergence, Control Systems, Robotics, Image Recognition, Deep Learning, and Automation Systems. He has worked in the robotics and intelligent systems field for more than 12 years as a lecturer, full-time researcher, and leader responsible for managing and supervising a UNIKOM robotics team that researches and builds robots to compete in regional, national, and international robot competitions. His team is also responsible for designing, researching, and building robotics products for the industrial community in the robotics division laboratory of the Indonesian Computer University (UNIKOM). He received his bachelor's degree from the Indonesian Computer University, Indonesia, in 2010, and his master's degree from the School of Electrical and Informatics Engineering, Bandung Institute of Technology (ITB), Indonesia, in 2014. Since 2021, he has been pursuing his Ph.D. at the Department of Artificial Intelligence Convergence, Busan University of Foreign Studies, Republic of Korea.

Tshibang Patrick a Kalend
../../Resources/ieie/IEIESPC.2023.12.2.107/au3.png

Tshibang Patrick a Kalend received his bachelor's degree in Computer Science Engineering, specifically Information System Engineering, from the Université Protestante de Lubumbashi, Democratic Republic of Congo, in 2016. After graduation, he lectured and supervised students' graduation projects. In 2022, he started his master's program at the Department of Artificial Intelligence Convergence, Busan University of Foreign Studies. He is passionate about Artificial Intelligence, Computer Vision, Robotics, and IoT.

Nam Kyun Baik
../../Resources/ieie/IEIESPC.2023.12.2.107/au4.png

Nam Kyun Baik is a Professor in the Department of Cyber Security at Duksung Women's University. He received his B.S., M.S., and Ph.D. degrees from the School of Electronic Engineering at Soongsil University. From 2000 to 2017, he was a senior researcher at the Korea Internet & Security Agency. He has a passionate interest in Convergence Security, Security Consulting, Information Security Management Systems, AI Security, and IoT Security.

Kyoo Jae Shin
../../Resources/ieie/IEIESPC.2023.12.2.107/au5.png

Kyoo Jae Shin is a Professor of Intelligence Robot Science at the Busan University of Foreign Studies (BUFS), Busan, South Korea, and the director of the Future Creative Science Research Institute at BUFS. He received his B.S. degree in Electronics Engineering in 1985 and his M.S. degree in Electrical Engineering from Cheonbuk National University (CNU) in 1988, and his Ph.D. degree in Electrical Science from Pusan National University (PNU) in 2009. Dr. Shin was a professor at the Navy technical education school and a main director for research on dynamic stabilization systems at the Dusan defense weapon research institute. He has researched and developed a fish robot, a submarine robot, an automatic drug-spray robot for glass rooms, an automatic milking robot using a manipulator, a personal electric vehicle, a smart accumulated aquarium using a heat pump, a solar tracking system, a 3D hologram system, and a gun/turret stabilization system. He is interested in intelligent robots, image signal processing application systems, and smart farms and aquariums using new energy and IoT technology.