R-CNN Auto-system for Detecting Text Road Signs in Baghdad
Omar M. S. Ali1
Ali A. D. Al-Zuky1
Fatin E. M. Al-Obaidi1
(Department of Physics, College of Science, Mustansiriyah University, Baghdad, Iraq
{omar_m_sultan, prof.alialzuky, sci.phy.fam}@uomustansiriyah.edu.iq
)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Keywords
R-CNN, Labeling, Epoch, Detection, Baghdad, Recognition
1. Introduction
Artificial intelligence (AI) and pattern recognition are being used in traffic sign
detection and recognition for many applications, like autonomous and assisted driving.
A driver who can understand text traffic signs is alerted to potentially unsafe situations
and inappropriate behavior [1]. Without text traffic signs to direct and inform motorists, pedestrians, and other
road users, the world's traffic system would not function as it does. Text traffic
signs reflect the closeness of various sites as well as the accessibility of services.
Text road signs are frequently posted on poles in the center or to the side of the
road. Textual traffic signs vary from country to country according to local norms
and legislation. Technology, traffic laws, and regulations all have an impact [2].
Construction of an autonomous driving system is a fascinating subject that is growing
in popularity. In some cases, the vehicle is equipped with sensors such as radar,
laser, GPS, and cameras to monitor the environment. Combining a camera with computer
vision technologies is the most typical method. When compared to other sensors, the
camera's low cost and high output make it an appealing alternative [3].
Text traffic signs were created as a result of efforts to modernize the traffic system
and improve driving safety. Government entities in charge of enforcing traffic regulations
and collecting data on traffic collisions and patterns are essential resources for
the scientific study of text traffic signs. International organizations and scientific
institutions perform studies and research on text traffic signs and make recommendations and proposals to render them more effective at increasing traffic safety [4]. Many nations across the world utilize instructional, cautionary, and directional text traffic signs, which are divided into distinct categories. The colors of text traffic signs serve as a classification system: for example, red represents danger or caution, yellow represents a warning, and green represents an instruction [5].
Every object-detecting system must go through two steps: a detection procedure followed by a recognition process. During detection, color distinguishes the traffic signs in each frame; after color detection, the signs are classified using a large database built by training on video, which makes training an essential component of any object-detecting system [6]. Text traffic signs provide users with a range of helpful information and aid in improving road safety, reducing traffic collisions, and improving traffic control. Correctly following textual traffic signs will improve road safety and reduce collisions [7].
The advancement of technology and cognitive science has enabled a more sophisticated detection system that notifies drivers of text traffic signs inside an automobile via a display screen utilizing region-based convolutional neural network (R-CNN) algorithms [8]. Several studies on textual traffic sign detection have been conducted. One approach employs a CNN-based traffic sign classification algorithm together with a camera detection feature for traffic signs. A motorist can glance at the screen while keeping focus on the road, saving the time needed to study each sign [9].
2. Related Work
Joint transform correlation (JTC) and image segmentation were used to automatically
recognize road signs from any nation, regardless of color or shape. These methods
made several contributions, including the development of distortion-invariant fringe-adjusted
JTC, the introduction of two new criteria, and the reclassification of rectangular
signs, as proposed by J. F. Khan et al. [10]. Techniques for locating and extracting text from traffic sign panels were also employed,
and an OCR algorithm was utilized to recognize a variety of characters present on
the traffic panel for effective text string extraction, as demonstrated by A. Mammeri
et al. [11].
Experimental results using the German Traffic Sign Detection Benchmark (GTSDB) and
Chinese Traffic Sign Detection Benchmark (CTSDB) datasets showed that the combination
of Single Shot Multibox Detector (SSD) with Receptive Field Module (RFM) and Path
Aggregation Network (PAN), abbreviated SSD-RP, achieved a higher mean average precision (mAP) than other SSD algorithms and exhibited superior detection precision
for identifying small traffic signs. SSD-RP surpassed well-known object recognition
algorithms such as Faster R-CNN, Retina-Net, and YOLOv3 in terms of balancing detection
speed and precision, as indicated by J. Wu et al. [12]. Furthermore, a lightweight, YOLOv4-based integration framework was suggested for
real-time traffic sign detection using deep learning techniques. The architecture facilitated information sharing and flow at different levels while reducing network computation overhead to address latency issues. The goal was to ensure a certain level
of generalization and resilience while enhancing the detection performance of traffic
signs in various objective environments, including scale and illumination fluctuations,
as proposed by Y. Gu et al. [13].
3. The Proposed Scheme
As one of the most significant technologies, an R-CNN is frequently used to carry
out image processing tasks. It consists of rectangular area proposals with CNN features.
An R-CNN is a kind of neural network that resembles how the visual cortex in the human
brain functions. Recognizing the most significant elements in an image is the main
objective of a CNN, of which convolutional layers make up the majority of layers.
There is a difference between the CNN and R-CNN algorithms: a CNN identifies, distinguishes, and can track a target within an image, whereas an R-CNN identifies a target within an image and follows it readily, owing to the comprehensive manner in which it deals with every pixel in the image. An R-CNN has five stages [14]:
1. First, the region of interest is defined by labels that outline a set of intersecting
squares.
2. The second stage, convolution, is used to apply filters and identify characteristics.
3. Third, a max pooling step is used to reduce the image's size while preserving its
key details.
4. The image is converted to a 1D array (vector) after flattening.
5. All necessary connections in the full connection stage are completed. The phases
are shown in Figs. 1 and 2.
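To make these five stages concrete, the following is a minimal MATLAB (Deep Learning Toolbox) sketch of a layer stack of this kind. It is an illustration only, not the authors' exact network; the input size, filter counts, and layer widths are assumptions.

layers = [
    imageInputLayer([32 32 3])               % stage 1: labeled region proposals resized to a fixed input
    convolution2dLayer(3, 16, 'Padding', 1)  % stage 2: convolution applies filters to detect features
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)        % stage 3: max pooling shrinks the map, preserving key details
    fullyConnectedLayer(64)                  % stages 4-5: features are flattened to a vector and fully connected
    reluLayer
    fullyConnectedLayer(2)                   % two classes: text road sign vs. background
    softmaxLayer
    classificationLayer];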
The applications of R-CNN categorization techniques are numerous. For example, botany
relies on precise standards to categorize and arrange plant specimens [17]. To find and diagnose cavities, an R-CNN is extensively utilized in dentistry [18]. Medical uses of classification include the identification and classification of
brain tumors [19].
In the area of remote sensing, one study compared the efficiency of the R-CNN approach with classical methods for automatically identifying and mapping trees in UAV imagery [20]. Additionally, classification techniques are frequently used in object detection: an R-CNN is a popular method that has been effective in identifying items in photographs, including faces, automobiles, and people [21]. The versatility of classification systems is demonstrated by these numerous applications.
Fig. 1. Stage of convolutional learning for filtering and feature detection [15].
Fig. 2. A fully connected region-based convolutional neural network (R-CNN) [16].
3.1 Tools and Methodology
In the current research, several short video clips, each no longer than nine seconds, were captured of textual traffic signs at different times of day (before noon and in the afternoon), as shown in Fig. 3. The signs had white text and were divided into two groups: one group had a fully blue background, and the second had a green rectangular shape, mounted at a height of 1.5-1.6 meters above the ground on highways in Baghdad. The videos were captured using an iPhone X equipped with a 10-megapixel (MP) camera and a mobile phone holder mounted inside a moving vehicle traveling at different speeds, as shown in Fig. 4. In addition, data analysis and post-processing were performed on a computer running MATLAB (R2020a).
Text road signs are mounted on a platform in the middle or to the side of the road so that drivers can read them from a distance of 100 meters or less. In Baghdad, residential streets have a speed limit of roughly 60 km/h, and motorways have a limit of 100 km/h, so an automobile travels 100 meters in a little more than three seconds. The motorist must decide what to do within this interval.
This study adopts a frame rate of 30 frames per second for the video recording system. For a driver to make a correct decision, the system must detect and recognize a sign in at least one of every 90 consecutive frames; that is, the proposed system must reliably and accurately identify the sign at least once within this window. It is therefore sufficient and reasonable for drivers to take the correct driving direction if a sign is confirmed and detected correctly every three seconds.
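A quick MATLAB check of this arithmetic, using only the speed limit and frame rate stated above:

vKmh = 100;                % motorway speed limit in Baghdad (km/h)
vMs  = vKmh / 3.6;         % about 27.8 m/s
tCover = 100 / vMs;        % time to cover the 100 m reading distance: about 3.6 s
nFrames = 3 * 30;          % a 3 s decision window at 30 frames/s gives 90 frames
fprintf('t = %.1f s, window = %d frames\n', tCover, nFrames);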
The R-CNN system must determine the appropriate course of action if drivers are distracted from their targets because they are occupied with something other than the road. Text traffic signs can be detected by using a trained database and an R-CNN object detector. As can be seen in Fig. 5, a manual image labeler was used to outline, in each frame of the collected video, the desired textual road signs. To begin, we used the training code in Fig. 6 to train the model by extracting as many features as possible from 463 images with 544 targets for blue signs and 482 images with 582 targets for blue-green signs in the datasets.
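A hedged sketch of this training step, in the spirit of algorithm I in Fig. 6, is given below. The variable names and option values are assumptions rather than the authors' exact code: blueSignTable stands for a ground-truth table exported from MATLAB's Image Labeler, with one image file name and one set of [x y width height] sign boxes per row, and layers is a CNN layer array such as the one sketched earlier in this section.

% Training options: 'MaxEpochs' would be set to 20 or 60, as in this work.
options = trainingOptions('sgdm', ...
    'MaxEpochs', 20, ...
    'MiniBatchSize', 32, ...
    'InitialLearnRate', 1e-4);

% Train the R-CNN object detector on the labeled blue-sign dataset and save it.
detector = trainRCNNObjectDetector(blueSignTable, layers, options);
save('blueSignDetector.mat', 'detector');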
After numerous experimental runs, the layers were purposefully chosen, and the outcomes of these experiments guided our selection of layer values. The term "epoch" refers to the process of optimizing and training neural networks and deep learning models. A single iteration of the model-building process, known as a "training pass," involves processing the entire training dataset through the model, calculating losses, and updating parameters. When training for a certain number of epochs, the complete training dataset is cycled through the model several times; in this manner, the model may "learn" from the data and improve over time. Increasing the number of epochs has the potential to improve model performance, but caution must be exercised to avoid overfitting the training set [22].
The current work used 20 and 60 epochs to observe sensitivity during training. The training time for the model with 1-20 epochs was 7 minutes and 34 seconds, resulting in 16,380 images; for 1-60 epochs, training took about 22 minutes and 42 seconds, resulting in 49,140 images. Hence, the recognition stage in Fig. 7 was initialized.
The recall (R), sensitivity (S), precision (P), and F1 score can be calculated through the following equations [23]:

$$R = \frac{TP}{TP + FN}, \qquad S = R \times 100\%, \qquad P = \frac{TP}{TP + FP}, \qquad F_1 = \frac{2PR}{P + R}$$

where TP (true positives) represents the cases in which the model correctly predicted the presence of the target, FN (false negatives) represents the cases where the target was present but the model failed to detect it, and FP (false positives) represents the cases where no target was present but the model predicted one. Recall is thus TP divided by the total number of target instances in the dataset, N = TP + FN.
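As a worked example, the blue-sign counts from Table 1 reproduce the reported metrics:

TP = 258; FN = 286; FP = 0;    % blue-sign counts at 20 epochs (Table 1)
R  = TP / (TP + FN);           % recall = 0.4743
S  = 100 * R;                  % sensitivity = 47.43%
P  = TP / (TP + FP);           % precision = 1
F1 = 2 * P * R / (P + R);      % F1 score, about 0.643 (Table 1 reports 0.6435)
fprintf('R = %.4f, S = %.2f%%, P = %.4f, F1 = %.4f\n', R, S, P, F1);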
Fig. 4. Camera setup inside a car.
Fig. 5. Manual Labeling stage.
Fig. 6. Training algorithm (I).
Fig. 7. Recognition algorithm (II).
4. Performance Evaluation
R-CNN technology plays an effective role in accurately identifying text road signs.
In order to evaluate the performance improvement achieved by the proposed scheme, we considered a small test at 20 and 60 epochs. The variation of the parameters is shown in Tables 1 and 2, while Fig. 8 illustrates the algorithm's achievement in detecting the specified textual road signs. For all employed signs, it is apparent that precision and epoch variation have a polynomial relationship. A comparison between 20 and 60 epochs can be seen in Fig. 9
a polynomial relationship. A comparison between 20, and 60 epochs can be seen in Fig. 9 for all used parameters. The R-CNN approach succeeded in detecting blue and blue-green
textual road signs with recall values equal to 0.4743 and 0.9519, and sensitivity
values of 47.43% and 95.19%, respectively for 20 epochs. For 60 epochs, the recall
values were equal to 0.4835 and 0.9519, with sensitivity values of 48.35% and 95.19%,
respectively. For all textual road signs, the precision values were unity for both 20 and 60 epochs. The F1 scores were 0.6435 and 0.9753 for 20 epochs, while for 60 epochs they were 0.6518 and 0.9753 for blue and blue-green signs, respectively.
From Tables 1 and 2, one can notice the excellent results in the detection of blue-green signs. This is due to several factors, such as the contrast between the colors of the signs (blue, green, and white), in addition to the quality of daytime imaging, which reduces the scattering that may occur on the surface of the sign. The lower contrast of the blue signs, as well as the time of imaging (afternoon, near sunset), led to weaker detection. In general, the model is considered successful in the detection process.
Fig. 8. R-CNN application in detecting text road signs: (a) Marking the detected target with red rectangular shape; (b) Extracting target with score; (c) Extracted detected sign.
Fig. 9. The variation of (a) Tp, Fn; (b) recall; (c) sensitivity; (d) F1 Score for the used text road signs with the aid of the R-CNN model.
Table 1. Result of applying the R-CNN model at 20 epochs.
Data          | Blue Sign | Blue-Green Sign
No. of images | 463       | 482
No. of signs  | 544       | 582
TP            | 258       | 554
FN            | 286       | 28
FP            | 0         | 0
Recall        | 0.4743    | 0.9519
Sensitivity   | 47.43%    | 95.19%
Precision     | 1         | 1
F1 Score      | 0.6435    | 0.9753
Table 2. Result of applying the R-CNN model at 60 epochs.
Data          | Blue Sign | Blue-Green Sign
No. of images | 463       | 482
No. of signs  | 544       | 582
TP            | 263       | 554
FN            | 281       | 28
FP            | 0         | 0
Recall        | 0.4835    | 0.9519
Sensitivity   | 48.35%    | 95.19%
Precision     | 1         | 1
F1 Score      | 0.6518    | 0.9753
5. Conclusion
The R-CNN technique is one of the technologies used in computer vision and object recognition. It was developed as a practical answer to the problem of accurately identifying and categorizing objects in visual data. The idea behind an R-CNN is to exploit parts of images that may include objects of interest, and it is one of the most successful approaches for detecting and localizing targets in images. Its high classification and identification accuracy can be attributed to its use of deep learning and prospective regions.
The results showed that the contrast of text road signs affects their detection. The R-CNN approach succeeded in detecting blue and blue-green textual road signs with recall values equal to 0.4743 and 0.9519, and sensitivity values of 47.43% and 95.19%, for 20 epochs, while for 60 epochs the recall values were equal to 0.4835 and 0.9519, with sensitivity values of 48.35% and 95.19%, respectively. For all textual road signs, the precision values were unity for both 20 and 60 epochs. The F1 scores were 0.6435 and 0.9753 for 20 epochs, while for 60 epochs they were 0.6518 and 0.9753 for blue and blue-green signs, respectively.
Thus, the issue of automatic text road sign detection and identification has been addressed. The scientific originality of the acquired results is that the presented detection method can accurately and precisely identify blue and blue-green sign indicators in various situations. Prospects for future research include examining and contrasting various object detection methods on other text road signs.
Recommendation
Even though this study covered only a small geographical region (a few streets in the capital, Baghdad), the technology used is regarded as cutting-edge in the field of artificial intelligence. Hence, more research into this area is necessary.
Compliance with Ethical Standards
The research was conducted as part of the authors' employment and received no outside funding. Therefore, there are no conflicts of interest.
REFERENCES
Albelwi, S., & Mahmood, A. (2017). A framework for designing the architectures of
deep convolutional neural networks. Entropy, 19(6), 242.
Brahim, J., Khalid El, M., & Noureddine, F. (2023). Developing an Efficient System
with Mask R-CNN for Agricultural Applications. AGRIS on-line Papers in Economics and
Informatics, 15(1), 61-72.
Gu, Y., & Si, B. (2022). A novel lightweight real-time traffic sign detection integration
framework based on YOLOv4. Entropy, 24(4), 487.
Hameed, K., Chai, D., & Rassau, A. (2022). Score-based mask edge improvement of Mask-RCNN
for segmentation of fruit and vegetables. Expert Systems with Applications, 190, 116205.
He, P., Zuo, L., Zhang, C., & Zhang, Z. (2019). A value recognition algorithm for
pointer meter based on improved Mask-RCNN. 9th International Conference on Information
Science and Technology (ICIST), (pp. 108-113). Hulunbuir, China.
Hussien, R. S., Elkhidir, A. A., & Elnourani, M. (2015). Optical character recognition
of Arabic handwritten characters using neural network. 2015 International Conference
on Computing, Control, Networking, Electronics and Embedded Systems Engineering (ICCNEEE),
(pp. 456-461).
Hyder, A. A., Norton, R., Pérez-Núñez, R., Mojarro-Iñiguez, F. R., Peden, M., Kobusingye,
O., et al. (2016). The Road Traffic Injuries Research Network: a decade of research
capacity strengthening in low- and middle-income countries. Health Res Policy Sys
, 14(14), 1-9.
Jain, S. (2020). Pushing the boundary of Semantic Image Segmentation. ETH Zurich:
KTH, School of Electrical Engineering and Computer Science (EECS).
Kattenborn, T., Leitloff, J., Schiefer, F., & Hinz, S. (2021). Review on Convolutional
Neural Networks (CNN) in vegetation remote sensing. ISPRS Journal of Photogrammetry
and Remote Sensing, 173, 24-49.
Kesav, N., & Jibukumar, M. G. (2022). Efficient and low complex architecture for detection
and classification of Brain Tumor using RCNN with Two Channel CNN. Journal of King
Saud University - Computer and Information Sciences, 34(8), 6229-6242.
Khan, J. F., Bhuiyan, S. A., & Adhami, R. R. (2011). Image segmentation and shape analysis for road-sign detection. IEEE Transactions on Intelligent Transportation
Systems, 12(1), 83-96.
Li, W. (2021). Analysis of object detection performance based on Faster R-CNN. 6th
International Conference on Electronic Technology and Information Science (ICETIS
2021), 1827. Harbin, China.
Mammeri, A., Khiari, E. H., & Boukerche, A. (2014). Road-sign text recognition architecture
for intelligent transportation systems. IEEE 80th Vehicular Technology Conference
(VTC2014-Fall), (pp. 1-5). Vancouver, BC, Canada.
Mehta, S., Paunwala, C., & Vaidya, B. (2019). CNN based traffic sign classification
using adam optimizer. 2019 International Conference on Intelligent Computing and Control
Systems (ICCS), (pp. 1293-1298). Madurai, India.
Mogelmose, A., Trivedi, M. M., & Moeslund, T. B. (2012). Vision-Based Traffic Sign
Detection and Analysis for Intelligent Driver Assistance Systems: Perspectives and
Survey. IEEE Transactions on Intelligent Transportation Systems, 13, pp. 1484-1497.
Qin, F., Fang, B., & Zhao, H. (2010). Traffic sign segmentation and recognition in
scene images. 2010 Chinese Conference on Pattern Recognition (CCPR), (pp. 1-5). Chongqing,
China.
Reinius, S. (2013). Object recognition using the OpenCV Haar cascade-classifier
on the iOS platform. Institutionen för informationsteknologi, Department of Information
Technology, Uppsala Universitet.
Robielos, R., & Lin, C. J. (2022). Traffic Sign Comprehension among Filipino Drivers
and Nondrivers in Metro Manila. Appl. Sci., 12(16), 8337.
Sai, B. N., & Sasikala, T. (2019, February). Object detection and count of objects
in image using tensor flow object detection API. 2019 International Conference on
Smart Systems and Inventive Technology (ICSSIT), (pp. 542-546). Tirunelveli, India.
Wang, Y., Jiang, Z., Li, Y., Hwang, J. N., & Xing, G. (2021). RODNet: A Real-Time Radar Object Detection Network Cross-Supervised by Camera-Radar Fused Object 3D Localization. IEEE Journal of Selected Topics in Signal Processing.
Wu, J., & Liao, S. (2022). Traffic sign detection based on SSD combined with receptive
field module and path aggregation network. Computational Intelligence and Neuroscience,
Hindawi, 2022, 1-13.
Yu, K., Hao, Z., Post, C. J., Mikhailova, E. A., Lin, L., Zhao, G., et al. (2022).
Comparison of classical methods and mask R-CNN for automatic tree detection and mapping
using UAV imagery. Remote Sensing, 14(2), 295.
Zhu, Y., Xu, T., Peng, L., Cao, Y., Zhao, X., Li, S., et al. (2022). Faster-RCNN
based intelligent detection and localization of dental caries. Displays, 74, 102201.
Author
Omar M. S. Ali is a Ph.D. student at the Physics Department, College of Science, Mustansiriyah University, Baghdad, Iraq. He obtained his M.Sc. degree in remote sensing and image processing from the Physics Department, College of Science, Baghdad University, in 2019, and his B.Sc. degree in physics from the same department in 2012. His interests are remote sensing, image processing, GIS, robotics programming, mathematics, and website interface design. He can be contacted by email at omar_m_sultan@uomustansiriyah.edu.iq.
Ali A. D. Al-Zuky is a Professor at the Physics Department, College of Science, Mustansiriyah University, Baghdad, Iraq. He holds a Ph.D. degree in physics (digital image processing) from the Physics Department, College of Science, University of Baghdad (1999). He has supervised more than 40 M.Sc. and 20 Ph.D. projects for postgraduate students in physics, computer science, computer engineering, and medical physics, and has published more than 200 papers in scientific journals and at local and international scientific conferences, in addition to two patents. He received Science Day awards from the Ministry of Higher Education and Scientific Research in Iraq in 2011 and 2012, and the Education Award for Science in 2013. He can be contacted by email at prof.alialzuky@uomustansiriyah.edu.iq.
Fatin E. M. Al-Obaidi is an Assistant Professor at the Physics Department, College of Science, Mustansiriyah University, Baghdad, Iraq. She holds a Ph.D. degree in physics from the Physics Department, College of Science, Mustansiriyah University. She received a Science Day award from the Ministry of Higher Education and Scientific Research in Iraq in 2011. Her research areas are image/signal processing, analysis, pattern recognition, and numerical analysis. She can be contacted by email at sci.phy.fam@uomustansiriyah.edu.iq.