Yeonji Choi1, Tariq Rahim1, and Soo Young Shin1*
1 Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi, Korea
yzygzy@kumoh.ac.kr, tariqrahim@ieee.org, wdragon@kumoh.ac.kr
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
UAVs, CNN, Path planning, Stair climbing, LiDAR sensor
1. Introduction
UAV use is growing in areas such as scientific research, rescue missions, commerce,
and agriculture. Originally, UAVs were developed to be managed by an on-the-ground
pilot via remote-control communication [1]. Recently, UAVs have been moving toward navigating with increasing degrees of autonomy.
Most UAVs employ global navigation satellite system technology and inertial sensors
to determine their geospatial position. For stable UAV flight in indoor environments, it is
necessary to overcome factors such as GPS signal error, narrow passageways, and transparent
glass [2]. Studies of image-based stair recognition for robots [3] and of stair-climbing techniques for ground robots [4] are ongoing; however, such research is lacking for UAVs. An abundance of
techniques, varying from learning-based to non-learning-based, have been suggested
to resolve UAV navigation dilemmas. The most popular non-learning-based method is
sense-and-avoid, which prevents accidents by steering the vehicle away from obstacles
and then navigating by path planning [5,6]. Another type of non-learning-based technique takes advantage of simultaneous localization
and mapping (SLAM). The idea is that, after creating a map of the surroundings
by utilizing SLAM, navigation is accomplished by path planning [7,8]. The work in [7] combines GraphSLAM [9] with an online path-planning module, enabling a UAV to determine obstacle-free
trajectories in foliage. A general characteristic of non-learning-based approaches
is that they demand precise path planning, which may result in unanticipated failures
when environments are extremely dynamic and complicated. To address this matter, machine
learning (ML) methods such as imitation learning and reinforcement learning (RL) have
been explored [10-12]. For example, a model-based RL approach called TEXPLORE [12] was presented as a high-level control system for navigating a UAV within
a barrier-free grid map. In addition, an imitation learning-based controller trained on
a small set of human demonstrations was presented that obtains reliable performance in forested
areas [10].
Therefore, this paper proposes a convolutional neural network (CNN)-based system
for real-time stair recognition that can fly a UAV without colliding with stairs,
obtaining distance information to walls or stairs through 2D light detection
and ranging (LiDAR) together with a camera mounted on the UAV. In addition, algorithms were
designed to recognize stairs (one of the obstacles encountered during autonomous indoor
flight), avoid collisions, and maneuver the UAV, and flight experiments
were carried out after the actual UAV system was implemented.
Deep learning (DL), a subcategory of machine learning, uses multi-layer neural networks
loosely inspired by the human brain, and is a core technique of artificial intelligence (AI).
Many applications of machine learning have been proposed for different types of data, such
as music signals [13], 2D signals or images [14], and video signals [15]. CNNs are used for various purposes, such as classification, detection, and pattern
recognition, especially in health care [16], drone applications [17], and autonomous driving systems. Recently, You Only Look Once (YOLO) was introduced
for real-time detection of objects, with each version improving the trade-off between
mean average precision (mAP) and frames per second [18].
In this work, we apply the YOLOv3-tiny model to stair detection for the first time, and
improve the model further by adding a convolution layer to extract deeper features.
This DL detection model is then used in a classification role to determine each
subsequent maneuver.
The rest of this paper is organized as follows. Section 2 details related work,
while Section 3 explains the proposed scheme. Section 4 summarizes the experimental
results and the analysis. Section 5 provides concluding statements and suggests the
scope of future work.
2. Related Work
Previously, 3D maps of the local area were built for autonomous UAV navigation.
In some cases, these methods were used for precise quadcopter maneuvers [19,20]. However, such methods rely on a sophisticated control scheme, thereby restricting
their use to laboratory settings [21-23]. In other approaches, the map is learned through manually flown routes, and the quadcopter then travels the
same path [24]. For most outdoor flights (where the required precision is not as high as indoors), GPS-based
pose estimation is used.
Most applications use range sensors, such as infrared sensors, RGB-D (red, green,
blue, depth) sensors, or laser range sensors [25]. A single ultrasonic sensor was used in [26] as an automated navigation device together with an infrared sensor. A state-estimation
method based on LiDAR and an inertial measurement unit (IMU) was developed to work independently
in uncertain, GPS-denied conditions [27]. However, range sensors have limitations, being heavy and high in power consumption.
The simultaneous localization and mapping (SLAM) technique uses separate optical
sensors to create a 3D image [21-23] from every UAV position on the map. A 3D map of an unknown indoor scenario was built
with a SLAM laser range finder [25]. SLAM techniques [29,31] also offer single-camera indoor navigation. However, SLAM is highly complicated when it comes
to regenerating the 3D map of a region, requiring precise measurements and extensive resources
because additional sensors are needed.
SLAM can also introduce communication delays during real-time navigation; the studies in [31] and [32] addressed these issues. Moreover, SLAM output for
indoor surfaces (such as walls and ceilings) is not considered good, because their differential
intensity is very weak. An entire corridor comprises partitions, ceilings, and floors,
so SLAM technologies cannot attain the desired navigational quality in such environments.
3. The Proposed Scheme
This section discusses the system configuration for UAV recognition of stairs,
the deep learning model using YOLOv3-tiny, and the improved YOLOv3-tiny for detecting
stairs.
3.1 System Configuration
The proposed system was designed based on recognizing stairs with a camera mounted
on the UAV for indoor environments and on distances measured via the 2D LiDAR sensor
attached to the UAV’s side. Fig. 1 shows the flowchart for the entire system. The connections and communications between
the parts are both wired and wireless, as shown in Fig. 2. In particular, communications among the ground control station, the UAV, and the
onboard PC are via Wi-Fi/LTE, whereas the wired connection is used only for the
sensor.
The system’s actual implementation uses a Parrot Bebop 2 drone, which is suitable
for narrow passageways and can conveniently carry the sensor load. The UAV is equipped with an
RPLiDAR S1 laser scanner, which rotates 360$^{\circ}$ and can measure distances up
to 40 m, and a lightweight Jetson TX2 embedded computing device mounted on an Auvidea
J120 carrier board as the onboard PC, as shown in Fig. 3(c). A Lenovo ThinkPad T580 is used as the ground control station (GCS), and the equipment
required for the experiment is listed in Table 1. All algorithms are implemented in Python, and the Robot Operating System (ROS), Kinetic
version, was used as middleware (software that allows multiple different programs to run
together).
Fig. 1. Flowchart for the proposed implementation.
Fig. 2. Network connections and the architecture of the proposed system.
Fig. 3. System configuration: (a) UAV movement axes; (b) illustration of the RPLiDAR
S1 scanning process; (c) the 2D-LiDAR sensor and the Jetson-TX2 onboard PC attached
to the UAV; (d) the test environment.
Table 1. Experiment Parameters.
Device | Model name | Company
Lidar sensor | RPLiDAR S1 | Slamtec
UAV | Bebop drone 2 | Parrot
Onboard PC | Jetson TX2 | Nvidia
Carrier board | Auvidea J120 | Auvidea
GCS | ThinkPad T580 | Lenovo
LTE modem | LTE USB Stick | Huawei
Algorithm 1. Stair-climbing algorithm.
The LiDAR sensor uses distances measured along 360 points, as shown in Fig. 3(b). The distance data obtained by the LiDAR sensor are referenced at 0$^{\circ}$ toward
the floor, 90$^{\circ}$ toward the front, and 180$^{\circ}$ toward the ceiling, based
on the UAV's direction of travel. In the polar coordinate system, each raw laser point
is defined as $\{(d_i,\ \theta_i);\ 0 \leq i \leq 359\}$, where $d_i$ is the distance
from the UAV center to the object, and $\theta_i$ is the relative angle of measurement.
The information obtained by the LiDAR is stored as a vector $(d_i,\ \theta_i)$, and
the stored data are checked to convert infinite scan values.
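As a concrete illustration, the following minimal Python sketch (an illustration, not the exact onboard code) converts a raw 360-point scan into the stored $(d_i,\ \theta_i)$ vectors while replacing infinite or invalid readings; substituting the sensor's 40-m maximum range is an assumption made only for this sketch.

```python
import math

def clean_scan(ranges, max_range=40.0):
    """Convert a raw 360-point LiDAR scan into (d_i, theta_i) pairs.

    Infinite or invalid readings are replaced with the sensor's maximum
    range; this replacement policy is an assumption for illustration.
    """
    points = []
    for i, d in enumerate(ranges):          # theta_i = i degrees, 0 <= i <= 359
        if math.isinf(d) or math.isnan(d) or d <= 0.0:
            d = max_range
        points.append((d, float(i)))
    return points
```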
3.2 Stair-climbing System
Algorithm 1 is used by the UAV to climb stairs. The algorithm starts when steps are
recognized by the camera. If the distance between the UAV and the stairs is greater
than $r$ meters, the UAV moves straight ahead along the $\textit{x}$-axis; if the distance
is less than $r$ m, it performs a rising maneuver along the $\textit{z}$-axis to avoid
a collision. If, at that instant, a staircase is no longer recognized, the stair-climbing
mission is considered complete, and recognition for climbing the next step commences.
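The decision logic described above can be summarized by the following minimal Python sketch; the function name and the example safety distance $r$ = 1.0 m are illustrative assumptions, not the exact onboard implementation.

```python
def stair_climb_step(stairs_detected, forward_distance, r=1.0):
    """One decision step of the stair-climbing logic (illustrative sketch).

    Returns the maneuver to command; r is the safety distance in meters
    (an assumed value for illustration).
    """
    if not stairs_detected:
        return "mission_complete"   # no staircase in view: climbing is finished
    if forward_distance > r:
        return "move_forward_x"     # far from the step: advance along the x-axis
    return "rise_z"                 # close to the step: climb along the z-axis
```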
3.3 Deep Learning Model for Detection of Stairs
In this study, a DL approach is implemented for detecting stairs, which the drone
uses to make decisions intelligently in order to follow the stairs and determine the
next maneuver. We improved the default YOLOv3-tiny model. The backbone of YOLO is
$\textit{darknet}$, and the default YOLOv3-tiny model uses six max-pooling and seven
convolution layers; we modified it by adding one more convolution layer. Instead of
a softmax function, regression is employed to solve the multi-class detection and
classification problem [33].
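A minimal PyTorch-style sketch of such a backbone is given below; the channel widths, kernel sizes, and the position of the added layer are illustrative assumptions rather than the exact darknet configuration used in this work.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # 3x3 convolution followed by batch normalization and ReLU activation
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TinyStairBackbone(nn.Module):
    """Sketch of a YOLOv3-tiny-style backbone: seven convolution layers
    interleaved with six max-pooling layers, plus one extra convolution
    layer appended to extract deeper features (channel widths assumed)."""
    def __init__(self):
        super().__init__()
        widths = [16, 32, 64, 128, 256, 512]
        layers, in_ch = [], 3
        for w in widths:                      # six conv + max-pool pairs
            layers += [conv_block(in_ch, w), nn.MaxPool2d(2, 2)]
            in_ch = w
        layers += [conv_block(in_ch, 1024)]   # seventh conv layer (default model)
        layers += [conv_block(1024, 1024)]    # extra conv layer added in this work
        self.features = nn.Sequential(*layers)

    def forward(self, x):
        return self.features(x)
```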
The proposed model starts by dividing the stair-image input into a $G \times G$ grid
in the training stage. A bounding box is used to label five features, as shown in
Fig. 4: width $w$, height $h$, vertical position $v$, horizontal position $u$, and
confidence score $C$, which represents the presence of stairs within the bounding box
and, hence, reflects the detection accuracy.
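As an illustration of this labeling, the following Python sketch maps a pixel-space bounding box to its grid cell and the five normalized features $(u,\ v,\ w,\ h,\ C)$; the grid size and the exact normalization are assumptions made for this sketch.

```python
def encode_box(u, v, w, h, img_w, img_h, G=13):
    """Map a pixel-space box center (u, v) and size (w, h) to a grid cell
    and normalized training features (illustrative normalization)."""
    cell_col = min(int(u / img_w * G), G - 1)   # grid cell containing the center
    cell_row = min(int(v / img_h * G), G - 1)
    u_off = u / img_w * G - cell_col            # center offset within the cell
    v_off = v / img_h * G - cell_row
    w_rel = w / img_w                           # size relative to the image
    h_rel = h / img_h
    C = 1.0                                     # confidence: stairs present
    return cell_row, cell_col, (u_off, v_off, w_rel, h_rel, C)
```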
Fig. 4. Definition of the bounding box.
Fig. 6. YOLO models: the default YOLOv3-tiny and the improved YOLOv3-tiny.
In the proposed improved YOLOv3-tiny method, we attempt to keep the model computationally
inexpensive while extracting more semantic features. Max-pooling
is used after each convolution layer to reduce the computational complexity and improve
image feature extraction. Fig. 6 shows the network architecture for both the default and the improved YOLOv3-tiny
models. The loss function is optimized end-to-end over the network, and can be expressed
as follows [33]:

$$\mathrm{loss}=\mathit{coordErr}+\mathit{iouErr}+\mathit{clsErr},$$

where $\textit{iouErr}$, $\textit{coordErr}$, and $\textit{clsErr}$ indicate
the IOU error, coordinates error, and classification error, respectively. We used
a rectified linear unit (ReLU) as an activation function to achieve sparsity and reduce
vanishing gradient issues [25]. Table 2 details the training configuration employed for both YOLOv3-tiny and the proposed
improved YOLOv3-tiny model.
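As a minimal illustration of how the three error terms are combined, the following Python (PyTorch) sketch sums mean-squared-error versions of the coordinate, confidence (IoU), and classification terms; the tensor layout and the unweighted sum are simplifying assumptions, not the exact darknet loss implementation.

```python
import torch.nn as nn

mse = nn.MSELoss(reduction="sum")

def total_loss(pred, target):
    """Illustrative composition of the loss terms described above.

    `pred` and `target` are dicts of tensors: 'coord' (box offsets),
    'conf' (objectness/IoU confidence), and 'cls' (class scores).
    """
    coord_err = mse(pred["coord"], target["coord"])   # coordErr
    iou_err = mse(pred["conf"], target["conf"])       # iouErr
    cls_err = mse(pred["cls"], target["cls"])         # clsErr
    return coord_err + iou_err + cls_err
```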
Table 2. Training Parameters for Both Models.
Parameters for training | Configuration values
Input image size (stairs) | 428 × 428
Batch size | 32
Learning rate | 0.001
Optimizer | Stochastic gradient descent
Decay | 0.0005
Momentum | 0.9
Epochs | 20,000
3.4 ROS
The nodes that are separated and managed by the ROS master are shown in Fig. 5. A topic continuously carries the results processed by a publisher node and makes
them available to other nodes by subscription. The messages handled by the proposed
system are largely the UAV status messages, the $\textit{scan}$ values obtained from
the LiDAR, and the image messages obtained from the UAV camera. When running $\textit{darknet}$
on the ROS, the required messages are subscribed to from among the published messages.
Among them, a message containing the bounding-box information is received through the
$\textit{darknet\_ros}$ node. When the proposed DL model detects a staircase, the LiDAR
message is subscribed to as a trigger that allows the UAV to perform actions and
maneuvers based on the incoming output. This process continues as long as detection is
performed within $\textit{darknet\_ros}$.
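A minimal Python (rospy) sketch of this subscriber side is given below, assuming the standard $\textit{darknet\_ros}$ and Bebop topic names; the class label "stairs", the safety distance, and the velocity values are illustrative assumptions rather than the exact node used in the experiments.

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import LaserScan
from geometry_msgs.msg import Twist
from darknet_ros_msgs.msg import BoundingBoxes

cmd_pub = None
stairs_detected = False

def boxes_cb(msg):
    # Flag whether darknet_ros currently reports a "stairs" bounding box.
    global stairs_detected
    stairs_detected = any(b.Class == "stairs" for b in msg.bounding_boxes)

def scan_cb(msg):
    # Use the forward-looking range (90 deg in this setup, assuming a
    # 1-degree scan resolution) to choose the next maneuver.
    if not stairs_detected:
        return
    r = 1.0                      # assumed safety distance in meters
    cmd = Twist()
    if msg.ranges[90] > r:
        cmd.linear.x = 0.2       # move forward along the x-axis
    else:
        cmd.linear.z = 0.2       # rise along the z-axis
    cmd_pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("stair_climb_sketch")
    cmd_pub = rospy.Publisher("/bebop/cmd_vel", Twist, queue_size=1)
    rospy.Subscriber("/darknet_ros/bounding_boxes", BoundingBoxes, boxes_cb)
    rospy.Subscriber("/scan", LaserScan, scan_cb)
    rospy.spin()
```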
4. Experimental Results and Analysis
A dataset was created at the Kumoh National Institute of Technology, South Korea,
by employing a Bebop drone with a high-resolution camera and a GPS mounted on
it. The dataset comprises 1,000 images at a resolution of $1920\times 1080$, resized
to $428\times 428$ before model training. For training and testing purposes, the dataset
was split 70% and 30%, respectively. Fig. 7 depicts the training phase of the proposed improved YOLOv3-tiny model, for which 20,000
epochs were set. As shown in Fig. 7, the blue line represents the average loss achieved (0.215), whereas the red line
represents the highest mAP (91.6%).
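The 70%/30% split described above can be illustrated by the following minimal Python sketch; the fixed random seed and the helper name are arbitrary choices, not the exact preparation script used for the dataset.

```python
import random

def split_dataset(image_paths, train_ratio=0.7, seed=0):
    """Shuffle the stair images and split them into training and test sets
    (70%/30% by default); the seed is an arbitrary illustrative choice."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]
```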
The detection performance of the improved YOLOv3-tiny model was benchmarked against
the default model by utilizing the same parametric configuration and dataset. The
metrics used to reflect the efficacy of both models in stair detection are accuracy,
recall, F1-score, and precision. Table 3 shows that the proposed improved YOLOv3-tiny model outperformed the default model
in terms of accuracy, recall, and F1-score. Furthermore, the lower precision value,
accompanied by higher values for the other performance metrics, still indicates stable
performance from the model.
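For reference, the metrics reported in Table 3 follow the usual definitions, as in the following Python sketch computed from true/false positive and negative detection counts.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, F1-score, and precision from counts of
    true positives, false positives, false negatives, and true negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "recall": recall,
            "f1_score": f1, "precision": precision}
```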
Fig. 8 shows the real-time detection of the proposed model, where the top left image represents
the starting point of the UAV after takeoff, and the top right image represents the
middle position of the UAV when hovering and climbing. In Fig. 8, the bottom left image shows the last step of the stairs, while the bottom right
image shows the instant when the UAV was located at a distance of $r$ meters from
the stairs.
For the experimental scenario, the set of stairs climbed was 2.1 m long and 2.85
m wide, as shown in Fig. 3(d). Based on Algorithm 1, some of the experiment's results are shown in Fig. 9, depicting commands sent by the GCS and the corresponding images from the built-in
camera of the UAV. In Fig. 9, we show the different stages in the decisions made by the UAV, such
as moving forward or upward, hovering, and going to the next stair to climb it. Furthermore,
the actual trajectory of the UAV from the beginning of the staircase to the
beginning of the next step is shown in Fig. 10 as a 3D plot. This movement started at approximately 0.8 m from the starting point
of the stairs. In total, 88 experiments were performed three times each; Table 4 lists the takeoff and landing times, and the average elapsed time from takeoff to landing
was 55.97 sec.
Fig. 7. Training phase of the improved YOLOv3-tiny.
Fig. 8. Detection results from the improved YOLOv3-tiny model.
Fig. 9. GCS screen commands and screen shots from the UAV’s built-in camera for (a)
forward movement; (b) upward movement; (c) hovering; (d) going to the next stair.
Fig. 10. Trajectory of the UAV.
Table 3. Performance of the Detection Scheme.
Parameter metrics | YOLOv3-tiny (%) [17] | Modified YOLOv3-tiny (%)
Accuracy | 90.01 | 92.06
Recall | 89.00 | 91.00
F1-score | 83.00 | 85.00
Precision | 78.00 | 73.00
Table 4. Performance Time of the Proposed Stair-climbing Scheme.
No. | Takeoff (min:sec) | Landing (min:sec)
1 | 0:06.35 | 1:00.91
2 | 0:05.22 | 0:57.16
3 | 0:05.78 | 1:07.18
Average | 0:05.78 | 1:01.75
5. Conclusion
In this study, we designed, implemented, and experimented with a system in which
a UAV recognizes and climbs stairs, which are obstacles often encountered during indoor
flight. The system was implemented through a CNN-based imaging process for real-time
stair recognition and by using LiDAR-based distance measurements. The accuracy derived
from stair recognition was 92.06%, and the actual test results showed that stair climbing
was carried out without collisions.
Future research would require more efficient algorithms to climb various types
of stairs. Moreover, the proposed system can be combined with SLAM navigation to expand
studies to systems that can autonomously fly through multiple floors.
ACKNOWLEDGMENTS
This paper was supported by the National University Development Project in 2020.
REFERENCES
Prasad P. R., et al. , 2018, Monocular vision aided autonomous UAV navigation in indoor
corridor environments., IEEE Transactions on Sustainable Computing, Vol. 4, No. 1,
pp. 96-108
Lu Y., et al. , 2018, A survey on vision-based UAV navigation., Geo-spatial information
science, Vol. 21, No. 1, pp. 21-32
Ilyas M., et al. , Jul 2018, Design of sTetro: A Modular, Reconfigurable, and Autonomous
Staircase Cleaning Robot, Journal of Sensors, Vol. 2018, pp. 16
Gao X., et al. , 2017, Dynamics and stability analysis on stairs climbing of wheel-track
mobile robot, International Journal of Advanced Robotic Systems, Vol. 14, No. 4, pp.
1729881417720783
Israelsen J., et al. , 2014, Automatic collision avoidance for manually tele-operated
unmanned aerial vehicles., In 2014 IEEE International Conference on Robotics and Automation
(ICRA), pp. 6638-6643
Luo C., et al. , 2013, UAV position estimation and collision avoidance using the
extended Kalman filter., IEEE Transactions on Vehicular Technology, Vol. 62, No. 6,
pp. 2749-2762
Cui J., et al. , 2016, Autonomous navigation of UAV in foliage environment., Journal
of intelligent & robotic systems, Vol. 84, No. 1, pp. 259-276
Huizhong Z., et al. , 2015, StructSLAM: Visual SLAM with building structure lines.,
IEEE Transactions on Vehicular Technology, Vol. 64, No. 4, pp. 1364-1375
Oguz A. E., et al. , June 2014, On the consistency analysis of A-SLAM for UAV navigation.,
Proc. SPIE 9084, Unmanned Systems Technology XVI, Vol. 9084, pp. 90840R
Ross S., et al. , 2013, Learning monocular reactive UAV control in cluttered natural
environments., In 2013 IEEE international conference on robotics and automation, pp.
1765-1772
Faust A., et al. , 2017, Automated aerial suspended cargo delivery through reinforcement
learning., Artificial Intelligence, Vol. 247, pp. 381-398
Imanberdiyev N., et al. , 2016, Autonomous navigation of UAV by using real-time model-based
reinforcement learning., In 2016 14th international conference on control, automation,
robotics and vision (ICARCV), pp. 1-6
Sturm B. L., et al. , 2019, Machine learning research that matters for music creation:
A case study, Journal of New Music Research, Vol. 48, No. 1, pp. 36-55
Raharjo J., et al. , Nov 2019, Cholesterol level measurement through iris image using
gray level co-occurrence matrix and linear regression, ARPN Journal of Engineering
and Applied Sciences, Vol. 14, No. 21, pp. 3757-3763
Zhang Y., et al. , Jan 2020, Machine learning based video coding optimizations: A
survey., Information Sciences, Vol. 506, pp. 395-423
Heidari M., et al. , Sep 2020, Improving the performance of CNN to predict the likelihood
of COVID-19 using chest X-ray images with preprocessing algorithms, International
journal of medical informatics, Vol. 144, pp. 104284
Hassan S. A., et al. , Oct. 2019, Real-time uav detection based on deep learning network,
In 2019 International Conference on Information and Communication Technology Convergence,
pp. 630-632
Redmon J., et al. , 2016, You only look once: Unified, real-time object detection,
In Proceedings of the IEEE conference on computer vision and pattern recognition,
pp. 779-788
Mellinger D., et al. , 2011, Minimum snap trajectory generation and control for quadrotors,
In 2011 IEEE international conference on robotics and automation, pp. 2520-2525
Mellinger D., et al. , Jan. 2012, Trajectory generation and control for precise aggressive
maneuvers with quadrotors, The International Journal of Robotics Research, Vol. 31,
No. 5, pp. 664-674
Checchin P., et al. , 2010, Radar scan matching slam using the fourier-mellin transform,
In Field and Service Robotics, Vol. 62, pp. 151-161
Engel J., et al. , 2014, LSD-SLAM: Large-scale direct monocular SLAM, In European
conference on computer vision, Vol. 8690, pp. 834-849
Mei C., et al. , Jun. 2011, RSLAM: A system for large-scale mapping in constant-time
using stereo, International journal of computer vision, Vol. 94, No. 2, pp. 198-214
Müller M., et al. , Sep. 2011, Quadrocopter ball juggling, In 2011 IEEE/RSJ International
Conference on Intelligent Robots and Systems, pp. 5113-5120
Huang A. S., et al. , Aug. 2011, Visual odometry and mapping for autonomous flight
using an RGB-D camera, Robotics Research., Vol. 100, pp. 235-252
Roberts J. F., et al. , Sep. 2007, Quadrotor using minimal sensing for autonomous
indoor flight, In European Micro Air Vehicle Conference and Flight Competition (EMAV2007)
Bry A., et al. , May. 2012, State estimation for aggressive flight in GPS-denied environments
using onboard sensing, In 2012 IEEE International Conference on Robotics and Automation,
pp. 1-8
Bachrach A., et al. , Dec. 2009, Autonomous flight in unknown indoor environments,
International Journal of Micro Air Vehicles, Vol. 1, No. 4, pp. 217-228
Achtelik M., et al. , 2011, Onboard IMU and monocular vision based control for MAVs
in unknown in-and outdoor environments., 2011 IEEE International Conference on Robotics
and Automation, pp. 3056-3063
Blösch M., et al. , 2010, Vision based MAV navigation in unknown and unstructured
environments., 2010 IEEE International Conference on Robotics and Automation, pp.
21-28
Nützi G., et al. , Nov. 2011, Fusion of IMU and vision for absolute scale estimation
in monocular SLAM., Journal of intelligent & robotic systems, Vol. 61, No. 1, pp.
287-299
Weiss S., et al. , 2012, Versatile distributed pose estimation and sensor self-calibration
for an autonomous MAV, In 2012 IEEE International Conference on Robotics and Automation,
pp. 31-38
Rahim T., et al. , 2021, A Deep Convolutional Neural Network for the Detection of
Polyps in Colonoscopy Images, Biomedical Signal Processing and Control, Vol. 68, p. 102654
Author
Yeonji Choi received her BSc in Electrical Engineering in 2019 and her MSc from the
Department of IT Convergence Engineering at Kumoh National Institute of Technology (KIT),
Gumi, South Korea, in 2021. Currently, she is working as a graduate
research assistant at the Wireless and Emerging Network System (WENS) Lab in the Department
of IT Convergence Engineering, Kumoh National Institute of Technology (KIT), Gumi,
South Korea. Her major research interests include intelligent control and systems,
Unmanned Aerial Vehicles, and wireless communications.
Tariq Rahim is a PhD student in the Wireless and Emerging Network System Laboratory
(WENS Lab) of the Department of IT Convergence Engineering, Kumoh National Institute
of Technology, Republic of Korea. He completed his master’s degree in Information
and Communication Engineering from Beijing Institute of Technology, PRC, in 2017.
His research interests include image and video processing and quality of experience
for high-resolution videos.
Soo Young Shin received his BSc, MSc, and PhD in Electrical Engineering and Computer
Science from Seoul National University, Korea, in 1999, 2001, and 2006, respectively.
He was a visiting scholar for the FUN Lab at the University of Washington, U.S.A.,
from July 2006 to June 2007. After three years working in the WiMAX Design Lab of
Samsung Electronics, he is now an associate professor for the School of Electronics
at Kumoh National Institute of Technology, joining the institute in September 2010.
His research interests include wireless LANs, WPANs, WBANs, wireless mesh networks,
sensor networks, coexistence among wireless networks, industrial and military networks,
cognitive radio networks, and next-generation mobile wireless broadband networks.