Navin Ranjan1, Sovit Bhandari1, Yeong-Chan Kim1,2, Hoon Kim1,2*
1 IoT and Big Data Research Center, Incheon National University, Yeonsu-gu, Incheon, 22012, Korea ({ranjannavin07, sovit198}@gmail.com)
2 Department of Electronics Engineering, Incheon National University, Yeonsu-gu, Incheon, 22012, Korea ({kyc0288, hoon}@inu.ac.kr)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Q-learning, Mobile routing, Traveling salesman problem, Dijkstra algorithm, Illegal parking, Edge computing, Networks
1. Introduction
Route planning in a network is an optimization problem that has been studied in applied mathematics and computer science for decades. The problem is computationally expensive, and although many different algorithms have been proposed to solve it, providing fast and reliable solutions to this optimization problem is still a challenging task [1]. Route planning plays a critical role in a variety of applications, such as packet transfer in wireless sensor networks [2-6], routing in 5G and 6G networks [7-9], and vehicle routing [10-14]. Researchers have developed and proposed many exact, heuristic, and metaheuristic algorithms to solve the route-planning optimization problem, and these algorithms have their respective pros and cons. For example, an exact algorithm, such as the Dijkstra algorithm, computes every possible solution and chooses the best among them. Even though this algorithm provides the exact solution, it tends to suffer from high time complexity even when planning a route in a fairly small network [14-16]. Since time complexity is a major factor for real-time route planning in a network, it is often much better to look for an approximate solution instead of this exact solution. Therefore, heuristic and metaheuristic algorithms, such as the local search algorithm, ant colony optimization, and particle swarm optimization, are preferred for solving the route-planning optimization problem. However, these algorithms also suffer from limitations; for instance, heuristic algorithms are in many cases unable to generate the best solution to this optimization problem.
Recently, machine learning (ML) algorithms have become popular in various fields owing to their ability to handle large data sets and data non-linearities and to the availability of cheap, high computational power [17-19]. Reinforcement learning (RL) is a major class of machine learning and has been inspired by the behavior learning of animals. RL algorithms tend to perform well in partially observed environments or high-dimensional spaces, where an approximate solution to a complex problem may be of greater value than an exact/best solution, which is attainable only for simple problems. RL-based algorithms can solve the route-planning optimization problem in applications that involve large networks, such as packet transfer in a wireless network, vehicle route planning in a transportation network, signal routing in a communication network, and navigation through a maze, and RL algorithms like Q-learning can obtain the best solution to this problem regardless of the environment.
The advantages of Q-learning, such as its off-policy and temporal-difference learning, inspired us to propose the DQSR algorithms (one for each stage). The DQSR is based on two Q-learning agents. The first stage's Q-learning agent develops the shortest path between each customer and this customer's nearest neighbor customers in the network and finds this path's length. The second stage's Q-learning agent finds the shortest route from any customer to the other customers by virtually visiting each other customer only once and returning to the starting customer along the route with the shortest distance. Hence, both Q-learning agents share the responsibility of achieving the shortest route, which is why the DQSR obtains this route relatively fast compared with exact, some heuristic, or single-stage Q-learning based algorithms. In addition, the concept of nearest neighbor customers cannot be used in the exact and heuristic algorithms for shortest-route planning/generation. In essence, the main contributions of this paper are summarized as follows:
· We developed a dual-stage, Q-learning based, shortest-route generation (DQSR) approach and its two algorithms.
· We implemented the idea of calculating the shortest path and distance between each customer and this customer's nearest neighbor customers for the shortest-route generation.
· Our extensive experiments on DQSR-based illegal-parking surveillance demonstrate the efficiency of the DQSR compared to two other algorithms used for the same purpose, in terms of time complexity and shortest-route generation.
We described the background of our research in this section. The rest of the paper
is structured as follows. Section 2 presents the works related to our research. Section
3 presents the methodology of our research, including the background on reinforcement
learning, problem statement, and the DQSR algorithms. Section 4 presents the data
set used in the experiment, algorithms compared with the DQSR algorithms, simulation
parameters, the case study results, and discussion. Finally, Section 5 concludes and
provides the future direction for this research.
2. Related Work
Route planning is one of the most important functions of an intelligent transportation
system. It is widely used in various areas, such as daily travel route planning [20-22], package delivery service or indoor warehouse management [23-25], and traffic congestion reduction [26-28]. Route planning also saves a lot of time and money for transport management agencies. In recent years, many academic works have been devoted to solving the route-planning optimization problem with multiple locations to be connected by the route, in a relatively short time. Some of the recent academic works have also handled route planning under multiple objectives.
Nowadays, many algorithms find the shortest path between nodes/locations in a network,
with widespread applications. For example, an exact algorithm based on traditional
graph-based methods, such as the Dijkstra algorithm, is used to find this shortest
path in a network with multiple destinations [29]. Notably, the multi-destination Dijkstra algorithm (MDDA) is a recent algorithm for finding the multi-destination shortest path in a network. In particular, the MDDA
uses a Fibonacci heap [30,31]. Meanwhile, Lu et al. [32] proposed a node combination method to implement the Dijkstra algorithm in finding
the shortest path in a network by following three simple iterative steps: find the
nearest neighbor node of the source node, combine the nearest neighbor node with the
source node, and modify the weights on the edge connecting the source and nearest
neighbor nodes. However, the node combination method is neither as efficient nor as fast as the MDDA. Likewise, Orlin et al. [33] proposed an algorithm that efficiently finds the shortest path in a network, which is
applicable even to a very sparse network or under a small number of positive integer
weights. Further, Jin et al. [34] presented the modified MDDA to find the shortest path in a time-schedule network.
In particular, they considered the network constraints on its edges to find this shortest
path. Also, Asaduzzaman et al. [35] proposed a modified Dijkstra algorithm to find the shortest path to multiple destinations
in a network by using a minimum number of weights. Interestingly, Hart et al. [36] proposed an A* algorithm, which is an exact algorithm for incorporating information
from an optimization problem to a graphical analysis performed to find a minimum-cost
path in a network. Similarly, Ticha et al. [37] found the multi-destination, bi-objective shortest path using an A* search strategy.
In particular, their search strategy involves choosing some part of the path for a
non-dominated path, in each iteration, to arrive at one of the destination nodes.
However, the last two exact and heuristic algorithms and alike have a serious issue
of time complexity and have a polynomial run-time according to the graph size. Meanwhile,
Bast et al. [38] developed an algorithmic approach for fast routing in a road network having transit
nodes, reducing the quickest-path queries in this road network into a small number
of table lookups reducing the query time significantly. Their algorithm is more than
one million times faster than the best-known algorithm for routing in a general network.
Finally, Shen et al. [39] improved the shortest-path databases of a network by (i) contracting the input graph
before preprocessing and (ii) limiting the preprocessing step to only a selected subset
of the graph nodes.
Vehicle routing or vehicle-route planning is a generalization of the traveling salesman problem (TSP). A greedy and very simple strategy that involves only the nearest neighbors allows the best node/vertex to join the solution, but this strategy is inefficient in solving the TSP. Bentley [40] achieved an improved run-time in finding approximate solutions to large geometric TSPs. In addition, Clarke et al. [41] presented an iterative procedure that enables the rapid selection of an optimum or near-optimum route for a fleet of trucks (with varying capacities) from a central depot to several delivery points. Recently, ML approaches have been bringing new features, and researchers have been coupling them with well-known heuristic concepts to introduce new hybrid algorithms. For example, Mele et al. [42] presented new constructive heuristics driven by ML for solving the TSP; their approach enables reasonable generalization and unveils an efficient balance between ML and optimization techniques. Likewise, Zhang et al. [43] proposed deep reinforcement learning for the TSP with time windows and rejections.
Their deep RL showed superior performance compared to the Tabu search heuristics,
in terms of the solution quality and inference computation time. Further, Reinelt
[44] provided the TSP library (TSPLIB) for researchers, with a broad set of test problems
from various sources and with various properties. Finally, the arc routing or Chinese
Postman problem [45,46] presents a process for finding a least-cost way to traverse each arc of a network
at least once and return to the node/vertex from which the agent starts.
3. Methodology
3.1 Reinforcement Learning
Reinforcement learning (RL) is a computational approach to understanding and automating
goal-directed learning and decision-making. It is about taking suitable action to
maximize the reward in a particular situation. RL differs from other computational approaches, such as supervised learning, in that the RL agent learns by interacting with its environment, with or without prior knowledge of this environment and without relying on labeled examples. RL is based on Markov decision processes (MDPs).
An MDP is a discrete-time stochastic control process providing the mathematical framework to model decision-making in a situation where the outcomes are partly random and partly under the decision-maker's control. An MDP is structured from a finite set containing the state space ($S$), action space ($A$), state transition probability matrix ($P$), and reinforcement or reward ($R$); in particular, it is the tuple $\left(S,A,P_{a},R_{a}\right)$. A state $\left(s\in S\right)$ is the representation of the environment at a specific time step. An action $\left(a\in A\right)$ is one of the set of possible actions in the state $s$. The state transition probability, $P_{a}\left(s,s'\right)=P\left(s_{t+1}=s'\,|\,s_{t}=s,a_{t}=a\right)$, gives the probability that the action $a$ in the state $s$ at the time $t$ will lead to the state $s'$ at the time $t+1$. Finally, the reinforcement or reward, $R_{a}\left(s,s'\right)$, is the immediate reward received after transitioning from the state $s$ to the state $s'$.
The RL agent interacts with the environment in a sequence of steps at the time $t$: the agent (i) receives a representation of the environment, i.e., the state $s$; (ii) selects the action $a$ by using the $\varepsilon$-greedy selection (explained below) and executes this action; (iii) receives the immediate reward signal $R_{a}\left(s,s'\right)$; (iv) updates the learning matrix; and (v) observes the new state $s'$ of the environment. The goal of the RL agent is to learn a policy $\left(\pi\right)$, i.e., to learn to map from the state space ($S$) to the action space ($A$) in a way that maximizes the numerical reward signal. Therefore, the RL agent is not explicitly programmed with its action in a state. Instead, the RL agent learns, through a trial-and-error procedure, the action yielding the most reward in a state. In addition, the RL agent follows the exploration and exploitation strategies during its learning to find the most rewarding action in a state. The exploration strategy allows the RL agent to improve the agent's current knowledge about each state-action pair by taking a random action from the action list, instead of taking the best possible action, in a state. On the other hand, the exploitation strategy makes the RL agent choose the best action from the action list to get the most reward by exploiting the agent's current state-action value (knowledge). However, a balance between the exploration and exploitation strategies is essential and is accomplished by the $\varepsilon$-greedy selection or the Boltzmann selection. The $\varepsilon$-greedy selection's parameter $\varepsilon\left(0<\varepsilon<1\right)$ is defined, and the decision policy $\pi\left(s\right)$ is applied according to the following equation:

$\pi\left(s\right)=\begin{cases}a^{*}, & \text{with probability } 1-\varepsilon\\ a^{r}, & \text{with probability } \varepsilon\end{cases}$ (1)
where $a^{*}$ and $a^{r}$ refer to the best and random actions selected (under exploitation and exploration), respectively, in the current state $s$.
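As an illustration of Eq. (1), a minimal Python sketch of the $\varepsilon$-greedy selection is given below; the dictionary-based Q-table layout is an assumption made for illustration, not the exact data structure used in the paper.

```python
import random

def epsilon_greedy(q_table, state, epsilon):
    """Return an action for `state`: exploit the best-known action with
    probability 1 - epsilon, otherwise explore a random feasible action.

    `q_table[state]` is assumed to be a dict {action: Q-value} holding the
    current state-action values of the feasible actions in `state`.
    """
    actions = q_table[state]
    if random.random() < epsilon:
        return random.choice(list(actions))   # exploration: the random action a^r
    return max(actions, key=actions.get)      # exploitation: the best action a^*
```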
Q-learning is one of the off-policy RL algorithms in which the agent learns the best state-action value function independently of the policy being followed. In particular, Q-learning is based on temporal-difference (TD) learning, i.e., the value estimates are updated incrementally as the agent interacts with the environment, without waiting for the final outcome. The Q-learning state-action value is updated as

$Q_{t+1}\left(s,a\right)=Q_{t}\left(s,a\right)+\alpha\left[r\left(s,a\right)+\gamma\max_{a'}Q\left(s',a'\right)-Q_{t}\left(s,a\right)\right]$ (2)
where $Q_{t}\left(s,a\right)$ and $Q_{t+1}\left(s,a\right)$ are the state-action value (at the time $t$) and the updated state-action value (at the time $t+1$) of the state-action pair $\left(s,a\right)$, respectively. In addition, $r\left(s,a\right)$ is the immediate reward for executing the action $a$ in the state $s$, $\alpha\in\left[0,1\right]$ is the learning rate, $\gamma\in\left[0,1\right]$ is the discount factor, and $\max_{a'}Q\left(s',a'\right)$ is the best Q-value (state-action value) in the state $s'$ for the action $a'$.
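A minimal sketch of the tabular update in Eq. (2) follows; the nested-dictionary Q-table is the same illustrative assumption as above, and the default $\alpha$ and $\gamma$ simply mirror the values listed later in Table 3.

```python
def q_update(q_table, s, a, reward, s_next, alpha=0.8, gamma=0.9):
    """One temporal-difference update of Q(s, a) following Eq. (2).

    `q_table` is assumed to be a nested dict: q_table[state][action] -> Q-value.
    """
    best_next = max(q_table[s_next].values()) if q_table[s_next] else 0.0
    td_target = reward + gamma * best_next            # r(s,a) + gamma * max_a' Q(s',a')
    q_table[s][a] += alpha * (td_target - q_table[s][a])
    return q_table[s][a]
```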
3.2 Problem Statement
Let the road network be represented by a directed graph, $G=\left(V,E\right)$, where $V$ is the set of nodes, $E$ is the set of edges (road segments), and each edge connects two nodes in the road network. The length of the edge $e\in E$ is denoted by $l_{e}\in\mathbb{R}$. Let us consider that the RL agent has to visit '$n$' customers' respective locations or edges, $D=\left\{d_{1},d_{2},\ldots,d_{n}\right\}\subset E$, exactly once before returning to the starting node $\left(d\in D\right)$. Here, an edge containing a customer is considered a node to perform the DQSR. Notably, the second stage of the DQSR is a classic example of the TSP.
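As an illustration of this formulation, the short sketch below builds a toy directed road graph with edge lengths and marks two customer edges; the use of networkx as the graph container, the node names, and the lengths are assumptions made purely for illustration.

```python
import networkx as nx

# Toy road network G = (V, E) with an edge length l_e on every road segment.
G = nx.DiGraph()
G.add_edge("v1", "v2", length=120.0)
G.add_edge("v2", "v3", length=80.5)
G.add_edge("v3", "v1", length=200.0)
G.add_edge("v2", "v4", length=60.0)

# Customer set D: road links (edges) that must each be visited exactly once.
customers = [("v1", "v2"), ("v2", "v4")]
assert all(G.has_edge(u, v) for u, v in customers)
```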
3.3 The Dual-stage, Q-learning based, Shortest-route generation (DQSR) Algorithms
3.3.1 The Q-learning based, Shortest-path Generation Algorithm
The calculation of the shortest path between two nodes is one of the classic problems in graph theory. The DQSR uses the Global Positioning System (GPS) location, i.e., the latitude and longitude, of all the customers to determine, for each customer, this customer's '$x$' nearest neighbor customers and their respective locations. Then, it uses the Q-learning-based, shortest-path generation algorithm (Algorithm 1) to generate the shortest path between each customer and this customer's nearest neighbor customers and to find this path's length.
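A small sketch of how the '$x$' nearest neighbor customers could be obtained from the GPS coordinates is shown below; the haversine (great-circle) metric and the NumPy-based implementation are assumptions, since the paper does not state how the neighbor distances are measured.

```python
import numpy as np

def nearest_neighbors(coords, x):
    """Return, for each customer, the indices of its `x` nearest neighbor
    customers ranked by great-circle (haversine) distance.

    `coords` is an (n, 2) array of [latitude, longitude] in degrees.
    """
    lat, lon = np.radians(coords[:, 0]), np.radians(coords[:, 1])
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    dist = 2 * 6371.0 * np.arcsin(np.sqrt(a))   # pairwise distances in km
    order = np.argsort(dist, axis=1)
    return order[:, 1:x + 1]                    # drop column 0 (the customer itself)
```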
The inputs to Algorithm 1 are given in line 1 of the algorithm. In particular, the RL agent requires the road network represented by a directed graph, $G\left(V,E\right)$; the road network length graph, i.e., the road network with each of its edges assigned the corresponding length; the list of customers to visit (in terms of their corresponding edges); each customer's nearest neighbor customers list; '$n$' Q shortest-path tables $\{Q_{\colon 1},Q_{\colon 2},\ldots,Q_{\colon n}\}$ to store the updated state-action value for each of the '$n$' customers, respectively; the distance table $\left(D_{Route}\right)$ to store the length of the shortest path between each customer and this customer's nearest neighbor customers; the path table $\left(P_{Route}\right)$ to store the shortest path between each customer and this customer's nearest neighbor customers; and the maximum number of random walks the RL agent can take before terminating the 'while' loop. Notably, the Q shortest-path tables each have the same state transition probability matrix as that of the road network. In contrast, the distance and path tables each have the same state transition probability matrix as that of each customer and this customer's nearest neighbor customers, i.e., with size $n\times x$. The $i^{th}$ Q shortest-path table, $Q_{\colon i}$, stores the Q-value update for the path from the $i^{th}$ customer's '$x$' nearest neighbor customers to the $i^{th}$ customer. In particular, $Q_{\colon i}$ is used to calculate the shortest path and distance from the $i^{th}$ customer's '$x$' nearest neighbor customers to the $i^{th}$ customer; here, '$\colon$' represents the list of the $i^{th}$ customer's nearest neighbor customers. The output of the algorithm includes the updated path and distance tables. In addition, the algorithm requires specifying the learning rate, discount factor, initial epsilon value, epsilon decay rate, and minimum epsilon.
The working of Algorithm 1 is shown in lines 4 to 24 of the algorithm. The first 'for loop' loops through all the customers to visit, as mentioned in line 4 of the algorithm. The RL agent copies the $i^{th}$ customer's Q shortest-path table and the road network length graph as the reward table, with the sign of each edge length changed to negative, in lines 5 and 6 of the algorithm, respectively. The second 'for loop' loops through the $i^{th}$ customer's '$x$' nearest neighbor customers, as mentioned in line 7 of the algorithm.
Since the goal of the RL agent is to reach the $i^{th}$ customer’s nodes, we manipulate
the reward table by adding extra positive reward to each edge that is connected to
any of these nodes, as in line 8 of the algorithm. This extra reward enables the RL
agent to learn effectively. The Q shortest-path table update for the path from the $i^{th}$ customer's '$x$' nearest neighbor customers to the $i^{th}$ customer is presented in lines 7 to 20 of the algorithm. First, the RL agent starts its functioning from
the assigned current state and random-walks one step at a time until its termination
according to the ‘while’ condition given in line 11 of the algorithm. In other words,
the RL agent’s random walking is terminated either when the number of agent random
walks completed exceeds the maximum number of random walks the agent can take (max_steps)
or when the agent reaches the end state. Meanwhile, the RL agent chooses the action
($a$) at the current state ($s$) (as mentioned in line 12 of the algorithm), based
on the $\varepsilon $-greedy selection, which enables the agent to either explore
or exploit depending on the epsilon value. The RL agent then performs the chosen action
and moves to the next state ($s')$, as in line 13 of the algorithm. Subsequently,
the RL agent updates the previous Q-value, according to Eq. (2), as in line 14 of the algorithm. Further, the RL agent checks if the state $s'$ is
the end state and changes the termination parameter (goal)’s value accordingly, as
in lines 15 and 16 of the algorithm, respectively. Next, the RL agent replaces the
current state with the next state and decrements the max_steps. The epsilon value
is then updated after the termination of the RL agent’s path search, as in lines 18
and 19 of the algorithm. The entire process of Q-value update, as in lines 10 to 19
of the algorithm, is repeated $T$ (a user-defined number) times by the 'for loop' (in line 10 of the algorithm). These $T$ repetitions allow the RL agent to learn the correct shortest path from the $i^{th}$ customer's '$x$' nearest neighbor customers to the $i^{th}$ customer. The RL agent then calculates the path and distance from the $i^{th}$ customer's '$x$' nearest neighbor customers to the $i^{th}$ customer, in line 20 of the algorithm, based on Algorithm 3 (given subsequently in this paper). Finally, the RL agent updates the $i^{th}$ component of the distance and path tables (in lines 22 and 23 of the algorithm, respectively).
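Putting the pieces together, a condensed Python sketch of this first stage is given below. It reuses the epsilon_greedy and q_update helpers sketched in Section 3.1 and the extract_path helper sketched after Algorithm 3; the dictionary-based tables, the goal bonus, and the max_steps cap are illustrative assumptions, while the default learning parameters mirror Table 3.

```python
def train_shortest_paths(length_graph, customers, neighbors, episodes=30,
                         max_steps=200, goal_bonus=1000.0, alpha=0.8, gamma=0.9,
                         eps0=1.0, eps_decay=0.9, eps_min=0.05):
    """Sketch of Algorithm 1: for every customer i, learn one Q table encoding
    the shortest paths from i's nearest neighbor customers to i.

    `length_graph[u][v]` is the length of edge (u, v); `neighbors[i]` lists
    customer i's nearest neighbor customers.
    """
    q_tables, p_route, d_route = {}, {}, {}
    for i in customers:
        # Reward table: negated edge lengths, plus a bonus on edges that reach i.
        reward = {u: {v: -l for v, l in nbrs.items()} for u, nbrs in length_graph.items()}
        for u in reward:
            if i in reward[u]:
                reward[u][i] += goal_bonus
        q = {u: {v: 0.0 for v in nbrs} for u, nbrs in length_graph.items()}
        eps = eps0
        for j in neighbors[i]:
            for _ in range(episodes):
                s, steps = j, max_steps
                while s != i and steps > 0 and q[s]:
                    a = epsilon_greedy(q, s, eps)          # next node to move to
                    q_update(q, s, a, reward[s][a], a, alpha, gamma)
                    s, steps = a, steps - 1
                eps = max(eps_min, eps * eps_decay)
            # Greedy read-off of the learned path and its length (Algorithm 3 sketch).
            p_route[(j, i)], d_route[(j, i)] = extract_path(q, j, i, length_graph)
        q_tables[i] = q
    return q_tables, p_route, d_route
```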
An example of the Q-learning based generation of the shortest path from each customer’s
nearest neighbor customers to this customer is shown in Figs. 1(a)-(d). In particular, Fig. 1(a) shows an arbitrary road network with five customers. Each
customer’s starting and end nodes are represented by ‘green’ and ‘light blue’ color
dots, respectively. The RL agent then updates the shortest path from each customer's nearest neighbor customers to this customer by visiting each customer once and returning to the starting customer.
The process of updating the Q shortest-path table for each customer is shown in Fig. 1(b). Fig. 1(b) also shows the list of the respective Q shortest-path table updates for all the five
customers. In particular, the first image in Fig. 1(b) shows the learned Q-value for the path from the respective end node of all the customer
$d_{1}$’s nearest neighbor customers to $d_{1}.$ In addition, each edge’s width represents
this edge's Q-value compared to other edges. Since the Q-value grows when approaching the destination, keeping the Q shortest-path table updates for the paths from each customer's nearest neighbor customers to this customer in one Q shortest-path table is possible and does not cause contradictions when exploiting the Q shortest-path table to generate the shortest path. Fig. 1(c) shows each customer to this customer's each nearest neighbor customer transitional
distance-table. Similarly, Fig. 1(d) shows each customer to this customer’s each nearest neighbor customer transitional
path-table. Notably, the paths in this table are reversed when the states are reversed.
Fig. 1. An example of the Q-learning based generation of the shortest path from each customer's nearest neighbor customers to this customer and the shortest route connecting all the customers. Fig. 1(a) is an arbitrary road network with five customers. Fig. 1(b) shows the list of the respective Q shortest-path table updates for all the five customers, and each edge's width represents this edge's Q-value compared to other edges. Fig. 1(c) shows each customer to this customer's each nearest neighbor customer transitional distance-table. Fig. 1(d) shows each customer to this customer's each nearest neighbor customer transitional path-table. Fig. 1(e) shows the Q-value transition between each customer and this customer's nearest neighbor customers. Fig. 1(f) shows the shortest route from the customer $d_{1}$, generated based on Algorithm 3 and the final Q-routing table from Fig. 1(e).
Algorithm 1. Q-learning based, Shortest-path generation.
3.3.2 The Q-learning based, Shortest-route Generation Algorithm
Algorithm 2 is for the Q-learning based generation of the shortest route from any customer to the other customers, visiting each other customer once and returning to the starting customer. The inputs to the algorithm are given in line 1 of the algorithm. The RL agent requires the Q-routing table containing each customer and this customer's nearest neighbor customers' state transitions; the distance routing and path routing tables from Algorithm 1; the reward table, whose state transitions are the same as those of the Q-routing table, with an initial reward value of -1; the list of customers to visit; the maximum number of random walks for the RL agent; the total number of episodes; and the starting customer. The output of the algorithm includes the shortest route and distance. The algorithm also requires specifying the learning rate, discount factor, initial epsilon value, epsilon decay rate, and minimum epsilon.
The working of Algorithm 2 is shown in lines 4 to 38 of the algorithm. The algorithm consists of five subparts: the global variable setup, the episode variable setup, the Q-value update, the global variables' update by comparing the episode variables, and the route and route-distance generation. The Q-routing and reward tables are copied as the best Q-routing and reward tables, as in line 4 of the algorithm. Then, the global variables for the number of visited customers and the distance covered are initialized, as in line 5 of the algorithm.
Algorithm 2. Q-learning based, Shortest-route generation.
Algorithm 3. Path/route generation and the corresponding distance calculation.
The ‘for loop’ loops through all the available episodes, as shown in line 8 of the
algorithm. The episode variables are updated in each episode, as in lines 7 to 11
of the algorithm. The Q-value update during each loop is similar to the Q-value update
of Algorithm 1 and is shown in lines 12 to 26 of the algorithm. The RL agent starts its functioning
from the assigned current state and random-walks one step at a time until its termination
according to the ‘while’ condition given in line 12 of the algorithm. The RL agent
is terminated either when its completed number of random walks exceeds the maximum
number of random walks the agent can take (max_steps) or when the agent reaches the
end state. The agent then chooses an action ($a$) at the current state ($s$), based
on the $\varepsilon $-greedy selection which enables the agent to either explore or
exploit based on the epsilon value, as in line 13, and selects the next state, as
given in line 14 of the algorithm. Subsequently, the episode reward table is updated,
as in lines 15 to 22 of the algorithm. A negative reward is assigned between the current
and next states if the RL agent visits all the customers and ends at a different customer
than the starting customer, as in lines 15 and 16 of the algorithm. In addition, the
reward table is updated with a negative reward if the RL agent visits the same customer
more than once, as in lines 19 and 20 of the algorithm. On the other hand, a positive reward is assigned if the RL agent visits each of the other customers once and returns to the starting customer, as in lines 17 and 18 of the algorithm. Then, the next customer to visit is removed from the to-visit customer list, as in lines 21 and 22 of the algorithm. The RL agent subsequently updates the previous Q-value,
based on the current and next customer states and using Eq. (2), as in line 23 of the algorithm. Also, the RL agent replaces the current state with
the next state and decrements the max_steps, as in line 26 of the algorithm. Finally,
the number of customer locations visited (VD) by the RL agent is counted after the
Q-learning process is terminated. Further, the global route and global route distance
variables are calculated based on the VD, as in lines 28 to 35 of the algorithm. If
the VD is equal to the previous global_visited_path, no changes in the best reward
and the Q-routing tables are made. Otherwise, if the VD is greater than the previous
global_visited_path, the length of the global route distance is set to negative infinity,
and the best reward and Q-routing tables are replaced by the episode reward and Q-routing
tables, respectively, as in lines 29 to 31 of the algorithm. This step is followed
by assigning a global_visited_path equal to the VD. Further, if the episode route
distance (episode_distance) is greater than the global route distance, then the global
route distance is replaced by the episode route distance, and the best reward and
Q-routing tables are replaced by the episode reward and Q-routing tables, respectively,
as in lines 33 to 35 of the algorithm. The epsilon value is then updated after the
termination of the RL agent’s route search, as in lines 36 and 37 of the algorithm.
Finally, the RL agent generates the shortest route and distance after the Q-value
update for all the possible episodes is completed, based on Algorithm 3. An example of the shortest route generation is shown in Figs. 1(e) and 1(f). Fig. 1(e) shows the Q-value transition between each customer and this customer’s nearest neighbor
customers. Subsequently, the shortest route from any customer to other customers is
generated based on Algorithm 3 and the final Q-routing table from Fig. 1(e).
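To make the reward shaping of lines 15 to 22 concrete, the sketch below condenses it into a single helper that is called once per step of the second-stage agent; the +/-100 reward magnitudes, the set-based bookkeeping of visited customers, and the function name are illustrative assumptions rather than the authors' exact implementation.

```python
def shape_reward(reward, s, s_next, start, to_visit, visited):
    """Sketch of the episode reward shaping in Algorithm 2.

    `reward[s][s_next]` starts at -1 for every customer-to-customer transition;
    `to_visit` holds the customers not yet visited in the current episode and
    `visited` those already visited.
    """
    if not to_visit and s_next == start:
        reward[s][s_next] = 100.0    # every customer visited once and the tour closes at the start
    elif not to_visit and s_next != start:
        reward[s][s_next] = -100.0   # all customers visited, but the episode ends away from the start
    elif s_next in visited:
        reward[s][s_next] = -100.0   # the agent revisited an already visited customer
    else:
        to_visit.remove(s_next)      # first visit: move this customer to the visited set
        visited.add(s_next)
    return reward[s][s_next]
```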
Algorithm 3 is for the path/route generation and the corresponding distance calculation. The
inputs to the algorithm are the updated Q-value table, starting and ending nodes,
and distance table (the road length graph for the path generation and the distance
routing table for the route generation), as in line 1 of the algorithm. The output
of the algorithm is the path/route generated and its corresponding distance, as in
line 2 of the algorithm. The working of the algorithm is shown in lines 3 to 14 of
the algorithm. The current state, path, and distance are assigned their initial values
in lines 3 to 5 of the algorithm. In addition, the variable ‘goal’ is set to False
initially, as in line 6 of the algorithm. Notably, the RL agent calculates the path/route
according to lines 7 to 13 of the algorithm. In particular, the RL agent goes through
the ‘while loop’ until the terminating condition is met. The RL agent then calculates
the next state from the current state by using the Q-value table through exploitation,
as in lines 8 and 9 of the algorithm. Subsequently, the path and distance are updated,
as in lines 10 and 11 of the algorithm. In addition, the RL agent checks if the next
state is the end state during each ‘while loop’, as in line 12 of the algorithm. If
the next state is the end state, the variable ‘goal’ is set to ‘True’, as in line
13 of the algorithm, and the ‘while loop’ terminates. Finally, the path/route and
its corresponding distance are returned by the algorithm.
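A minimal sketch of this greedy extraction step is shown below; the dictionary-based Q-value and distance tables and the safety cap on the path length are illustrative assumptions.

```python
def extract_path(q_table, start, end, distance, max_len=10000):
    """Sketch of Algorithm 3: from `start`, repeatedly follow the action with
    the highest Q-value until `end` is reached, summing the corresponding
    entries of `distance` along the way.
    """
    state, path, total = start, [start], 0.0
    while state != end and len(path) < max_len:
        actions = q_table[state]
        next_state = max(actions, key=actions.get)   # pure exploitation of the learned Q-values
        total += distance[state][next_state]
        path.append(next_state)
        state = next_state
    return path, total
```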
4. Experiments
4.1 Dataset
We train the RL agent under the DQSR algorithms and evaluate the DQSR’s effectiveness
by conducting a case study on the DQSR-based, illegal-parking surveillance in Yeonsu-gu,
Incheon, South Korea. Two types of data were required for this case study: (i) the
road network with the road information of the Yeonsu-gu region and (ii) the illegal-parking
locations in this region. So, the road network was generated using the Quantum Geographical
Information System (QGIS), a free and open-source cross-platform desktop GIS application
that supports the viewing, editing, and analysis of geospatial data. Further, we collected
the details of parking violations in the Yeonsu-gu region for the period of 2017-2019,
specifically between 07:00-09:00 on weekdays. The dataset used in this case study
is shown in Fig. 2. The ‘pink’ line shows the region of concern, i.e., the Yeonsu-gu
region, and the ‘black’ lines show the road network of this region. Also, the ‘green’
dots in Fig. 2 show the respective locations of illegal parking (by the customers)
that need to be visited by the RL agent to investigate the parking violation, and
the ‘blue’ lines show the road links with illegal parking reported in these road links.
In addition, a numerical value adjacent to a road link indicates this road link’s
length. Fig. 2 also shows that most parking violations were near a road link and not
on a road link. Therefore, we selected the road link nearest to each respective customer
(with a corresponding illegal-parking location) for surveillance. The detailed feature
representation of the dataset is shown in Table 1. In particular, the dataset consists of the road network information, such as the
link id, link starting and ending nodes, road link length, longitude and latitude
of each link’s starting node, and road link (i.e., customer) nearest to the illegal-parking
location. In addition, Table 2 shows the basic information on the dataset. Meanwhile, Yeonsu-gu, Incheon, has an
area of 50.8 km$^{2}$ and includes 2153 road links with a total road network length
of 557.5 km.
Fig. 2. Dataset Visualization. The ‘pink’ line shows the Yeonsu-gu region’s boundary, the ‘black’ lines show the road network with road length information, the ‘green’ dots represent the respective illegal-parking (by the customers) locations that need to be included in the route, and the ‘blue’ lines show the road links with illegal parking reported in these links.
Table 1. The feature representation of the dataset used to evaluate the DQSR’s effectiveness.
S. No. | LINK_ID | S_NODE | E_NODE | Length (m) | Longitude | Latitude | IP | Customer index
1. | 1610309000 | 1610019100 | 1610119400 | 442.095 | 123.623 | 37.4323 | - | -
2. | 1610067301 | 1610019100 | 1610090800 | 426.306 | 123.623 | 37.4323 | - | -
3. | 1610067100 | 1610019100 | 1610019300 | 261.743 | 123.623 | 37.4323 | 13 | 0
4. | 1640269200 | 1610118900 | 1640085900 | 88.833 | 126.599 | 37.4187 | - | -
5. | 1610294600 | 1610118900 | 1610118700 | 197.158 | 126.599 | 37.4187 | - | -
6. | 1640001700 | 1630002600 | 1640016500 | 230.725 | 126.697 | 37.4361 | - | -
7. | 1630012200 | 1630002600 | 1630003000 | 329.310 | 126.697 | 37.4361 | 24 | 1
8. | 1630012000 | 1630002600 | 1630003900 | 575.696 | 126.697 | 37.4361 | - | -
9. | 1640002000 | 1630003100 | 1640016600 | 361.970 | 126.681 | 37.4161 | - | -
10. | 1640002100 | 1630003100 | 1640016700 | 243.891 | 126.681 | 37.4161 | 18 | 2
… | … | … | … | … | … | … | … | …
Table 2. The basic information of the dataset used to evaluate the DQSR’s effectiveness.
S. No. | Parameter | Value
1. | Region | Yeonsu-gu, Incheon, South Korea
2. | Total area | 50.8 km$^{2}$
3. | Total road length | 557498.47 m
4. | Number of road links | 2153
5. | Number of illegal parking | NA
6. | Number of customers | 135
4.2 Algorithms Compared and Simulation Parameters
The effectiveness and superiority of the DQSR algorithms are verified by comparing these algorithms with two other algorithms that combine exact and heuristic methods. The first algorithm compared (Q+TSP) is based on a Q-learning algorithm that generates the shortest path between each customer and all other customers and a TSP algorithm that solves for the shortest route covering all the customers (without revisiting any customer) and returning to the starting customer. The second algorithm compared (Dijkstra+TSP) is based on the Dijkstra and Travelling Salesman Problem algorithms, where the Dijkstra algorithm generates the meta-graph (the shortest paths from each customer to all other customers), and the TSP algorithm finds the shortest route covering all the customers (without revisiting any customer) and returning to the starting customer.
Table 3 summarizes the simulation parameters and their values for implementing the DQSR algorithms
and the two algorithms compared with the DQSR algorithms. The simulation parameters
for the DQSR are listed in rows 1 to 9 of Table 3. Notably, the DQSR's two Q-learning-based algorithms have different simulation parameter values. In particular, the difference between their respective epsilon decay rates and numbers of training episodes is very large.
In the DQSR's Algorithm 1, the RL agent learns the Q-value table for 50 pairs of source and destination customers and loops through 30 episodes per pair. Therefore, although the epsilon decay rate and total episodes are set to 0.9 and 30, respectively, the RL agent has to go through 1500 cycles before calculating the shortest path and distance. In contrast, the DQSR's Algorithm 2 is tasked only with calculating the shortest route covering all 135 customers. Since this algorithm searches for the non-repeating route with the smallest distance, the epsilon decay rate and total episodes are set to 0.9999 and 40000, respectively, for better learning by the RL agent.
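To see why the two settings behave so differently, the following back-of-the-envelope sketch, assuming the common schedule $\varepsilon_{t}=\max\left(\varepsilon_{\min},\varepsilon_{0}\cdot decay^{t}\right)$ with one decay per episode, counts the episodes needed before exploration bottoms out at the minimum epsilon of 0.05.

```python
import math

def episodes_until_min_eps(eps0=1.0, decay=0.9, eps_min=0.05):
    """Smallest t with eps0 * decay**t <= eps_min (one decay per episode)."""
    return math.ceil(math.log(eps_min / eps0) / math.log(decay))

print(episodes_until_min_eps(decay=0.9))      # ~29 episodes (Algorithm 1 setting)
print(episodes_until_min_eps(decay=0.9999))   # ~30,000 episodes (Algorithm 2 setting)
```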
The Q+TSP consists of two algorithms. The first algorithm is Q-learning based and is used to generate the shortest paths from all 134 other source customers to one particular destination customer. Since some of the customers are far from the destination customer, the epsilon decay rate and the number of episodes are kept larger than in the DQSR's Algorithm 1; in particular, they are set to 0.95 and 40, respectively. This process is repeated for all 135 customers. In addition, we used the TwoOpt_solver algorithm, an off-the-shelf Python implementation that solves the TSP, as the Q+TSP's second algorithm. Similarly, the Dijkstra+TSP consists of two algorithms. The first is the Dijkstra algorithm, which calculates the shortest path and distance between each customer and the other customers. The second is the TSP algorithm, which solves for the shortest route connecting all the customers (by visiting each customer once and returning to the source customer). In particular, we used an off-the-shelf Python implementation of the Dijkstra algorithm based on the network graph and the TwoOpt_solver algorithm for solving the TSP. We implemented the DQSR algorithms and the two compared algorithms using Python 3.7 on an Ubuntu 18.04.4 machine with one NVIDIA TITAN Xp graphics card with 11 GB of GPU memory. Notably, the performance of each model was unaffected by the use of the GPU.
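For reference, a compact sketch of the Dijkstra+TSP baseline is given below; it uses networkx for the Dijkstra meta-graph and a hand-written nearest-neighbor tour with a 2-opt improvement pass in place of the TwoOpt_solver used in the paper, so the solver choice, function names, and edge-attribute name are assumptions rather than the exact benchmarked implementation.

```python
import itertools
import networkx as nx

def dijkstra_meta_graph(G, customers, weight="length"):
    """All-pairs shortest-path distances between customer nodes (the meta-graph)."""
    meta = {}
    for c in customers:
        lengths = nx.single_source_dijkstra_path_length(G, c, weight=weight)
        meta[c] = {d: lengths[d] for d in customers if d != c}
    return meta

def two_opt_tour(meta, customers):
    """Nearest-neighbor starting tour improved by a simple 2-opt pass."""
    tour, rest = [customers[0]], set(customers[1:])
    while rest:                                   # greedy nearest-neighbor construction
        nxt = min(rest, key=lambda c: meta[tour[-1]][c])
        tour.append(nxt)
        rest.remove(nxt)

    def length(t):
        return sum(meta[a][b] for a, b in zip(t, t[1:] + t[:1]))

    improved = True
    while improved:                               # 2-opt: reverse a segment while it shortens the tour
        improved = False
        for i, j in itertools.combinations(range(1, len(tour)), 2):
            cand = tour[:i] + tour[i:j][::-1] + tour[j:]
            if length(cand) < length(tour):
                tour, improved = cand, True
    return tour, length(tour)
```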
Table 3. Simulation parameters.
S. No. | Parameter | Shortest path between customers | Best route covering all the customers
Dual-stage, Q-learning based, Shortest-route generation (DQSR)
1. | Algorithm | Q-learning | Q-learning
2. | Learning rate ($\alpha$) | 0.8 | 0.8
3. | Epsilon ($\varepsilon$) | 1.0 | 1.0
4. | Discount factor ($\gamma$) | 0.9 | 0.9
5. | Epsilon decay rate | 0.9 | 0.9999
6. | Minimum epsilon | 0.05 | 0.05
7. | Episodes | 30 | 40000
8. | Number of customers | 135 | 135
9. | Number of nearest neighbor customers | 50 | -
Q-learning for the shortest-path generation + Travelling Salesman Problem based, shortest-route generation (Q+TSP)
10. | Algorithm | Q-learning | TSP: TwoOpt_solver
11. | Learning rate ($\alpha$) | 0.8 | -
12. | Epsilon ($\varepsilon$) | 1.0 | -
13. | Discount factor ($\gamma$) | 0.9 | -
14. | Epsilon decay rate | 0.95 | -
15. | Minimum epsilon | 0.05 | -
16. | Episodes | 40 | -
17. | Number of customers | 135 | 135
18. | Number of nearest neighbor customers | 134 | -
Dijkstra algorithm for the shortest-path generation + Travelling Salesman Problem based, shortest-route generation (Dijkstra+TSP)
19. | Algorithm | Dijkstra algorithm | TSP: TwoOpt_solver
20. | Number of customers | 135 | 135
4.3 Results and Discussion
We present and discuss the results of the case study that used the DQSR, under various
parameters, and analyze the DQSR’s performance by comparing this performance to those
of the respective other two algorithms (Q+TSP and Dijkstra+TSP).
4.3.1 Route Visualization
The road network and the route information are the two data required for the route
visualization on the QGIS application. The road network is generated through the QGIS
application or can be found in the government portal. Notably, we generated the road
network from scratch for Yeonsu-gu, Incheon, South Korea, and stored it as a shape (.shp) file. The route result is generated using a Python program. The route result contains three data fields: (i) a list of road links or edges through which the RL agent should travel, (ii) the sequence number denoting the order in which the agent should travel, and (iii) a flag to separate the agent-selected road links
value (CSV) file.
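A small sketch of how such a route result file could be written is shown below; the column names, the mapping from link ids to sequence numbers, and the output path are illustrative assumptions.

```python
import csv

def write_route_csv(all_links, route_order, path="route_result.csv"):
    """Write one row per road link with the three fields described above: the
    link id, its visiting sequence number (blank if unused), and a flag that
    separates agent-selected links from the rest of the network.

    `route_order` is assumed to map link id -> sequence number on the route.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["LINK_ID", "SEQ", "FLAG"])
        for link in all_links:
            writer.writerow([link, route_order.get(link, ""), int(link in route_order)])
```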
Meanwhile, the Python program and QGIS are not integrated in this work. Rather, the route result from the Python program is manually uploaded to the QGIS
application for route visualization. First, the shapefile containing the road network
information is loaded to the QGIS through the ‘Add Vector Layer’ function. In particular,
the road network layer consists of fields, such as the road link id, node id, location
information, etc. Similarly, the route information file is uploaded to QGIS through
the ‘Add Vector Layer’ function. It consists of fields like road link id, sequence
number, and flag. Next, we join the route layer to the road network layer based on
a common field. Consequently, two new data fields, namely sequence number and flag,
are added to the road network layer. Also, the road link id in the road network layer
is categorized into two groups based on the flag data field: (i) the one selected
for the agent to travel through and (ii) the one not included in the route planning.
This categorization of the road link is performed through the QGIS Symbology properties,
and the road sequence level’s categorization is performed through the QGIS Label properties.
Fig. 3 shows the visualization of the shortest routes from the DQSR and Q+TSP algorithms,
respectively. In particular, these results are visualized on a QGIS map. The ‘black’
lines represent the road network, ‘green’ dots represent the illegal-parking locations,
and dotted ‘blue’ lines represent the customer links or the customers for the DQSR-based
RL agent to visit. Fig. 3(a) shows the shortest route covering all the customers and returning to the starting
customer (‘red’ line). In addition, the numbers mentioned in the road network show
the order in which the DQSR-based RL agent had to follow the road links to visit all
the mentioned customers and return to the starting customer. Correspondingly, the
DQSR-based RL agent’s shortest route is 58155.14 m long. In addition, the DQSR-based
RL agent had to visit 600 out of the total 2153 road links for the shortest route. Importantly, the DQSR-based RL agent could choose to travel through previously traveled road links without the concern that the agent would visit a previously visited customer. The agent was allowed to make such a choice if and only if the path through a revisited customer is the shortest path connecting two other customers. Fig. 3(b) shows the zoomed-in visualization of the region within the 'purple circle' of Fig. 3(a). This zoomed-in image of Fig. 3(b) is more intuitive and easier to visualize. Since
the DQSR-based RL agent visited the road links multiple times, only the last sequence
numbers of the road links in the route are displayed in the image.
Fig. 3. Route visualization on QGIS: (a) The DQSR algorithms based shortest route; (b) The zoomed-in visualization of the region within the 'purple circle' of Fig. 3(a); (c) The Q+TSP algorithm based shortest route; (d) The zoomed-in visualization of the region within the 'purple circle' of Fig. 3(c). The dark purple arrows in Fig. 3(d) highlight some of the differences in the shortest-route generation by the DQSR and Q+TSP algorithms, respectively.
Fig. 3(c) shows the shortest route generated by the Q+TSP algorithm. In particular, this shortest
route is represented by the ‘purple’ line of Fig. 3(c). The numbers in the road network show the sequence in which the agent had to travel
the road links to complete the shortest route. Notably, the Q+TSP algorithm's route connecting all the customers is 55703.39 m long. In addition, the Q+TSP agent
visited 597 out of the total 2153 road links in generating the shortest route. Meanwhile,
the Q+TSP agent was allowed to go through the same road links without violating the
condition of visiting an already visited customer. Fig. 3(d) is the zoomed-in image of the region within the ‘purple circle’ of Fig. 3(c), which is more intuitive for the reader to understand the shortest route. Further,
Fig. 3(d)’s dark purple arrows highlight some of the differences in the shortest-route generation
by the DQSR and Q+TSP algorithms, respectively.
Table 4 summarizes the algorithms’ respective performances. The table includes the physical
distance of the shortest route generated by each of the algorithms and also provides
the algorithms' respective time complexities in generating the shortest routes. The Q+TSP algorithm provides the shortest route in terms of travel distance, approximately 55.7 km, whereas the DQSR's shortest route overshot the Q+TSP algorithm's travel distance by approximately 3 km. But a 3 km overshoot is merely about 0.5% of the Yeonsu-gu region's total road length of 557.49 km. Therefore, we can argue that both the DQSR and Q+TSP algorithms generate the shortest route. However, the DQSR is preferable to the other two algorithms when the time complexity is taken into account. Table 4 shows that the DQSR required 237 s to calculate the shortest path between each customer
and this customer’s nearest neighbor customers and required another 209 seconds to
calculate the shortest route connecting all the customers, i.e., a total time of 446
s. In contrast, the Q+TSP algorithm required 1412 s to calculate the shortest path
between each customer and all other customers and 0.23 s to calculate the shortest
route connecting all the customers, i.e., a total time of 1412.23 s. Since the road
network for our case study is very large with very high network depth, the Dijkstra+TSP's Dijkstra algorithm could not generate the shortest paths within 6 h. Therefore, we chose to terminate the algorithm.
Table 4. The comparison of the respective shortest-routes generated by the DQSR, Q+TSP, and Dijkstra+TSP algorithms, in terms of shortest-route distance and time complexity.
S. No. | Algorithm(s) | Parameter | Value | Time complexity (Algorithm 1) | Time complexity (Algorithm 2)
1. | DQSR | Route length | 58.15 km | 237 s | 209 s
2. | DQSR | Number of visited links | 600 | |
3. | Q+TSP | Route length | 55.70 km | 1412 s | 0.23 s
4. | Q+TSP | Number of visited links | 597 | |
5. | Dijkstra+TSP | Route length | - | >6 h | -
6. | Dijkstra+TSP | Number of visited links | - | |
5. Conclusion
This paper elaborated on all the aspects of the proposed dual-stage, Q-learning based, shortest-route generation and presented its training process. In particular, this shortest-route generation was used for illegal-parking surveillance in Yeonsu-gu, Incheon, South Korea. The analysis of this application, based on experiments, shows that our DQSR algorithms perform better than two other shortest-route generation algorithms in terms of time complexity and achieve almost the same shortest-route distance as these two compared algorithms. In particular, this comparison shows that our DQSR was about three times faster than the Q+TSP algorithm and produced a shortest route with an approximately 5% greater distance than that of the Q+TSP. In addition, the time complexity of our model can be further reduced by lowering the number of nearest neighbor customers, which we fixed at 50 in this study, provided there is enough overlap of customers in the nearest neighbor customer groups to generate the shortest route. In the future, we plan to increase and decrease the road network size and check the performance of the DQSR in terms of time complexity and shortest-route distance.
ACKNOWLEDGMENTS
This work was supported by Incheon National University Research Grant in 2017.
REFERENCES
Nazari M., Oroojlooy A., Takáč M., Snyder L. V., 2018, Reinforcement Learning for
Solving the Vehicle Routing Problem, In Proceedings of the 32nd International Conference
on Neural Information Processing Systems (NIPS'18), Curran Associates Inc., Red Hook,
NY, USA, pp. 9861-9871
Singh A. K., Pamula R., 2021, An Efficient and Intelligent Routing Strategy for Vehicular
Delay Tolerant Networks, Wireless Network, Vol. 27, pp. 383-400
Geetha M, Ganesan R., 2021, CEPRAN-Cooperative Energy Efficient and Priority Based
Reliable Routing Protocol with Network Coding for WBAN, Wireless Pers Commun, Vol.
117, pp. 3153-3171
Liu D., Cui J., Zhang J., Yang C., Hanzo L., 2021, Deep Reinforcement Learning Aided
Packet-Routing for Aeronautical Ad-Hoc Networks Formed by Passenger Planes, IEEE Transactions
on Vehicular Technology, Vol. 70, No. 5, pp. 5166-5171
Yao H., Mai T., Jiang C., Kuang L., Guo S., 2019, AI Routers & Network Mind: A Hybrid
Machine Learning Paradigm for Packet Routing, IEEE Computational Intelligence Magazine,
Vol. 14, No. 4, pp. 21-30
Roig P. J., Alcaraz S., Gilly K., Bernad C., Juiz C., 2022, Arithmetic Framework to
Optimize Packet Forwarding among End Devices in Generic Edge Computing Environments,
Sensors, Vol. 22, No. 421
Waqas A., Saeed N., Mahmood H., Almutiry M., 2022, Distributed Destination Search
Routing for 5G and beyond Networks, Sensors, Vol. 22, No. 472
Alhussein O., et al., 2018, Joint VNF Placement and Multicast Traffic Routing in
5G Core Networks, 2018 IEEE Global Communications Conference (GLOBECOM), pp. 1-6
Celik A., Tetzner J., Sinha K., et al., 2019, 5G Device-To-Device Communication Security
and Multipath Routing Solutions, Appl Netw Sci, Vol. 4, No. 102
Feng L, et al., 2021, Explicit Evolutionary Multitasking for Combinatorial Optimization:
A Case Study on Capacitated Vehicle Routing Problem, IEEE Transactions on Cybernetics,
Vol. 51, No. 6, pp. 3143-3156
Zhang J., Yang F., Weng X., 2019, An Evolutionary Scatter Search Particle Swarm Optimization
Algorithm for the Vehicle Routing Problem With Time Windows, IEEE Access, Vol. 6,
pp. 63468-63485
Li J., Wang F., He Y., 2020, Electric Vehicle Routing Problem with Battery Swapping
Considering Energy Consumption and Carbon Emissions, Sustainability, Vol. 12, No.
10537
Konstantakopoulos G. D., Gayialis S. P., Kechagias E. P., Papadopoulos G. A., Tatsiopoulos
I. P., 2020, A Multiobjective Large Neighborhood Search Metaheuristic for the Vehicle
Routing Problem with Time Windows, Algorithms, Vol. 13, No. 243
Ngo T. G., Dao T. K., Thandapani J., Nguyen T. T., Pham D. T., Vu V. D., 2021, Analysis
Urban Traffic Vehicle Routing Based on Dijkstra Algorithm Optimization, In: Sharma
H., Gupta M. K., Tomar G. S., Lipo W. (eds) Communication and Intelligent Systems.
Lecture Notes in Networks and Systems, Springer, Singapore, Vol. 204
Zheng Y.-L., Song T.-T., Chai J.-X., Yang X.-P., Yu M.-M., Zhu Y.-C., Liu Y., Xie
Y.-Y., 2021, Exploring a New Adaptive Routing Based on the Dijkstra Algorithm in Optical
Networks-on-Chip, Micromachines, Vol. 12, No. 54
El-Sherbeny N. A., 2010, Vehicle Routing with Time Windows: An Overview of Exact,
Heuristic and Metaheuristic Methods. J. King Saud Univ. Sci., Vol. 22, pp. 123-131
Neupane D., Seok J., 2020, A Review on Deep Learning-Based Approaches for Automatic
Sonar Target Recognition, Electronics, Vol. 9, No. 1972
Bhandari S., Ranjan N., Khan P., Kim H., Hong Y.-S., 2021, Deep Learning-Based Content
Caching in the Fog Access Points, Electronics, Vol. 10, No. 512
Bhandari S., Kim H., Ranjan N., Zhao H. P., Khan P., 2020, Best Cache Resource Based
on Deep Neural Network for Fog Radio Access Networks, Journal of Internet Technology,
Vol. 21, No. 4, pp. 967-975
Lazakis I., Khan S., 2021, An Optimization Framework for Daily Route Planning and
Scheduling of Maintenance Vessel Activities in Offshore Wind Farms, Ocean Engineering,
Vol. 225, No. 108752
Chen C., Jiang J., Lv N., Li S., 2020, An Intelligent Path Planning Scheme of Autonomous
Vehicles Platoon Using Deep Reinforcement Learning on Network Edge, IEEE Access, Vol.
8, pp. 99059-99069
Prianto E., Kim M., Park J.-H., Bae J.-H., Kim J.-S., 2020, Path Planning for Multi-Arm
Manipulators Using Deep Reinforcement Learning: Soft Actor-Critic with Hindsight Experience
Replay, Sensors, Vol. 20, No. 5911
Lu E. H-C., Ya-Wen Y., 2019, A Hybrid Route Planning Approach for Logistics with Pickup
and Delivery. Expert Syst, Appl, Vol. 118, pp. 482-492
Tong B., Wang J., Wang X., Zhou F., Mao X., Zheng W., 2022, Best Route Planning for
Truck-Drone Delivery Using Variable Neighborhood Tabu Search Algorithm, Appl. Sci.,
Vol. 12, No. 529
Vahdanjoo M., Zhou K., Sørensen C. A. G., 2020, Route Planning for Agricultural Machines
with Multiple Customers: Manure Application Case Study, Agronomy, Vol. 10, No. 1608
Ranjan N., Bhandari S., Khan P., Hong Y.-S., Kim H., 2021, Large-Scale Road Network
Congestion Pattern Analysis and Prediction Using Deep Convolutional Autoencoder, Sustainability,
Vol. 13, No. 5108
Ranjan N., Bhandari S., Zhao H. P., Kim H., Khan P., 2020, City-Wide Traffic Congestion
Prediction Based on CNN, LSTM and Transpose CNN, IEEE Access, Vol. 8, pp. 81606-81620
Li Z., Huang J., 2019, How to Mitigate Traffic Congestion Based on Improved Ant Colony
Algorithm: A Case Study of a Congested Old Area of a Metropolis, Sustainability, Vol.
11, No. 1140
Balakrishnan A., Banciu M., Glowacka K., Mirchandani P., 2013, Hierarchical Approach
for Survivable Network Design, Eur. J. Oper. Res., Vol. 225, pp. 223-235
Fredman M. L., Tarjan R. E., 1987, Fibonacci Heaps and Their Uses in Improved Network
Optimization Algorithms, J. ACM, Vol. 34, pp. 596-615
Qu T., Cai Z. A., 2015, Fast Isomap Algorithm Based on Fibonacci Heap, In International
Conference in Swarm Intelligence; Springer: Cham, Switzerland, pp. 225-231
Lu X., Camitz M., 2011, Finding the Shortest Paths by Node Combination, Appl. Math.
Comput., Vol. 217, pp. 6401-6408
Orlin J. B., Madduri K., Subramani K., Williamson M. A., 2010, Faster Algorithm for
the Single Source Shortest Path Problem with Few Distinct Positive Lengths, J. Discret.
Algorithms, Vol. 8, pp. 189-198
Jin W., Chen S., Jiang H., 2013, Finding the K Shortest Paths in a Time-schedule Network
with Constraints on Arcs, Comput. Oper. Res., Vol. 40, pp. 2975-2982
Asaduzzaman M., Geok T. K., Hossain F., Sayeed S., Abdaziz A., Wong H.-Y., Tso C.
P., Ahmed S., Bari M. A., 2021, An Efficient Shortest Path Algorithm: Multi-Destinations
in an Indoor Environment, Symmetry, Vol. 13, No. 421
Hart P. E., Nilsson N. J., Raphael B., 1968, A Formal Basis for the Heuristic Determination
of Minimum Cost Paths, IEEE Transactions on Systems Science and Cybernetics, Vol.
4, No. 2, pp. 100-107
Ben Ticha H., Absi N. A., 2017, Solution Method for the Multi-Destination Bi-Objectives
Shortest Path Problem, Elsevier: Amsterdam, The Netherlands
Bast H., Funke S., Sanders P., Schultes D., 2007, Fast Routing in Road Networks with
Transit Nodes, Science, Vol. 316, No. 5824:566
Shen B., Cheema M. A., Harabor D. D., Stuckey P. J., 2021, Contracting and Compressing
Shortest Path Databases, Proceedings of the International Conference on Automated
Planning and Scheduling, Vol. 31, No. 1, pp. 322-330
Bentley J. L., 22-24 January, 1990, Experiments on Traveling Salesman Heuristics,
In Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, San
Francisco, CA, USA, pp. 91-99
Clarke G., Wright J. W., 1964, Scheduling of Vehicles from a Central Depot to a
Number of Delivery Points, Oper. Res., Vol. 12, pp. 568-581
Mele U. J., Gambardella L. M., Montemanni R., 2021, Machine Learning Approaches for
the Traveling Salesman Problem: A Survey, Proceedings of the 8$^{th}$ International
Conference on Industrial Engineering and Applications 2021 (Europe), pp. 182-186
Zhang R., Prokhorchuk A., Dauwels J., 2020, Deep Reinforcement Learning for Traveling
Salesman Problem with Time Windows and Rejections, 2020 International Joint Conference
on Neural Networks (IJCNN), pp. 1-8
Reinelt G., 1991, TSPLIB-A Traveling Salesman Problem Library, INFORMS Journal on
Computing, Vol. 3, No. 4, pp. 376-384
Usberti F. L., Franca P. M., France A. L. M., 2011, The Open Capacitated Arc Routing
Problem, Computers & Operations Research, Vol. 38, No. 11, pp. 1543-1555
Minieka E., 1979, The Chinese Postman Problem for Mixed Networks, Management Science,
Vol. 25, No. 7, pp. 609-707
Author
Navin Ranjan received his Bachelor's degree from Kathmandu University, Nepal, in
2016 and his Master's degree from Incheon National University, South Korea, in 2021.
He worked as a Research Assistant at Industry-University Cooperation Foundation, Incheon
National University, South Korea (2021~2022). His research interests include image
processing, computer vision, scene understanding, and reinforcement learning.
Sovit Bhandari received a B.E. degree from Kathmandu University, Nepal, and an
M.S. degree from Incheon National University, South Korea, in 2016 and 2020, respectively.
He worked as a Core Network Engineer at Huawei Technologies Nepal Pvt. Ltd. from Jan.
2018 to Aug. 2018, and since Sept. 2020, he has been working as a Research Assistant
at IoT and Big-Data Research Center, Incheon National University, South Korea. His
research interests include radio resource management, 5G and beyond mobile communications,
machine learning, the Internet of Things, image processing, and computer vision.
Yeong-Chan Kim received his Bachelor’s degree from Daelim University, South Korea,
in 2021. He is currently doing his M.S. at Incheon National University, South Korea.
His research interests include data science, deep learning, next-generation mobile
communication systems, image processing, and computer vision.
Hoon Kim has been working as an Associate Professor at the Department of Electronics
Engineering, Incheon National University, South Korea, since 2008. His research interests include radio resource management, optimization techniques, 5G mobile communication systems, machine learning, and the Internet of Things. He is a Member of KICS, IEIE,
IEEE, and IEICE.