
  1. (School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou, 730070, China zhaotingting163@hotmail.com)



Intelligent congestion prediction, Knowledge graph technology, Urban road traffic

1. Introduction

Urban traffic congestion, exacerbated by globalization and urbanization, strains city development. Rapid urban population growth and motor vehicle proliferation outpace road infrastructure expansion, overloading urban transport systems. In 2019, U.S. commuters lost 88 billion hours to congestion, impacting economic vitality and national growth potential [1]. Traffic is a major source of carbon emissions, contributing 16% of global greenhouse gases, with congestion increasing idling emissions. In Beijing, vehicle emissions account for 30-50% of PM2.5 on polluted days. Health risks include heightened stress, discomfort, and increased disease prevalence linked to traffic noise and pollution [2]. London residents near main roads suffer a 20% higher rate of cardiovascular issues. Socially, congestion in cities like Mumbai adds 40% extra commute time, eroding personal life and productivity. Sao Paulo's peak hour speeds drop below bicycle pace, straining public transport and urban mobility. These issues highlight the urgent need for efficient urban planning and sustainable transport solutions [3-4].

In recent years, with the rise of big data technology and the rapid development of artificial intelligence algorithms, research on urban traffic congestion prediction has entered a new stage. Scholars at home and abroad have explored advanced information technologies to improve the accuracy and timeliness of traffic state prediction and to provide a scientific basis for urban traffic management decisions. In terms of big data, researchers integrate multi-source data (such as floating car data, GPS data, traffic surveillance video, and social media information) and apply big data processing techniques to mine and analyze them in depth, revealing the inherent regularities and abnormal behaviors of traffic flow. These studies have effectively improved data processing capacity and provided richer and more accurate input for prediction models. From the perspective of urban density, the density ranking of new first-tier cities is shown in Fig. 1.

Fig. 1. City density ranking.


In the fast-evolving urban landscape, traditional traffic management and prediction models struggle, hindered by their reliance on linear extrapolation from limited historical data. This oversight neglects the complex, dynamic nature of traffic networks, including unpredictable events, weather impacts, and seasonal variations, leading to diminished forecasting accuracy and sluggish responsiveness. Confronted with vast, high-dimensional traffic datasets, conventional methodologies fail to discern nuanced patterns and interconnections within traffic flows, compromising real-time, precise state predictions and thus, traffic management efficacy. To address these limitations, we propose an intelligent prediction approach leveraging knowledge graph technology. Acting as a robust data integration and analysis instrument, knowledge graphs adeptly handle multi-source, heterogeneous data---encompassing GIS data, social media insights, weather forecasts, and historical traffic records. This facilitates the construction of a comprehensive, dynamic urban traffic network framework. Employing deep learning algorithms, our methodology autonomously uncovers intricate data correlations, enabling accurate congestion predictions. The immediate goal is to bridge gaps in current traffic forecasting technologies, offering a smarter, more adaptable solution. Long-term aspirations involve advancing smart city initiatives through the dissemination and application of this novel technique. Smart transportation, a cornerstone of smart cities, strives to enhance urban traffic system efficiency, alleviate congestion, and improve air quality, thereby uplifting residents' quality of life. 
Our research empowers traffic authorities with decision-making support for optimizing signal controls, road planning, and emergency response strategies, while concurrently informing the public about real-time traffic conditions to facilitate informed travel choices, minimize wait times, and boost traveler satisfaction [5-6].

The core contents of this study are as follows. (1) This study attempts to introduce knowledge graph technology into the urban transportation field for the first time and constructs a comprehensive knowledge graph covering traffic network structure, historical traffic flow, weather conditions, social events, holidays, and other factors. The graph not only provides a semantic representation of traffic information but also deepens the understanding of the causes of congestion through relational reasoning between entities. (2) We integrate traffic-related data sources, including but not limited to traffic sensor data, satellite remote sensing data, and social media data, and adopt data fusion techniques to keep the knowledge graph timely and complete and to provide richer input for the prediction model. (3) Based on the constructed knowledge graph, we design and implement a new graph neural network (GNN) model that captures the spatial dependence and temporal dynamics of traffic networks. The model makes full use of the entity relationships in the knowledge graph to improve prediction accuracy and generalization ability [7].

The core objectives of this study fall into three main areas. The first task is to construct a highly integrated knowledge graph that captures the complex characteristics of the urban transportation system. This knowledge graph integrates traffic network structure, historical traffic flow patterns, meteorological conditions, special events, and social and economic activities, providing a deeply integrated data platform for traffic congestion prediction. Through this platform, we aim to break down data silos, enable efficient information flow and cross-validation, and gain insight into the multiple drivers of traffic congestion.

2. Literature Review

2.1. Knowledge Graph Concept

As a structured form of knowledge representation, the knowledge graph has rapidly become a core component of information retrieval, natural language processing, recommendation systems, and other fields since Google announced its ``Knowledge Graph'' project in 2012. Simply put, a knowledge graph is a network of entities, relationships, and attributes that describes real-world entities and the relationships between them. Mathematically, a knowledge graph can be represented as $G=(E, R, F)$, where $E$ is the set of entities, $R$ is the set of relationships, and $F \subseteq E \times R \times E$ is the set of facts, i.e., triples in which two entities are connected by a relationship [8-9].
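As a minimal sketch of the triple representation $G=(E, R, F)$, the facts can be stored as a set of (head, relation, tail) tuples from which $E$ and $R$ are recovered. The entity and relation names below are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical fact set F; all names are illustrative, not taken from the study's data.
facts = {
    Triple("Road_A", "connects_to", "Intersection_1"),
    Triple("Road_A", "belongs_to", "Zone_Center"),
    Triple("Accident_7", "occurs_at", "Road_A"),
}

# E and R are recovered from the facts themselves.
entities = {t.head for t in facts} | {t.tail for t in facts}
relations = {t.relation for t in facts}


def tails(head, relation):
    """Return every tail entity linked to `head` by `relation`."""
    return {t.tail for t in facts if t.head == head and t.relation == relation}


print(tails("Road_A", "belongs_to"))
```

A set of tuples is the simplest possible store; a production system would more likely use a graph database, as the application case later in the paper does.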

2.2. Knowledge Graph Construction Method

Knowledge graph construction is a process involving information extraction, knowledge representation and storage, mainly including the following steps:

Entity recognition is the first step in knowledge graph construction; it aims to automatically identify specific types of entities in text. Common methods include rule-based methods, dictionary matching, and machine learning models (e.g., CRF, BiLSTM-CRF). A basic named entity recognition (NER) task can be formalized as follows: let $x=(\omega_{1}, \omega_{2}, \ldots, \omega_{n})$ be the input word sequence and $y=(y_{1}, y_{2}, \ldots, y_{n})$ the corresponding label sequence; the model $f(x)$ then outputs the label sequence that maximizes $P(y\mid x)$, as shown in Eq. (1) [10].

(1)
$ \hat{y}=f(x)=\arg \max _{y} P(y\mid x). $

Relationship extraction (RE) aims to extract relationships between entities from text and is one of the key steps in constructing knowledge graphs. Common techniques include template-based methods, supervised learning methods (e.g., SVM, logistic regression), and deep learning models (e.g., CNN, RNN, BERT). Taking deep learning-based RE as an example, given two entities $e_{1}$ and $e_{2}$ and the context $c$ between them, the model $g$ predicts the most probable relationship type $r$ between them, as shown in Eq. (2) [11].

(2)
$ \hat{r}=g(e_{1} ,e_{2} ,c)=\arg \max _{r} P(r\mid e_{1} ,e_{2} ,c) . $

Ontology design forms the skeleton of a knowledge graph: it defines entity types, relationship types, and their constraints. It usually follows standards such as the Web Ontology Language (OWL) and is formally described using description logic (DL). For example, a simple ontology axiom is shown in Eq. (3); it states that every instance of the Person class must have an associated hasAge attribute with an integer value [12].

(3)
$ \text{Person}\sqsubseteq \exists\, \text{hasAge}.\text{Integer}. $

2.3. Brief Introduction of Related Art

Machine learning techniques, especially supervised learning, are widely used for entity recognition and relationship extraction in knowledge graph construction. By training on large amounts of labeled data, a model learns to accurately identify and classify entities and relationships in text. For example, when an SVM is used for relation classification, the objective may be to minimize the hinge loss [13] shown in Eq. (4), where $x_{i}$ is the feature vector of the $i$-th sample, $y_{i}\in\{-1,+1\}$ is its label, and $w$ and $b$ are the weight vector and bias.

(4)
$ L=\sum _{i=1}^{N}\max (0,~1-y_{i} (w^{T} x_{i} +b)) . $
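The hinge loss can be evaluated directly from its definition, $\max(0, 1-y_{i}(w^{T}x_{i}+b))$ summed over samples. The sketch below assumes labels in $\{-1,+1\}$; the weights and samples are made-up numbers for illustration only.

```python
def hinge_loss(w, b, xs, ys):
    """Sum of max(0, 1 - y_i * (w . x_i + b)) over all samples; labels are +1 or -1."""
    total = 0.0
    for x, y in zip(xs, ys):
        margin = y * (sum(wk * xk for wk, xk in zip(w, x)) + b)
        total += max(0.0, 1.0 - margin)
    return total


# Hypothetical classifier and three samples.
w, b = [1.0, -1.0], 0.0
xs = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.0]]
ys = [1, -1, 1]

# The first two samples lie outside the margin (zero loss);
# the third has margin 0.5 and contributes 0.5.
print(hinge_loss(w, b, xs, ys))
```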

Deep learning, especially deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs), has demonstrated a strong ability to process large-scale unstructured data and extract high-level features. In the domain of knowledge graphs, these techniques are used for more complex semantic understanding and relational reasoning.

Graph convolutional network (GCN) can be used for knowledge graph embedding, as shown in Eq. (5) [14].

(5)
$ h_{i}^{(l+1)} =\sigma \left(\sum _{j\in {\mathcal N}_{i} }\frac{1}{c_{ij} } W^{(l)} h_{j}^{(l)} \right) $

Eq. (5) describes one layer of a graph convolutional network (GCN). In this equation, $h_{i}^{(l+1)}$ denotes the feature vector (representation) of node $i$ at layer $l+1$, and $h_{j}^{(l)}$ is the feature vector of node $j$ at layer $l$. $\sigma(\cdot)$ is the activation function, usually ReLU or the sigmoid function. The sum runs over the neighbors $j$ of node $i$, where ${\mathcal N}_{i}$ is the neighbor set of node $i$. $W^{(l)}$ is the layer-$l$ weight matrix applied to each neighbor's feature vector. $c_{ij}$ is an optional normalization factor that keeps the aggregated messages on a comparable scale for nodes with neighbor sets of different sizes; common choices are L2 normalization and degree normalization.
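One propagation step of Eq. (5) can be sketched in plain Python. The sketch assumes degree normalization, $c_{ij}=|{\mathcal N}_{i}|$, and ReLU as $\sigma$; the three-node graph, feature vectors, and weight matrix are hypothetical.

```python
def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(W[r][k] * h[k] for k in range(len(h))) for r in range(len(W))]


def gcn_layer(features, neighbors, W):
    """One step of Eq. (5) with c_ij = |N_i| (an assumed choice) and ReLU as sigma."""
    out = {}
    for i, nbrs in neighbors.items():
        agg = [0.0] * len(W)
        for j in nbrs:
            msg = matvec(W, features[j])          # W^(l) h_j^(l)
            agg = [a + m / len(nbrs) for a, m in zip(agg, msg)]
        out[i] = [max(0.0, a) for a in agg]       # ReLU
    return out


# Hypothetical graph: node "a" is linked to "b" and "c".
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
W = [[1.0, 0.0], [0.0, -1.0]]

print(gcn_layer(features, neighbors, W))
```

Note that, following Eq. (5) as written, a node's own features enter its update only if the node appears in its own neighbor set.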

Graph neural networks (GNNs) are deep learning models designed for graph data; they learn representations of nodes and edges while preserving graph structure information. In knowledge graphs, GNNs are often used for tasks such as link prediction and node classification. A simple GNN layer propagation rule is shown in Eq. (6) [15], where $A$ is the adjacency matrix reflecting the connection structure of the knowledge graph.

(6)
$ h_{i}^{(l+1)} =\sigma \left(\sum _{j\in {{\mathcal N}}_{i} }A_{ij} W^{(l)} h_{j}^{(l)} \right). $

Eq. (6) describes one layer of a graph neural network (GNN). In this equation, $h_{i}^{(l+1)}$ denotes the feature vector (representation) of node $i$ at layer $l+1$, and $h_{j}^{(l)}$ is the feature vector of node $j$ at layer $l$. $\sigma(\cdot)$ is the activation function, usually ReLU or the sigmoid function. The sum aggregates the transformed feature vectors of the neighbors $j$ of node $i$, where $A_{ij}$ is the element of the adjacency matrix indicating whether node $i$ and node $j$ are connected (1 if connected, 0 otherwise), ${\mathcal N}_{i}$ is the set of nodes adjacent to node $i$, and $W^{(l)}$ is the layer-$l$ weight matrix.

2.4. Research Status

Urban traffic congestion is a multifaceted challenge with causes including peak hour demand imbalances, suboptimal road network design, and increasing private car usage. Ranjan et al. [17-18] note that inadequate traffic evacuation channels and private car dependence hinder public transport efficiency. Factors like urban planning, economic growth, and policy also play roles, with Guo et al. [6] highlighting urban sprawl and chaotic land use. Short-term policies like parking fees can mitigate demand but lack long-term impact [19]. Traditional traffic flow theory models, such as the Fundamental Diagram [20], are limited in handling traffic system complexities and often overlook the significance of real-time data, leading to poor responsiveness to unexpected events [21].

3. Construction of Urban Traffic Model Based on Knowledge Graph

3.1. Data Sources and Pre-processing

To build a knowledge graph-based urban traffic model, we first need to integrate high-quality data from different channels. Data sources include:

Traffic data: Collected from traffic sensors, GPS tracking devices, floating cars, buses, taxis, etc., providing real-time and historical traffic flow, speed, density, and other information. These data must be cleaned: outliers are removed, missing values are filled (e.g., using the mean, median, or interpolation), and values are normalized. For example, traffic flow, measured in vehicles per hour (VPH), is min-max normalized as shown in Eq. (7) [22].

(7)
$ \text{normalized flow}=\frac{\text{original flow}-\text{flow minimum}}{\text{flow maximum}-\text{flow minimum}}. $

Weather data: Including temperature, humidity, wind speed, rainfall, etc., which can be obtained from weather stations. During preprocessing, attention must be paid to time synchronization so that weather data and traffic data correspond to the same time interval, and interpolation is performed where necessary to maintain continuity, as shown in Eq. (8) [23], where $(t_{1}, T_{1})$ and $(t_{2}, T_{2})$ are the time and temperature values at two adjacent time points and $t$ is the target interpolation time.

(8)
$ T(t)=\frac{(t_{2}-t)^{2} \times T_{1} +(t-t_{1})^{2} \times T_{2}}{(t_{2}-t_{1})^{2}}. $

Event data: Covering traffic accidents, construction, holidays, large-scale events, etc, which can be obtained through social media mining, government announcements, news reports and other channels. Pre-processing includes event classification, time location and impact range definition to ensure uniform data format.
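The normalization and time-alignment steps described for traffic and weather data can be sketched as two small helpers. The first is the min-max scaling of Eq. (7). The second uses ordinary linear interpolation between two adjacent readings, which is a standard technique assumed here for illustration rather than a reproduction of Eq. (8). All sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] as in Eq. (7); a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def linear_interpolate(t, t1, T1, t2, T2):
    """Standard linear interpolation of a reading at time t, with t1 <= t <= t2.

    This is an assumed stand-in for the paper's interpolation step.
    """
    return ((t2 - t) * T1 + (t - t1) * T2) / (t2 - t1)


# Hypothetical hourly flows in vehicles per hour.
print(min_max_normalize([200.0, 600.0, 1000.0]))

# Hypothetical temperatures: 20.0 at 10:00 and 22.0 at 11:00, estimated at 10:30.
print(linear_interpolate(10.5, 10.0, 20.0, 11.0, 22.0))
```

With linear weights the two coefficients always sum to one, so the estimate stays between the two neighboring readings.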

3.2. Traffic Knowledge Graph Design

Knowledge graph construction is the core of the whole model, including entity definition, relationship definition, ontology design and graph example display.

Entity definition: Including Road, Intersection, Zone, Vehicle, Weather, Event, etc.

Relationship definitions: Such as ``belongs_to'', ``connects_to'', ``affects'', ``occurs_at'', etc. The directionality and attributes of each relationship must be clearly defined. For example, the ``affects'' relationship can carry a degree-of-influence attribute (mild, moderate, or severe), as shown in Eq. (9) [24].

(9)
$ \text{Relationship instance}=(\text{Entity}_{1},~\text{Relationship},~\text{Entity}_{2},~\text{Attribute}). $

Ontology construction: Define the classification system of entities and relationships, for example by defining traffic flow as a TrafficVolume class with attributes such as average speed (avg_speed) and vehicle count (vehicle_count). Ontology design ensures the semantic consistency of data, as shown in Eq. (10) [25].

(10)
$ \text{TrafficVolume} \sqsubseteq \{\text{avg\_speed}:\text{Float},~\text{vehicle\_count}:\text{Integer}\}. $

Graph instance display: Display the connection between entities through graphical interface, such as displaying that a certain road (entity ID: R001) belongs to a certain area (entity ID: Z001) and is affected by a traffic accident (entity ID: E012) at a certain time point.
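The entity, relationship, and instance definitions above can be sketched as a small in-memory structure following the (Entity, Relationship, Entity, Attribute) form. The entity IDs R001, Z001, and E012 come from the example in the text; the class name, the attribute keys, and the severity value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationInstance:
    head: str                 # Entity 1
    relation: str             # e.g. "belongs_to", "affects"
    tail: str                 # Entity 2
    attributes: tuple = ()    # (key, value) pairs attached to the relationship


# Road R001 belongs to zone Z001 and is affected by accident E012.
# The severity and time values are illustrative only.
graph = [
    RelationInstance("R001", "belongs_to", "Z001"),
    RelationInstance("E012", "affects", "R001",
                     (("severity", "moderate"), ("time", "t0"))),
]


def incoming(entity, relation):
    """Return all relationship instances of type `relation` that point at `entity`."""
    return [r for r in graph if r.tail == entity and r.relation == relation]


for inst in incoming("R001", "affects"):
    print(inst.head, dict(inst.attributes))
```

Keeping direction explicit (head to tail) is what makes queries such as "what affects this road" unambiguous.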

3.3. Graph Fusion and Dynamic Update Mechanism

Multi-source data integration: The processed traffic, weather, and event data are mapped to entities and relationships of the knowledge graph through entity recognition and relationship extraction. For example, a traffic flow record is mapped to a TrafficVolume entity in the graph whose attributes correspond to the original data fields [26], as shown in Eq. (11). The update mechanism of the knowledge graph is shown in Fig. 2.

(11)
$ \text{TrafficVolume}(R001,t_{0} )=(\text{avg\_speed}:30~\text{km/h},~\text{vehicle\_count}:1200). $

Fig. 2. Update mechanism of knowledge graph.


Real-time data access strategy: In order to maintain the timeliness of the graph, it is necessary to establish a data flow processing mechanism, such as using Apache Kafka and other message queue technologies to receive real-time data and trigger dynamic updates of the graph. Real-time data is first subjected to lightweight preprocessing to ensure correct data format, and then graph insertion, deletion or update operations are performed according to changes in entities and relationships, as shown in Eq. (12) [27].

(12)
$ \text{UpdateRule}=\text{IF new data} \Rightarrow \text{find corresponding entity} \Rightarrow (\text{insert/update/delete entity or relationship}). $
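The update rule of Eq. (12) amounts to an insert-or-update (plus delete) operation keyed on the entity. The sketch below is a simplified in-memory version: the message layout (`op`, `entity_id`, `attributes`) and the entity key format are hypothetical, and the message-queue consumer that would deliver such records is deliberately left out.

```python
def apply_update(graph, record):
    """Apply one incoming record to the graph (a dict: entity_id -> attribute dict).

    Returns which branch of the update rule fired.
    """
    entity_id = record["entity_id"]
    if record["op"] == "delete":
        graph.pop(entity_id, None)
        return "deleted"
    if entity_id in graph:                      # entity found: update in place
        graph[entity_id].update(record["attributes"])
        return "updated"
    graph[entity_id] = dict(record["attributes"])   # entity not found: insert
    return "inserted"


graph = {}
key = "TrafficVolume:R001"   # hypothetical key format
first = apply_update(graph, {"op": "upsert", "entity_id": key,
                             "attributes": {"avg_speed": 30.0, "vehicle_count": 1200}})
second = apply_update(graph, {"op": "upsert", "entity_id": key,
                              "attributes": {"avg_speed": 25.0}})
print(first, second, graph[key])
```

Updating attributes in place, rather than replacing the whole entity, preserves fields that the new record does not mention.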

Dynamic updating mechanisms also include periodic maintenance of the graph, such as periodic checking of data consistency, handling of data conflicts, updating the ontology to accommodate new data types or changing needs, and ensuring that the knowledge graph continues to accurately reflect the latest state of the urban transportation system [28-29].

To sum up, urban traffic model construction based on knowledge graph is a comprehensive process involving data integration, knowledge representation, real-time processing and dynamic maintenance. Through elaborate entity and relationship system, efficient multi-source data fusion strategy and flexible dynamic update mechanism, the model can provide a comprehensive, real-time and semantic-rich data support platform for traffic congestion prediction.

4. Knowledge Graph Driven Traffic Congestion Prediction Model

4.1. Model Frame Design

This section details the design of the traffic congestion prediction model based on the knowledge graph and a graph neural network (GNN). The model aims to predict urban traffic congestion accurately by combining the rich semantic information provided by the knowledge graph with the deep learning capabilities of GNNs. The architecture is divided into four main parts: data preparation, knowledge graph embedding, graph neural network modeling, and the prediction layer [30].

In the data preparation stage, the preprocessed traffic, weather, and event data are transformed into entities and relationships in the knowledge graph to form the initial graph structure.

Knowledge graph embedding maps entities and relationships into a low-dimensional vector space through methods such as TransE, so that the structural information in the knowledge graph is encoded [31]. TransE learns embeddings such that $h+r\approx t$ for true triples, i.e., by minimizing the distance $d(h+r, t)$, where $h$ and $t$ are the vector representations of the head and tail entities, $r$ is the vector representation of the relationship, and $d$ is a distance function such as the Euclidean distance or cosine distance.
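TransE treats a relationship as a translation in the embedding space, so a plausible triple is one where the head vector plus the relationship vector lands close to the tail vector. The sketch below scores triples with the Euclidean distance and shows the margin-based ranking loss used to train TransE; the two-dimensional embeddings are made-up values, not learned ones.

```python
import math


def transe_distance(h, r, t):
    """Euclidean distance d(h + r, t); smaller means a more plausible triple."""
    return math.sqrt(sum((hk + rk - tk) ** 2 for hk, rk, tk in zip(h, r, t)))


def margin_ranking_loss(d_pos, d_neg, margin=1.0):
    """Penalize a true triple unless it is closer than a corrupted one by `margin`."""
    return max(0.0, margin + d_pos - d_neg)


# Hypothetical embeddings.
h = [1.0, 0.0]          # head entity
r = [0.0, 1.0]          # relationship
t_true = [1.0, 1.0]     # correct tail: h + r == t_true
t_false = [4.0, 5.0]    # corrupted tail

d_pos = transe_distance(h, r, t_true)
d_neg = transe_distance(h, r, t_false)
print(d_pos, d_neg, margin_ranking_loss(d_pos, d_neg))
```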

Graph neural network modeling is the core of the model. It uses the embedded node and edge features of the knowledge graph to learn contextual node representations through a message-passing mechanism. For example, the information aggregation rule of a graph convolutional network (GCN) is given in Eq. (13), where $h_{i}^{(l)}$ is the feature vector of the $i$-th node at layer $l$, ${\mathcal N}_{i}$ is the neighbor set of node $i$, $c_{ij}$ is the normalization coefficient, $W^{(l)}$ is the inter-layer weight matrix, and $\sigma$ is the activation function.

(13)
$ h_{i}^{(l+1)} =\sigma \left(\sum _{j\in {\mathcal N}_{i} }\frac{1}{c_{ij} } W^{(l)} h_{j}^{(l)} \right) . $

Eq. (13) describes one GCN layer. $h_{i}^{(l+1)}$ denotes the feature vector of node $i$ at layer $l+1$, and $h_{j}^{(l)}$ is the feature vector of node $j$ at layer $l$. $\sigma(\cdot)$ is usually ReLU or the sigmoid function. The sum aggregates the transformed feature vectors of the neighbors $j$ of node $i$, and $c_{ij}$ is an optional normalization factor that keeps the aggregate comparable across neighbor sets of different sizes.

The prediction layer outputs the final traffic congestion state prediction based on the learned node representation through the fully connected layer or other prediction models (such as LSTM for time series prediction).

4.2. Feature Selection and Representation Learning

Feature selection is crucial because it determines the learning ability and generalization performance of the model. Feature selection based on knowledge graphs should consider the following aspects:

Traffic flow characteristics: Including historical average speed, flow count, congestion duration, etc.

Weather characteristics: Temperature, humidity, precipitation, etc, which have an indirect effect on traffic conditions.

Road network structure characteristics: Such as road length, number of intersections, reflecting physical attributes.

In representation learning, a GNN can effectively fuse the above features. For example, a relational graph convolutional network (R-GCN) uses a separate transformation matrix for each relationship type, capturing how features influence a node differently under different relationships, as shown in Eq. (14).

(14)
$ h_{i}^{(l+1)} =\sigma \left(\sum _{r\in {\mathcal R}}\sum _{j\in {\mathcal N}_{i}^{r} }\frac{1}{c_{ij}^{r} } W_{r}^{(l)} h_{j}^{(l)} \right), $

where $\mathcal R$ is the set of relationships, ${\mathcal N}_{i}^r$ is the set of neighbors connected to node $i$ by relationship $r$, $c_{ij}^{r}$ is the corresponding normalization coefficient, and $W_{r}^{(l)}$ is the weight matrix for relationship $r$ at layer $l$.
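Eq. (14) can be sketched as a per-relationship version of neighbor aggregation: messages are transformed by the matrix of the relationship they travel along, normalized within that relationship, and summed. The sketch assumes $c_{ij}^{r}=|{\mathcal N}_{i}^{r}|$ and ReLU as $\sigma$; the relationship name `affected_by`, the feature vectors, and the weight matrices are hypothetical.

```python
def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(W[r][k] * h[k] for k in range(len(h))) for r in range(len(W))]


def rgcn_layer(features, edges, weights):
    """One step of Eq. (14).

    edges: list of (i, relation, j), meaning j is a neighbor of i under `relation`.
    weights: dict relation -> weight matrix W_r.
    Assumes c_ij^r = |N_i^r| and ReLU as the activation.
    """
    dim_out = len(next(iter(weights.values())))
    grouped = {}
    for i, rel, j in edges:
        grouped.setdefault((i, rel), []).append(j)

    agg = {i: [0.0] * dim_out for i in features}
    for (i, rel), js in grouped.items():
        for j in js:
            msg = matvec(weights[rel], features[j])
            agg[i] = [a + m / len(js) for a, m in zip(agg[i], msg)]
    return {i: [max(0.0, a) for a in vec] for i, vec in agg.items()}


# Hypothetical features for a road, a zone, and an event.
features = {"R001": [1.0, 2.0], "Z001": [0.5, 0.5], "E012": [2.0, 0.0]}
edges = [("R001", "belongs_to", "Z001"), ("R001", "affected_by", "E012")]
weights = {
    "belongs_to": [[1.0, 0.0], [0.0, 1.0]],
    "affected_by": [[0.0, 0.0], [-1.0, 0.0]],
}

print(rgcn_layer(features, edges, weights))
```

Following Eq. (14) as written, no separate self-connection term is included, so nodes without incoming messages receive a zero vector.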

4.3. Prediction Algorithm Design and Optimization

The core of the prediction algorithm is to predict the traffic congestion state from the learned node representations. Because traffic congestion is a time-series problem, the graph neural network can be combined with a recurrent neural network (RNN) or a long short-term memory network (LSTM) to form a spatio-temporal graph attention network with LSTM (ST-GAT-LSTM). The model structure is as follows:

(1) Spatio-temporal feature fusion: Firstly, GNN is used to extract spatio-temporal features from knowledge graph, and node representation of each time step is obtained.

(2) Attention mechanism: In the ST-GAT layer, an attention mechanism assigns different weights to different nodes to emphasize the influence of important nodes. The attention score is calculated by Eq. (15), where $a$ is the activation function and $W_{q}$ and $W_{h}$ are weight matrices.

(15)
$ \alpha _{ij} ={\rm softmax}_{j} \left(a\left((W_{q} h_{i})^{T} W_{h} h_{j} \right)\right) . $

In Eq. (15), $\alpha _{ij}$ is the attention weight of node $i$ to node $j$, reflecting how strongly node $i$ attends to node $j$. ${\rm softmax}_{j}(\cdot )$ normalizes over $j$ so that the attention weights of node $i$ sum to 1. $a(\cdot )$ is a non-linear activation function, usually the sigmoid function. $W_{q}$ and $W_{h}$ are the weight matrices used to compute the attention weights, and $h_{i}$ and $h_{j}$ are the feature vectors of node $i$ and node $j$, respectively.
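The attention computation can be sketched as follows. The sketch reads the raw score as $a((W_{q}h_{i})^{T}W_{h}h_{j})$ with a sigmoid activation, which is an assumed interpretation of Eq. (15), and then applies a softmax over the candidate nodes $j$. The feature vectors and identity weight matrices are hypothetical.

```python
import math


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(W[r][k] * h[k] for k in range(len(h))) for r in range(len(W))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def attention_weights(h_i, h_js, W_q, W_h):
    """Return softmax-normalized attention weights of node i over the nodes in h_js."""
    q = matvec(W_q, h_i)
    scores = []
    for h_j in h_js:
        k = matvec(W_h, h_j)
        scores.append(sigmoid(sum(a * b for a, b in zip(q, k))))
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


identity = [[1.0, 0.0], [0.0, 1.0]]
alpha = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 0.0]], identity, identity)
print(alpha)
```

The neighbor whose transformed features align with those of node $i$ receives the larger weight, and the weights always sum to one.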

(3) LSTM prediction: The weighted node features are fed into LSTM units, which learn long-term dependencies and output the traffic congestion state prediction. The LSTM update comprises the forget gate, input gate, cell state update, and output gate.
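The four parts of the LSTM update can be seen in a single scalar step. This is a didactic sketch of the standard LSTM cell with one input and one hidden unit; a real model would use weight matrices from a deep learning framework, and the parameter layout below is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; p maps each gate name to (w_x, w_h, bias)."""
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # forget gate: how much old cell state to keep
    i = sigmoid(pre("input"))         # input gate: how much new information to admit
    g = math.tanh(pre("candidate"))   # candidate cell content
    o = sigmoid(pre("output"))        # output gate
    c = f * c_prev + i * g            # cell state update
    h = o * math.tanh(c)              # new hidden state
    return h, c


# With all parameters zero, every gate is 0.5 and the candidate is 0,
# so the cell state is simply halved.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "candidate", "output")}
h1, c1 = lstm_step(x=0.7, h_prev=0.0, c_prev=2.0, p=zero_params)
print(h1, c1)
```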

Model optimization focuses on the design of the loss function and on parameter tuning. We use the mean squared error (MSE) as the loss function, as shown in Eq. (16), where $y_{i}$ is the observed value, $\hat{y}_{i}$ is the predicted value, and $N$ is the number of samples. Bayesian optimization is used for parameter tuning to improve model performance.

(16)
$ {\rm Loss}=\frac{1}{N} \sum _{i=1}^{N}(y_{i} -\hat{y}_{i} )^{2} . $

The specific prediction flow is shown in Fig. 3. Through carefully designed model framework, feature selection and representation learning, and optimized prediction algorithm, the traffic congestion prediction model based on knowledge graph can fully mine the complex spatiotemporal features of traffic system, realize accurate prediction of traffic congestion, and provide powerful decision support for urban traffic management.

Fig. 3. Prediction flow.


5. Experimental Design and Results Analysis

5.1. Experimental Data Set Description

The data set used in this study is derived from comprehensive traffic information records for a metropolitan area covering one year, from the end of 2022 to the end of 2023. It totals 16,500 records and covers traffic flow details for all 24 hours of each day. To fully reflect seasonal effects, the data cover all four seasons. The research focuses on core urban areas, with four representative areas selected: the urban center, a prosperous commercial area, a dense residential area, and a busy industrial area. Ten key intersection monitoring points are set up in each area, for a total of 40 monitoring points collecting traffic flow. By integrating the number of vehicles per unit time, average speed, road occupancy rate, weather parameters (such as temperature, humidity, and rainfall), and special event information (including traffic accident records and road construction conditions), a multidimensional, high-density data set is constructed for mining traffic behavior patterns and predicting congestion trends.

To deploy the intelligent traffic congestion prediction model effectively, the system must meet certain hardware and software requirements. On the hardware side, a high-performance CPU is required, a GPU is recommended to accelerate deep learning, memory should be at least 16 GB, and storage should be sufficient for the data sets and model files. A stable Linux distribution such as Ubuntu or CentOS is recommended for scientific computing and resource management. The software environment should include Python 3.7 or above and a deep learning framework such as TensorFlow or PyTorch for model training and inference, together with data processing and visualization libraries such as Pandas, NumPy, and Matplotlib. Because the model may run online in real time, the server needs a stable network connection to support real-time data transmission and continuous model updates. Finally, the system should have sound security measures to protect data privacy and network security.

5.2. Result Analysis

In the evaluation of this study, we adopted the following core indicators to measure model performance. The mean absolute percentage error (MAPE) assesses the relative error between predictions and actual observations; it is calculated by averaging the absolute prediction errors relative to the observed values, expressed as ${\rm MAPE}=\frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_{i}-\hat{y}_{i}}{y_{i}}\right|$. The mean squared error (MSE) is the mean of the squared differences between predicted and actual values, expressed as ${\rm MSE}=\frac{1}{N}\sum_{i=1}^{N}(y_{i}-\hat{y}_{i})^{2}$, and reflects the overall magnitude of the prediction error. The R$^2$ score, also known as the coefficient of determination, reflects the proportion of variability explained by the model and is calculated as $R^{2}=1-\frac{\sum_{i=1}^{N}(y_{i}-\hat{y}_{i})^{2}}{\sum_{i=1}^{N}(y_{i}-\bar{y})^{2}}$; a value close to 1 indicates that the model fits well and explains the variability in the data. Here $y_{i}$ is the observed value, $\hat{y}_{i}$ is the predicted value, $\bar{y}$ is the mean of the observed values, and $N$ is the number of observations.
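The three indicators can be computed directly from their standard definitions. The observed and predicted values below are made-up numbers used only to exercise the functions; they are not results from this study.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; observed values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted traffic flows.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

print(mape(y_true, y_pred), mse(y_true, y_pred), r2_score(y_true, y_pred))
```

MAPE is scale-free but undefined when an observed value is zero, while MSE depends on the units of the target, so the two are usually reported together.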

Table 1 shows the basic information of traffic congestion prediction model, including model name, model type, training data source and prediction target. The model name is ``Traffic Congestion Prediction Model Based on Knowledge Graph'', the model type is ``Machine Learning Model'', and the training data sources include historical traffic flow data, road condition data and meteorological data. The prediction target is the traffic congestion situation in a period of time in the future, including congestion time and congestion degree.

Table 1. Comparison of MAPE, MSE and R$^{2}$ scores of each model.

| Model | MAPE (%) | MSE | R$^{2}$ score |
|---|---|---|---|
| ARIMA | 12.3 | 0.87 | 0.78 |
| SVM | 9.5 | 0.65 | 0.72 |
| GCN+LSTM | 7.2 | 0.52 | 0.76 |
| KG-GNN | 5.8 | 0.41 | 0.83 |

Table 2. F1 scores (for congestion classification tasks).

| Model | F1 score |
|---|---|
| SVM | 0.78 |
| GCN+LSTM | 0.81 |
| KG-GNN | 0.85 |

Note: The smaller the standard deviation, the smaller the fluctuation of the model prediction results and the higher the stability. The deviation range reflects the fluctuation of model prediction performance in different time periods or under different conditions. The narrower the range, the more stable the model is.

Table 2 presents training and validation results with accuracy at 90% and 85% respectively. Table 3 evaluates the model's performance in traffic prediction, control, and public transit optimization, highlighting its ability to predict traffic congestion 30 minutes ahead. Table 4 assesses the model's impact, showing a decrease in peak congestion time from 15.6 to 12.8 minutes (-17.9%), a reduction in the average delay index from 1.3 to 1.1 (-15.4%), and a rise in public satisfaction from 3.2 to 4.0 (+25.0%) based on a 10,000-sample survey.

Table 3. Stability analysis of model prediction results.

| Model | Standard deviation (%) | Deviation range (minimum to maximum MAPE, %) | Stability evaluation |
|---|---|---|---|
| ARIMA | 2.1 | 10.2-14.1 | medium |
| SVM | 2.8 | 6.7-12.1 | less stable |
| GCN+LSTM | 1.5 | 5.7-8.8 | stable |
| KG-GNN | 1.2 | 4.9-6.9 | very stable |

Table 4. Comparison of model calculation efficiency.

| Model | Average training time (minutes) | Average prediction time (seconds) | Overall efficiency evaluation |
|---|---|---|---|
| ARIMA | 5 | <1 | high |
| SVM | 15 | 2 | medium |
| GCN+LSTM | 30 | 5 | medium |
| KG-GNN | 45 | 10 | lower |

Note: Computational efficiency includes the time required to train the model and the execution time of a single prediction. The average training time reflects the complexity and convergence speed of the model, and the prediction time directly affects the feasibility of the model in real-time application. The overall efficiency evaluation combines the time cost of training and prediction, and fast prediction time is especially important for real-time prediction systems.

5.3. Application Cases and Effect Evaluation

In Eastern China's ``Lanhai City,'' known for dense traffic and complex road networks, our model is applied specifically in the CBD and on Chaoyang Road, Bibo Avenue, and Xingqiao Road. Integrating data processing, graph construction, model deployment, and application, we summarize yearly traffic, weather, and incident data. Using Neo4j, a composite knowledge graph is built, encompassing infrastructure, events, and environmental factors. The KG-GNN model, hosted on a cloud server, extracts real-time data hourly for predictive analysis, feeding forecasts of congestion probability, severity, and causes back to the traffic management center for the next two hours.

Policy formulation: Based on the model prediction, the Lanhai Municipal Government adjusted the bus routes and increased the number of peak hours, which promoted the sharing rate of public transportation by about 10% and eased the dependence on private cars.

Public service: Real-time traffic push notifications let the public plan routes in advance, reducing the probability of encountering congestion and improving travel efficiency. According to usage statistics, active app users increased by 25%, and users saved an average of 5 minutes of travel time.

Table 5 summarizes user feedback on the application of the model, including comments on prediction accuracy, usefulness, and functional recommendations. In terms of prediction accuracy, 78% of users consider the predictions consistent with reality, 16% report occasional large deviations, and 6% report frequent inaccuracies. In terms of practicality, 85% of users find the model very practical and effective in guiding travel, 12% find it helpful but in need of improvement, and 3% use it rarely and perceive little benefit. In terms of feature suggestions, 30% of users suggested strengthening navigation linkage, 25% suggested providing more travel alternatives, and 15% suggested adding voice reminders.

Fig. 4. Comparison of traffic efficiency improvement before and after model application.

Note: Average congestion time is calculated for weekday peak hours; the delay index is the ratio of actual travel time to free-flow travel time; the public satisfaction survey collected 10,000 samples through an online questionnaire.
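The delay index defined in the note is a simple ratio, and a value of 1.0 means travel at free-flow speed. The Python sketch below shows the calculation; the travel times are hypothetical and were chosen only so that the results match the 1.3 and 1.1 index values reported for before and after deployment.

```python
def delay_index(actual_minutes: float, free_flow_minutes: float) -> float:
    """Ratio of actual travel time to free-flow travel time."""
    if free_flow_minutes <= 0:
        raise ValueError("free-flow travel time must be positive")
    return actual_minutes / free_flow_minutes


# Hypothetical trip that takes 10 minutes under free-flow conditions.
free_flow = 10.0
before = delay_index(13.0, free_flow)  # 13 min actual travel time
after = delay_index(11.0, free_flow)   # 11 min actual travel time

print(f"delay index before: {before:.1f}, after: {after:.1f}")
```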

Table 6 lists suggested directions for future optimization, including improvements in extreme event response, multimodal fusion, and user experience. Based on user feedback, we suggest improving the model's prediction ability under severe weather or unexpected events by adding dedicated event modules. We also suggest strengthening integration with navigation systems and public transport information to provide comprehensive travel suggestions. In addition, a voice reminder function would improve safety and convenience while driving and increase user interaction. Through a continuous feedback loop, the model's functions will be further improved and its application value in urban traffic management expanded.

Table 5. Summary of user feedback.

| Category | Feedback content | Proportion |
|---|---|---|
| Predictive accuracy | High (forecast agrees with actual situation) | 78% |
| | Fair (occasionally large deviation) | 16% |
| | Low (frequently inaccurate) | 6% |
| Practicability | Very practical, effective in guiding travel | 85% |
| | Helpful, but needs improvement | 12% |
| | Less used, less perceived | 3% |
| Feature recommendations | Strengthen navigation linkage | 30% |
| | More travel alternatives | 25% |
| | Add voice reminder function | 15% |

Table 6. Future optimization directions driven by user feedback.

| Optimization direction | Target |
|---|---|
| Extreme event response enhancement | Reduce forecast deviations in extreme conditions |
| Multimodal information integration | Expand service scope with integrated travel options |
| User experience improvement | Enhance driving safety and operational convenience |
| Data timeliness and accuracy | Ensure predictions reflect real-time traffic |
| Social media synergy | Detect early incident signals |
| Cross-platform compatibility | Broaden the user base through device optimization |

As shown in Table 6, the feedback collected from users has been instrumental in identifying areas where the model can be improved to better serve the community. The enhancement of extreme event response aims to make the model more robust against unforeseen circumstances, ensuring that predictions remain reliable even under challenging conditions. By integrating multimodal information, the system can cater to a wider audience, offering tailored travel suggestions that consider various modes of transportation. Improvements in user experience through the introduction of voice alerts and a refined interface will contribute to safer and more intuitive interactions. Regular calibration and the incorporation of real-time data are crucial for maintaining the relevance and precision of predictions. Analyzing social media data allows the system to detect potential issues before they escalate, providing timely warnings to users. Lastly, ensuring cross-platform compatibility will help reach a broader audience, making the application accessible to more people regardless of their device choice. These enhancements, driven by user feedback, will collectively elevate the model's performance and utility in urban traffic management.

6. Conclusion

This study develops an intelligent traffic congestion prediction model based on a knowledge graph and a graph neural network, enabling deep understanding and accurate prediction of urban traffic flow. Through the integration and careful preprocessing of multi-source data, the constructed knowledge graph contains rich traffic information and also incorporates weather and event factors, forming a highly semantic data representation framework. In terms of model design, the combination of GNN and LSTM fuses static road network structure with dynamic traffic flow variation; in particular, the entity representations learned through knowledge graph embedding significantly enhance the model's ability to capture spatiotemporal features. Experimental results show that, compared with traditional models, the KG-GNN model significantly improves prediction accuracy, stability, and interpretability, and that it helps reduce average congestion time and improve traffic smoothness, bringing substantial benefits to urban traffic management. The deployment in Lanhai City verified the effectiveness of the model: it reduced congestion time and the delay index, and the rise in public satisfaction shows its contribution to the public travel experience. In addition, the follow-up optimization suggestions based on user feedback, such as enhancing extreme event response, multimodal data fusion, and improving user experience, point the way for continued iteration and development of the model. In summary, this study provides advanced technical means for urban traffic congestion prediction and lays a foundation for traffic management decision support systems in future smart cities, with both theoretical significance and practical value.

REFERENCES

[1] J. B. Chen, D. M. Li, G. L. Zhang, and X. L. Zhang, "Localized space-time autoregressive parameters estimation for traffic flow prediction in urban road networks," Applied Sciences, vol. 8, no. 2, 20, 2018.
[2] Z. Chen, Y. Jiang, and D. H. Sun, "Discrimination and prediction of traffic congestion states of urban road network based on spatio-temporal correlation," IEEE Access, vol. 8, pp. 3330-3342, 2020.
[3] W. Elleuch, A. Wali, and A. M. Alimi, "Neural congestion prediction system for trip modelling in heterogeneous spatio-temporal patterns," International Journal of Systems Science, vol. 51, no. 8, pp. 1373-1391, 2020.
[4] R. A. A. Khalil, "Building the public transportation system in Libya," Engineering Heritage Journal, vol. 8, no. 1, pp. 7-12, 2024.
[5] R. Feng, H. Q. Cui, Q. Feng, S. X. Chen, X. N. Gu, and B. Z. Yao, "Urban traffic congestion level prediction using a fusion-based graph convolutional network," IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 12, pp. 14695-14705, 2023.
[6] Y. J. Guo, L. C. Yang, S. X. Hao, and J. Gao, "Dynamic identification of urban traffic congestion warning communities in heterogeneous networks," Physica A - Statistical Mechanics and Its Applications, vol. 522, pp. 98-111, 2019.
[7] W. B. Hu, H. Wang, Z. Y. Qiu, L. P. Yan, C. Nie, and B. Du, "An urban traffic simulation model for traffic congestion predicting and avoiding," Neural Computing & Applications, vol. 30, no. 6, pp. 1769-1781, 2018.
[8] D. R. Huang, Z. P. Deng, S. H. Wan, B. Mi, and Y. Liu, "Identification and prediction of urban traffic congestion via cyber-physical link optimization," IEEE Access, vol. 6, pp. 63268-63278, 2018.
[9] R. Jia, P. C. Jiang, L. Liu, L. Z. Cui, and Y. L. Shi, "Data driven congestion trends prediction of urban transportation," IEEE Internet of Things Journal, vol. 5, no. 2, pp. 581-591, 2018.
[10] U. Jilani, M. Asif, M. Y. I. Zia, M. Rashid, S. Shams, and P. Otero, "A systematic review on urban road traffic congestion," Wireless Personal Communications, vol. 140, pp. 81-109, 2025.
[11] M. Q. Lv, Y. F. Li, T. M. Chen, and Y. L. Li, "Urban traffic congestion index estimation with open ubiquitous data," Journal of Information Science and Engineering, vol. 34, no. 3, pp. 781-799, 2018.
[12] B. Medina-Salgado, E. Sanchez-DelaCruz, P. Pozos-Parra, and J. E. Sierra, "Urban traffic flow prediction techniques: A review," Sustainable Computing-Informatics & Systems, vol. 35, 100739, 2022.
[13] E. E. Mon, H. Ochiai, C. Saivichit, and C. Aswakul, "Bottleneck based gridlock prediction in an urban road network using long short-term memory," Electronics, vol. 9, no. 9, 1412, 2020.
[14] M. Pi, H. Yeon, H. Son, and Y. Jang, "Visual cause analytics for traffic congestion," IEEE Transactions on Visualization and Computer Graphics, vol. 27, no. 3, pp. 2186-2201, 2021.
[15] B. Priambodo, A. Ahmad, and R. A. Kadir, "Predicting traffic flow propagation based on congestion at neighbouring roads using hidden Markov model," IEEE Access, vol. 9, pp. 85933-85946, 2021.
[16] K. Ramana, G. Srivastava, M. R. Kumar, T. R. Gadekallu, J. C. W. Lin, M. Alazab, and C. Iwendi, "A vision transformer approach for traffic congestion prediction in urban areas," IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 4, pp. 3922-3934, 2023.
[17] N. Ranjan, S. Bhandari, P. Khan, Y. S. Hong, and H. Kim, "Large-scale road network congestion pattern analysis and prediction using deep convolutional autoencoder," Sustainability, vol. 13, no. 9, 5108, 2021.
[18] S. Ranjan, Y. C. Kim, N. Ranjan, S. Bhandari, and H. Kim, "Large-scale road network traffic congestion prediction based on recurrent high-resolution network," Applied Sciences, vol. 13, no. 9, 5512, 2023.
[19] I. Stan, D. A. Ghere, P. I. Dan, and R. Potolea, "Urban congestion avoidance methodology based on vehicular traffic thresholding," Applied Sciences, vol. 13, no. 4, 2143, 2023.
[20] N. Wang, B. H. Zhang, J. Gu, H. H. Kong, S. Hu, and S. C. Lu, "Urban road traffic spatiotemporal state estimation based on multivariate phase space-LSTM prediction," Applied Sciences, vol. 13, no. 21, 12079, 2023.
[21] X. Wang, R. H. Zeng, F. M. Zou, L. Y. C. Liao, and F. L. Huang, "STTF: An efficient transformer model for traffic congestion prediction," International Journal of Computational Intelligence Systems, vol. 16, 2, 2023.
[22] X. M. Wang, Y. Chen, and J. L. Zhang, "Urban-road average-speed prediction method based on graph convolutional networks," Transportation Research Record, vol. 2678, no. 5, pp. 771-788, 2024.
[23] D. W. Xia, B. Q. Shen, J. Geng, Y. Hu, Y. T. Li, and H. Q. Li, "Attention-based spatial-temporal adaptive dual-graph convolutional network for traffic flow forecasting," Neural Computing & Applications, vol. 35, pp. 17217-17231, 2023.
[24] Z. P. Xie, W. F. Lv, S. F. Huang, Z. L. Lu, B. W. Du, and R. H. Huang, "Sequential graph neural network for urban road traffic speed prediction," IEEE Access, vol. 8, pp. 63349-63358, 2020.
[25] X. Xing and X. Y. Li, "Recommendation of urban vehicle driving routes under traffic congestion: A traffic congestion regulation method considering road network equilibrium," Computers & Electrical Engineering, vol. 110, 108863, 2023.
[26] Y. M. Xing, X. J. Ban, X. Liu, and Q. Shen, "Large-scale traffic congestion prediction based on the symmetric extreme learning machine cluster fast learning method," Symmetry-Basel, vol. 11, no. 6, 730, 2019.
[27] B. Yang, H. Zhang, M. X. Du, A. N. Wang, and K. Xiong, "Urban traffic congestion alleviation system based on millimeter wave radar and improved probabilistic neural network," IET Radar Sonar and Navigation, vol. 18, no. 2, pp. 327-343, 2024.
[28] K. Zhang, Z. X. Chu, J. P. Xing, H. G. Zhang, and Q. X. Cheng, "Urban traffic flow congestion prediction based on a data-driven model," Mathematics, vol. 11, no. 19, 4075, 2023.
[29] T. R. Zhang, J. A. Xu, S. R. Cong, C. S. Qu, and W. B. Zhao, "A hybrid method of traffic congestion prediction and control," IEEE Access, vol. 11, pp. 36471-36491, 2023.
[30] X. Y. Zheng, N. Huang, Y. N. Bai, and X. Zhang, "A traffic-fractal-element-based congestion model considering the uneven distribution of road traffic," Physica A - Statistical Mechanics and Its Applications, vol. 632, Part 1, 129354, 2023.
[31] Z. J. Zheng, Z. L. Wang, S. Liu, and W. Ma, "Exploring the spatial effects on the level of congestion caused by traffic accidents in urban road networks: A case study of Beijing," Travel Behaviour and Society, vol. 35, 100728, 2024.

Author

Tingting Zhao

Tingting Zhao was born in 1981 in Lanzhou, Gansu Province, China. In 2003, she obtained her bachelor's degree in computer science and technology from Lanzhou Jiaotong University. In 2009, she obtained a master's degree in cartography and geographic information systems from Lanzhou Jiaotong University. From 2003 to 2008, she served as a teaching assistant in the Information Management and Information Systems program at the School of Transportation, Lanzhou Jiaotong University. Since 2008, she has been a lecturer in the same program. Her research interests include the application of knowledge graphs and traffic flow prediction. She has published one book and authored over 15 articles.