Ihsan Ullah1
Anum Jamil2
Imtiaz Ul Hassan3
Byung-Seo Kim1,*
1 Department of Software and Communication Engineering, Hongik University, Korea (danish1852@gmail.com, jsnbs@hongik.ac.kr)
2 Department of Physics, NED University of Engineering & Technology, Karachi, Pakistan (jamilanum47@gmail.com)
3 Department of Computer Science and Information Technology, NED University of Engineering & Technology, Karachi, Pakistan (zahidbooni1@gmail.com)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Keywords
Text mining, Text classification, Sentiment analysis, Supervised machine learning, BERT, GRU, LSTM
1. Introduction
Natural disasters have become frequent worldwide, causing significant destruction
and loss of life. With the rise of social media platforms, particularly Twitter, people
now have an easy and immediate way of sharing information about these disasters. Twitter's
real-time nature enables people to post updates and emergency information about disasters
as they occur. The information shared on Twitter can benefit first responders and
disaster relief organizations because they can quickly assess the situation and allocate
resources accordingly. Studies have shown that social media platforms, such as Twitter,
can provide critical information to help manage natural disasters. One study examined
the role of Twitter in disseminating information during Hurricane Harvey in 2017 [1]. They reported that Twitter was a valuable tool for sharing situational updates and
emergency information, especially in the early stages of a disaster when the traditional
sources of information were limited. Another study analyzed the tweets during the
2017 Mexico earthquake and reported that Twitter users effectively shared information
about missing persons and relief efforts [2].
On the other hand, the vast amount of unstructured and noisy data on Twitter poses
challenges for effective disaster response and management. Various Natural Language
Processing (NLP) techniques have been employed to classify and analyze disaster-related
tweets to address these challenges [3]. These techniques automatically categorize tweets into different types, such as informative,
supportive, and observational, to enable efficient filtering and analysis of disaster-related
information.
Several studies have demonstrated the effectiveness of NLP techniques in classifying
disaster-related tweets, including one that used a deep learning approach to
classify tweets related to the California wildfires [4]. These studies showed promising results in tweet classification tasks using deep
learning models, such as recurrent neural networks (RNNs) and transformers. For example,
one study used a bidirectional long short-term memory (LSTM) model with an attention mechanism
to classify tweets related to natural disasters into four categories: casualty,
damage, donations, and sentiment. The model achieved an accuracy of 89.7 % and outperformed
several baseline models. Similarly, the authors of [5] used a pre-trained bidirectional encoder representations from transformers (BERT)
model to classify tweets related to the COVID-19 pandemic into four categories: news,
opinions, advisories, and miscellaneous.
The present study compared the performance of three different NLP models, namely BERT,
gated recurrent units (GRU), and LSTM, for tweet classification of disaster data.
The study contributes to the field of crisis informatics,
particularly in the use of NLP models for disaster detection
and response. The specific contributions of this research are summarized in the
following points.
1. This paper presents a unique study comparing three distinct NLP models on disaster-related
tweet classification, a topic previously unexplored. A dataset of 5545 tweets was
manually annotated to assess the strengths and limitations of each model and guide
future research in this domain.
2. This work introduces a robust framework for extracting disaster-relevant information
from Twitter, aiming to enhance the efficiency and depth of disaster management strategies
by interpreting social media data more effectively.
3. This study aimed to develop a mechanism for identifying and categorizing disaster-related
tweets to sift through the vast Twitter data. The goal is to provide real-time updates
and emergency information during natural disasters, enabling stakeholders to gain
immediate insights and respond quickly and effectively.
This research aims to demonstrate the potential of sophisticated NLP techniques in
aiding disaster response and management.
2. Related Work
Natural disasters, such as earthquakes, floods, accidents, and hurricanes, have significant
social and economic impacts on the affected communities. Social media platforms, such
as Twitter, have emerged as valuable sources of real-time information during disasters
[6]. Twitter users often share first-hand accounts, photographs, and videos of the disasters,
as well as requests for help, information, and donations [7]. On the other hand, the vast amount of unstructured and noisy data on Twitter poses
challenges to effective disaster response and management.
In recent years, there has been a growing interest in leveraging social media for
disaster management and response. A previous study developed a system to utilize Twitter
data for coordination in disaster response scenarios [8]. Their study focused on clustering tweets and categorizing them based on their relevance
to disasters. They demonstrated how social media can serve as a real-time source of
disaster-related information.
On the other hand, the task of tweet classification has proven to be a challenge because
of the short, noisy, and unstructured nature of the text. Studies have looked into this
problem, examining the use of convolutional neural networks (CNNs) for text classification
[9]. They showed that CNNs can effectively handle the short and sparse nature of tweets,
paving the way for a further exploration of deep learning techniques in this context.
Research has also shown that the effectiveness of different deep learning models
depends heavily on the task at hand. For example, one study examined several models,
including LSTM and GRU, in the context of sentiment analysis [10] and found that GRU
models generally outperformed their LSTM counterparts. This difference may be attributable
to the structural and functional characteristics of GRUs, including their simplified
gating mechanism and lower computational complexity, which can be advantageous in
specific scenarios, such as sentiment analysis. Such performance disparities underline
the importance of choosing a deep learning model based on the requirements and nature
of the task. They also motivate further task-specific research and benchmarking to
identify the optimal context for each model.
The present study compared the performance of three NLP models, namely BERT, GRU,
and LSTM, for tweet classification of disaster data. To the best of the authors’ knowledge,
no study has compared the performance of these models on disaster-related tweets.
3. The Proposed Scheme
This section describes the dataset used for disaster tweet classification, including
data collection and preprocessing information. The section also presents the proposed
methodology for the classification task on the collected dataset.
3.1 Data Collection
The Tweepy library, a popular Python package, was used for data collection. Tweepy
provides a convenient and easy-to-use interface for accessing the Twitter API. With
Tweepy, researchers could authenticate and establish a connection with the Twitter
platform, enabling them to retrieve tweets based on specific search queries and hashtags
related to disasters. This study used hashtags, such as \#Disaster, \#Earthquake,
\#Floods, \#Accidents, and \#Disasters, to collect many disaster-related tweets for
further analysis and classification. This library streamlined the data collection
process and ensured the inclusion of relevant tweets on different types of disasters.
A dataset of 5545 tweets was collected using Tweepy, providing a diverse and comprehensive
dataset for analysis.
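The collection step can be sketched as follows. This is a minimal illustration, not the authors' script: the paper does not state which Twitter API endpoint or Tweepy version was used, so the use of `tweepy.Client` with the recent-search endpoint, the helper names `build_query` and `fetch_tweets`, and the `bearer_token` argument are all assumptions.

```python
# Hashtags listed in the paper.
HASHTAGS = ["#Disaster", "#Earthquake", "#Floods", "#Accidents", "#Disasters"]


def build_query(hashtags):
    """Join hashtags with OR so that a tweet matching any of them is returned."""
    return " OR ".join(hashtags)


def fetch_tweets(bearer_token, query, limit=5545):
    """Return up to `limit` tweet texts (assumed endpoint: recent search)."""
    # Imported lazily so that build_query works without Tweepy installed.
    import tweepy

    client = tweepy.Client(bearer_token=bearer_token, wait_on_rate_limit=True)
    pages = tweepy.Paginator(
        client.search_recent_tweets, query=query, max_results=100
    )
    return [tweet.text for tweet in pages.flatten(limit=limit)]


print(build_query(HASHTAGS))
```

Calling `fetch_tweets` requires valid API credentials and network access; only the query construction runs offline.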
3.2 Data Annotation
The 5545 collected tweets were manually annotated into four disaster categories:
'Earthquake', 'Flood', 'Accident', and 'Other disaster'. The annotation
process involved carefully analyzing the content of each tweet and assigning it to the appropriate
category based on its context and keywords. The manual annotation was carried out
by a team of trained annotators who followed a predefined set of guidelines to
ensure consistency and accuracy in the categorization.
3.3 Data Preprocessing
In the data preprocessing phase, several steps were undertaken to prepare the collected
tweets for further analysis. Initially, common words that do not carry significant
meaning, such as stop words, were removed from the text. This step helped reduce noise
and improve the efficiency of subsequent processes. Furthermore, a technique called
lemmatization was applied to transform words into their base or root form, consolidating
the variations of the same word. This step enhanced the accuracy of the classification
task by reducing the dimensionality of the data and capturing the essence of the tweet
content. The dataset was refined and optimized for subsequent analysis and classification
tasks by performing these preprocessing steps.
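The two preprocessing steps described in this section can be illustrated with a small sketch. The stop-word set and the lemma dictionary below are tiny hypothetical stand-ins; the paper does not name the resources or libraries it used, and a real pipeline would rely on a full stop-word list and a dictionary-backed lemmatizer.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "in", "on", "of", "and", "to"}
LEMMAS = {"floods": "flood", "buildings": "building", "collapsed": "collapse", "roads": "road"}


def preprocess(tweet):
    """Lowercase, tokenize, drop stop words, and map words to their base form."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    kept = [token for token in tokens if token not in STOP_WORDS]
    return [LEMMAS.get(token, token) for token in kept]


print(preprocess("The floods collapsed buildings and roads in the city"))
# ['flood', 'collapse', 'building', 'road', 'city']
```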
3.4 Data Visualization
Data visualization is a powerful tool that provides insights, identifies patterns,
and communicates complex information effectively through visual representations. A
count plot was produced to visualize the distribution of different disaster types
in the dataset (Fig. 1). The data showed that during the collection period, the highest number of incidents
recorded was related to earthquakes, with a count of 2065. This was followed by other
disasters with 1348 occurrences, floods with 1215 occurrences, and accidents with
917 occurrences. The higher count of earthquake incidents can be attributed to the
data being collected when a significant earthquake event occurred in Turkey.
Another type of visualization that can be performed on textual data is a word cloud.
The word cloud produced from a dataset of disaster tweets revealed the prominent terms
associated with different types of disasters. The most frequent terms in the word
cloud, which can be observed in Fig. 2, include "Earthquake," "Hurricane," "Accident," and "Flood Warning." These terms
indicate the prevalence of these specific disaster types in the dataset and highlight
the significance of these events in the context of the analyzed tweets. The word cloud
provides a visual representation that quickly identifies the most commonly mentioned
disaster types in the dataset.
Fig. 1. Tweet counts of each class.
Fig. 2. Word Cloud for all the tweets.
3.5 Data Transformation
This section describes the data preparation, which includes label encoding, tokenization,
text-to-sequence conversion, and padding, carried out to prepare the text data for disaster tweet
classification.
Label encoding is a technique used to convert categorical labels into numerical values.
In disaster tweet classification, the labels ['Accident', 'Earthquake', 'Flood', and
'Other disaster'] were assigned the corresponding numerical labels 0, 1, 2, and 3,
respectively, using the scikit-learn label encoder. This allows the machine learning
model to understand and process the labels effectively.
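The mapping above follows from sorting the class names alphabetically, which is how the scikit-learn label encoder orders its classes. A dependency-free sketch of the same behavior is shown below; the helper name `fit_label_encoder` and the sample labels are illustrative assumptions.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer, in sorted order of the labels."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


labels = ["Flood", "Earthquake", "Other disaster", "Accident", "Flood"]
mapping = fit_label_encoder(labels)
encoded = [mapping[label] for label in labels]

print(mapping)  # {'Accident': 0, 'Earthquake': 1, 'Flood': 2, 'Other disaster': 3}
print(encoded)  # [2, 1, 3, 0, 2]
```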
Tokenization is the process of splitting text into individual words or tokens.
In this case, the vocabulary size was determined to be 14509, meaning there are 14509
unique words in the given disaster tweet dataset. Tokenization is an essential step
in natural language processing tasks because it allows the model to understand and
analyze the text data at a granular level.
After tokenization, the next step involved converting the text data into sequences.
This conversion is necessary to represent each word in the text as a numerical sequence
that machine learning models can process. Each unique word in the vocabulary is assigned
a unique integer value. The conversion of text into sequences allows the model to
understand and analyze the text data numerically.
Following the sequence conversion, the maximum tweet length was set to 27 words.
Shorter sequences were padded with zeros at the end (post-zero-padding)
to ensure that all sequences have the same length. This uniformity in sequence length is
beneficial for training machine learning models that require fixed-length input sequences.
By performing this preprocessing step, the text data is ready to be fed into a model
for disaster tweet classification.
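The tokenization, text-to-sequence conversion, and post-padding steps can be sketched in plain Python as follows. The paper does not name the tokenizer it used, so the whitespace tokenization, the frequency-based index assignment, and the reservation of index 0 for padding are assumptions made for this illustration.

```python
from collections import Counter


def build_vocab(texts):
    """Assign each word an integer index, most frequent first; 0 is kept for padding."""
    counts = Counter(word for text in texts for word in text.split())
    ordered = sorted(counts, key=lambda word: (-counts[word], word))
    return {word: index for index, word in enumerate(ordered, start=1)}


def texts_to_sequences(texts, vocab):
    """Replace each known word with its integer index."""
    return [[vocab[word] for word in text.split() if word in vocab] for text in texts]


def pad_post(sequences, max_len=27):
    """Truncate to max_len, then append zeros so every sequence has length max_len."""
    return [seq[:max_len] + [0] * (max_len - len(seq)) for seq in sequences]


texts = ["earthquake hits city", "flood warning city"]
vocab = build_vocab(texts)
padded = pad_post(texts_to_sequences(texts, vocab), max_len=5)
print(padded)  # [[2, 4, 1, 0, 0], [3, 5, 1, 0, 0]]
```

With the default `max_len=27`, every tweet becomes a length-27 integer vector, which matches the fixed input length described above.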
3.6 Data Splitting: Training, Validation, and Testing Sets
Data splitting is a crucial step in machine learning, where the available dataset
is divided into separate subsets, such as training, validation, and testing sets,
to facilitate model development, evaluation, and optimization. The 5545
tweets were divided into 15% for testing (831 tweets), 10% for validation
(554 tweets), and the remaining 75% for training (4160 tweets).
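The subset sizes can be reproduced with a short sketch. The shuffling, the seed, and the rule that the training set takes whatever remains after the test and validation sets are assumptions; the paper reports only the resulting counts.

```python
import random


def split_indices(n, test_pct=15, val_pct=10, seed=42):
    """Shuffle the indices 0..n-1 and cut them into train, validation, and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = n * test_pct // 100  # 5545 * 15 // 100 = 831
    n_val = n * val_pct // 100    # 5545 * 10 // 100 = 554
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]  # remainder: 5545 - 831 - 554 = 4160
    return train, val, test


train, val, test = split_indices(5545)
print(len(train), len(val), len(test))  # 4160 554 831
```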
3.7 Model Selection
Once the data have been prepared for training, deep learning models are trained on
this data for performing disaster tweet classification. Three different models are
trained and compared: GRU, LSTM, and BERT.
3.7.1 GRU (Gated Recurrent Unit)
The GRU introduced by Cho et al. [11] is a type of RNN that has gained popularity in deep learning. It is designed to address
the vanishing gradient problem in traditional RNNs. The GRU includes two key gates:
update and reset gates. The update gate determines how much of the previous hidden
state should be passed on to the current time step, while the reset gate controls
how much of the previous hidden state should be ignored. These gates play a crucial
role in governing the flow of information in the GRU, allowing it to capture long-term
dependencies in sequential data.
Eq. (1) describes the update gate, a key component of the Gated Recurrent
Unit (GRU) architecture:
$$Z_{t} = \sigma\left(W_{z} x_{t} + U_{z} h_{t-1}\right) \qquad (1)$$
where Z$_{\mathrm{t}}$ represents the update gate activation at time step t, ${\sigma}$
denotes the sigmoid activation function, and W$_{\mathrm{z}}$ and U$_{\mathrm{z}}$ are
the weight matrices that control the influence of the current input x$_{\mathrm{t}}$
and the previous hidden state h$_{\mathrm{t-1}}$, respectively.
Eq. (2) describes the reset gate:
$$r_{t} = \sigma\left(W_{r} x_{t} + U_{r} h_{t-1}\right) \qquad (2)$$
Similarly, r$_{\mathrm{t}}$ is the reset gate activation at time step t. W$_{\mathrm{r}}$
and U$_{\mathrm{r}}$ are the weight matrices determining the impact of the current
input x$_{\mathrm{t}}$ and the previous hidden state h$_{\mathrm{t-1}}$ on the reset
gate activation.
These equations and subsequent calculations help the GRU model decide how much information
to retain from the previous time step and how much to update with new inputs, enabling
it to capture and process sequential dependencies effectively.
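Both gates share the same form, a sigmoid applied to a weighted sum of the current input and the previous hidden state, as described for Eqs. (1) and (2). The following sketch evaluates that form for toy values; the weights, the input, and the helper names are hypothetical, and no bias term is included because the text does not mention one.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gate(W, U, x_t, h_prev):
    """Compute sigmoid(W x_t + U h_prev), the shared form of the update and reset gates."""
    return [sigmoid(a + b) for a, b in zip(matvec(W, x_t), matvec(U, h_prev))]


# Toy values chosen only to make the arithmetic easy to follow.
x_t = [1.0, -1.0]
h_prev = [0.5, 0.0]

W_z, U_z = [[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]
W_r, U_r = [[2.0, 0.0], [0.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]

z_t = gate(W_z, U_z, x_t, h_prev)  # pre-activations are 0 and 0
r_t = gate(W_r, U_r, x_t, h_prev)  # pre-activations are 2.5 and -2.0
print(z_t, r_t)
```

Every gate value lies strictly between 0 and 1, which is what allows the gates to act as soft switches on the previous hidden state.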
The GRU-based model begins with an embedding layer that converts integer-encoded
words into dense vectors using the given vocabulary size and embedding dimensions.
This is followed by a bidirectional GRU layer with 256 units and ReLU activation,
which processes the sequence in both directions. A GlobalAveragePooling1D layer then
summarizes the temporal information. A dense layer with 64 neurons and ReLU activation
follows, together with a dropout layer with a rate of 0.4 to mitigate overfitting. The architecture
ends with a dense layer with four neurons and softmax activation, which outputs the
probabilities of the four disaster classes.
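The layer stack described above might be expressed in Keras roughly as follows. This is a sketch under stated assumptions rather than the authors' code: the embedding dimension (`embed_dim=128`), the extra vocabulary slot for the padding index, and `return_sequences=True` (needed so that the pooling layer has a time axis to average over) are not given in the paper, and the function name `build_classifier` is illustrative.

```python
def build_classifier(cell="gru", vocab_size=14509, max_len=27,
                     embed_dim=128, num_classes=4):
    """Embedding -> bidirectional RNN (256) -> average pooling -> Dense(64) -> Dropout -> softmax."""
    # Imported lazily so that defining the function does not require TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    recurrent = {"gru": layers.GRU, "lstm": layers.LSTM}[cell]
    return keras.Sequential([
        keras.Input(shape=(max_len,)),
        # +1 because index 0 is assumed to be reserved for padding.
        layers.Embedding(vocab_size + 1, embed_dim),
        # return_sequences=True keeps the per-time-step outputs for pooling.
        layers.Bidirectional(recurrent(256, activation="relu", return_sequences=True)),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.4),
        layers.Dense(num_classes, activation="softmax"),
    ])
```

Calling `build_classifier("gru")` requires TensorFlow to be installed; passing `cell="lstm"` swaps only the recurrent layer and leaves the rest of the stack unchanged.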
3.7.2 LSTM (Long Short-term Memory)
LSTM is a well-established RNN architecture that effectively addresses the vanishing
gradient problem, a common issue in training traditional RNNs. The model achieves
this by introducing memory cells and three essential gating mechanisms: the input
gate, forget gate, and output gate. These gates play a critical role in regulating
the flow of information through the network, enabling the LSTM to capture and retain
long-range dependencies in the input sequence. LSTM has widespread applications in
various tasks, including speech recognition, language modeling, and text classification,
owing to its robustness in modeling sequential data. Its exceptional ability to capture
long-term dependencies makes it particularly well-suited for understanding the context
and semantics of the text, which is essential for accurate classification, such as
in disaster-related tweets.
Input Gate (i$_{\mathrm{t}}$): The input gate controls
how much the current input (x$_{\mathrm{t}}$) should be used to update the cell state
(C$_{\mathrm{t}}$). It is calculated using the sigmoid activation function, as given by Eq. (3):
$$i_{t} = \sigma\left(W_{i}\,[h_{t-1}, x_{t}] + b_{i}\right) \qquad (3)$$
where W$_{\mathrm{i}}$ is the weight matrix for the input gate, [h$_{\mathrm{t-1}}$,
x$_{\mathrm{t}}$] represents the concatenation of the previous hidden state and the
current input, and b$_{\mathrm{i}}$ is the bias vector for the input gate. ${\sigma}$ is
the sigmoid activation function, which scales the output between 0 and 1.
Forget Gate (f$_{\mathrm{t}}$): The forget gate determines the extent to which the
previous cell state (C$_{\mathrm{t-1}}$) should be forgotten when processing the current
input (x$_{\mathrm{t}}$) and the previous hidden state (h$_{\mathrm{t-1}}$). The gate
is also calculated using the sigmoid activation function, as given by Eq. (4):
$$f_{t} = \sigma\left(W_{f}\,[h_{t-1}, x_{t}] + b_{f}\right) \qquad (4)$$
where W$_{\mathrm{f}}$ is the weight matrix for the forget gate, [h$_{\mathrm{t-1}}$,
x$_{\mathrm{t}}$] represents the concatenation of the previous hidden state and the
current input, b$_{\mathrm{f}}$ is the bias vector for the forget gate, and ${\sigma}$
is the sigmoid activation function.
Output Gate (O$_{\mathrm{t}}$): The output gate controls the extent to which the current
cell state (C$_{\mathrm{t}}$) should influence the computation of the current hidden
state (h$_{\mathrm{t}}$). The gate is calculated using the sigmoid activation function,
as given by Eq. (5):
$$O_{t} = \sigma\left(W_{o}\,[h_{t-1}, x_{t}] + b_{o}\right) \qquad (5)$$
where W$_{\mathrm{o}}$ is the weight matrix for the output gate, [h$_{\mathrm{t-1}}$, x$_{\mathrm{t}}$]
represents the concatenation of the previous hidden state and the current input, and b$_{\mathrm{o}}$
is the bias vector for the output gate.
These gating mechanisms in LSTM and the memory cell enable the network to update and
forget information selectively, allowing it to learn long-term dependencies and effectively
model sequential data. The ability to capture complex patterns and context in the
input sequence makes LSTM a powerful tool for various natural language processing
tasks, including disaster-related tweet classification, where an accurate understanding
of the text's semantics is crucial.
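The text describes in words how the three gates act on the memory cell. For reference, the standard LSTM formulation combines them as shown below. The candidate state $\tilde{C}_{t}$ and its parameters $W_{C}$ and $b_{C}$ are symbols from that standard formulation rather than from this paper, and $\odot$ denotes element-wise multiplication.

```latex
\begin{aligned}
\tilde{C}_{t} &= \tanh\left(W_{C}\,[h_{t-1}, x_{t}] + b_{C}\right) \\
C_{t} &= f_{t} \odot C_{t-1} + i_{t} \odot \tilde{C}_{t} \\
h_{t} &= O_{t} \odot \tanh\left(C_{t}\right)
\end{aligned}
```

The forget gate scales the old cell state, the input gate scales the new candidate, and the output gate determines how much of the updated cell state is exposed as the hidden state.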
The architecture of the LSTM-based model begins with an embedding layer, which transforms
integer-encoded words into dense vectors. This feeds into a bidirectional LSTM layer with
256 units and ReLU activation, which processes the sequence in both directions.
A GlobalAveragePooling1D layer then summarizes the temporal information, leading to a dense
layer with 64 neurons and ReLU activation. A dropout
layer with a rate of 0.4 was employed to prevent overfitting. Finally,
a softmax-activated dense layer outputs the class probabilities for the
disaster-related tweets.
3.7.3 BERT (Bidirectional Encoder Representations from Transformers)
BERT is a transformer-based model that has revolutionized natural language processing
tasks. The BERT model follows a two-step framework: pre-training and fine-tuning [12]. In the pretraining phase, the model undergoes training on a vast unlabeled corpus.
For the fine-tuning stage, the model starts with pre-trained parameters, which are
then fine-tuned using labeled data specific to the tasks.
Unlike a standard LSTM or GRU, which processes the input sequentially in one direction
(a bidirectional variant runs a second pass in the reverse direction), BERT considers both the left and right
contexts of each word in a sentence simultaneously, providing a more comprehensive understanding
of the context. This bidirectional nature allows BERT to capture long-range dependencies
efficiently. In addition, BERT differs from LSTM and GRU regarding the training objectives.
BERT uses unsupervised pretraining, learning from large amounts of unannotated text
data through masked language modeling and next-sentence prediction. In contrast, LSTM
and GRU typically undergo supervised training with labeled data.
BERT uses a multi-layer bidirectional transformer encoder [12], which consists of a stack of N identical layers (N = 12 for the BERT-base variant used in this study), each with two sub-layers. In the initial
sub-layer, a multi-head self-attention mechanism captures the relationships between
different words in the input sequence, allowing the model to comprehend the context
effectively. The subsequent sub-layer uses a position-wise fully connected feedforward
network to process the output of the self-attention layer further. The scaled dot-product
attention mechanism, represented as Eq. (6), is a fundamental building block within the self-attention layer:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V \qquad (6)$$
where Q, K, and V represent the queries, keys, and values, respectively, and d$_{\mathrm{k}}$ is the dimension of the keys. This mechanism
calculates attention scores by measuring the relevance of the queries to the keys.
The softmax function normalizes these scores, determining the importance of each value
(V) with respect to the given queries and keys. By focusing on the most relevant parts
of the input sequence, this mechanism captures the contextual dependencies, leading
to meaningful contextualized word representations. This mechanism is a critical component
in the Transformer architecture, contributing to the success of models, such as BERT,
in various natural language processing tasks.
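The scaled dot-product attention described above can be sketched in plain Python. The scaling by the square root of the key dimension follows the standard transformer formulation; the function names and the toy matrices are illustrative assumptions.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1 (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention(Q, K, V):
    """For each query, return the attention-weighted average of the value vectors."""
    d_k = len(K[0])
    output = []
    for query in Q:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in K
        ]
        weights = softmax(scores)
        output.append(
            [sum(w * value[j] for w, value in zip(weights, V)) for j in range(len(V[0]))]
        )
    return output


# A zero query scores every key equally, so the output is the mean of the values.
print(attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 4.0]]))
# [[1.0, 2.0]]
```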
The "bert-base-uncased" variant of BERT, which comprises 12 layers, was used as the
foundation for classifying disaster-related tweets via transfer learning. This base
model was extended with a fully connected output layer with softmax
activation. Fine-tuning used the Adam optimizer with a learning rate
of 1 ${\times}$ 10$^{-5}$ and a decay of 1 ${\times}$ 10$^{-7}$. Categorical
cross-entropy was selected as the loss function because it measures the
discrepancy between the predicted and actual class probabilities, making
it suitable for multi-class classification.
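As a brief illustration of the loss function, the sketch below computes the categorical cross-entropy for a single example with a one-hot target. The `eps` guard against taking the logarithm of zero and the example probabilities are assumptions made for this illustration.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Return -sum(y_true * log(y_pred)) for one example; eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [0, 1, 0, 0]  # one-hot target, e.g. class index 1
confident = categorical_crossentropy(y_true, [0.05, 0.85, 0.05, 0.05])
uncertain = categorical_crossentropy(y_true, [0.10, 0.70, 0.10, 0.10])
print(confident < uncertain)  # True
```

The loss decreases as the probability assigned to the correct class increases, which is the behavior the fine-tuning step optimizes.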
3.8 Evaluation Metrics
Evaluation metrics are essential for assessing the performance of machine learning
models. In this study, the performance of the GRU, LSTM, and BERT
models was assessed using three metrics: accuracy, precision, and recall.
These metrics provide a comprehensive insight into the ability of the model to make
correct predictions, the proportion of its positive predictions that are correct, and its sensitivity
to positive instances.
Accuracy, given by Eq. (7), measures the overall correctness of the model predictions. It is the proportion
of correct predictions among all predictions:
$$\mathrm{Accuracy} = \frac{T_{P} + T_{N}}{T_{P} + T_{N} + F_{P} + F_{N}} \qquad (7)$$
Precision, given by Eq. (8) and also known as the positive predictive value, quantifies the proportion of positive
class predictions that are correct:
$$\mathrm{Precision} = \frac{T_{P}}{T_{P} + F_{P}} \qquad (8)$$
Recall, given by Eq. (9) and also known as sensitivity, hit rate, or true positive rate, quantifies
the proportion of actual positive class observations that were correctly classified.
It measures how completely the model identifies the positive instances:
$$\mathrm{Recall} = \frac{T_{P}}{T_{P} + F_{N}} \qquad (9)$$
where T$_{\mathrm{P}}$, T$_{\mathrm{N}}$, F$_{\mathrm{P}}$, and F$_{\mathrm{N}}$ are
the true positives, true negatives, false positives, and false negatives, respectively.
These evaluation metrics are crucial for understanding the strengths and weaknesses
of each model in different aspects of performance. This study aimed to determine
which model exhibits the best performance for the task by comparing these
metrics across the GRU, LSTM, and BERT models.
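For a multi-class task, precision and recall are computed per class by treating that class as positive and all others as negative. The sketch below derives them, together with the overall accuracy, from a confusion matrix whose rows are true classes and whose columns are predicted classes. The matrix values are toy numbers, not results from this study, and the paper does not state how per-class scores were averaged.

```python
def metrics_from_confusion(cm):
    """Return (accuracy, per-class precision, per-class recall) from a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precision, recall = [], []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted as c, but another class
        fn = sum(cm[c]) - tp                       # truly c, but predicted otherwise
        precision.append(tp / (tp + fp) if tp + fp else 0.0)
        recall.append(tp / (tp + fn) if tp + fn else 0.0)
    return correct / total, precision, recall


# Toy two-class confusion matrix (rows: true class, columns: predicted class).
accuracy, precision, recall = metrics_from_confusion([[8, 2], [1, 9]])
print(accuracy, precision, recall)
```

Here the accuracy is 17/20, the precision values are 8/9 and 9/11, and the recall values are 8/10 and 9/10.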
4. Results and Discussions
This section discusses the results of training the disaster tweet classification
models. The training was conducted in Google Colab with GPU acceleration.
The GPU used for training was a Tesla T4 with a memory capacity
of 15360 MiB (15 GiB, approximately 16 GB). The GPU acceleration and
memory capacity provided significant computational power and improved the efficiency
of the training process.
4.1 Comparison of GRU, LSTM, and BERT Models
Table 1 compares the results of the three models based on the accuracy, precision, and recall
on the test data. According to the results presented in Table 1, BERT achieved the highest testing accuracy (0.962), followed by LSTM with a testing
accuracy of 0.932 and GRU with a testing accuracy of 0.8847. BERT also had the highest
testing precision (0.963), indicating that it correctly identifies positive
instances out of all instances predicted as positive. LSTM also performed well in
this aspect, with a precision of 0.952. In contrast, GRU had a lower precision
of 0.8923. Regarding testing recall, BERT achieved the highest score of 0.9625, followed
by LSTM with a recall of 0.917. GRU had a lower recall of 0.8811. BERT demonstrated
the best performance across all metrics, achieving high accuracy, precision, and recall.
LSTM also performed well, and GRU showed lower accuracy, precision, and recall.
Table 1. Comparison of three models based on accuracy, recall, and precision.
Fig. 3. Confusion matrix for BERT.
4.2 Performance Analysis of the BERT Model
BERT, a powerful language model, achieves
impressive results in various natural language processing tasks, including text classification.
In the specific disaster classification task, BERT showed its effectiveness in accurately
predicting the class of a given text. Fig. 3 presents the confusion matrix for the performance of BERT on the disaster classification
task, using the disaster classes 'Accident', 'Earthquake', 'Flood', and 'Other
disaster'. The confusion matrix provides valuable insights into the performance of
the model by showing the number of correct and incorrect predictions for each class.
In this case, the rows represent the true classes, while the columns represent the
predicted classes.
From the confusion matrix, BERT has achieved high accuracy in predicting the 'Accident',
'Earthquake', and 'Other disaster' classes, with most predictions falling
into these categories being correct.
On the other hand, there are a few instances where the 'Accident' and
'Earthquake' classes were misclassified as 'Flood' or 'Other disaster'.
Similarly, the 'Flood' class had some misclassifications, with a few instances being
predicted as 'Accident' or 'Other disaster'. The 'Other disaster' class also had a
few misclassifications, with some instances being predicted as 'Accident', 'Earthquake',
or 'Flood'.
Overall, BERT demonstrated its effectiveness in disaster classification, achieving
high accuracy in predicting the majority of instances correctly. Nevertheless, there
is still room for improvement, particularly in reducing misclassifications between
certain classes.
The history plot of the BERT model validation and training accuracy over 20 epochs
reveals interesting trends, as shown in Fig. 4. Initially, the validation accuracy started at 94 % but experienced a significant
jump to 96 % at the 3$^{\mathrm{rd}}$ epoch. Subsequently, the validation accuracy
remained constant throughout the remaining epochs. On the other hand, the training
accuracy started at 88 % and increased steadily to 98 % by the 3$^{\mathrm{rd}}$ epoch.
Subsequently, the training accuracy continued to increase slightly, reaching 99.2
%. This suggests that the BERT model performs well in terms of training and validation
accuracy, with the validation accuracy showing stability after an initial improvement.
The consistent increase in training accuracy indicates that the model is effective
in learning and improving its performance over time.
An eight-epoch comparison of three model variants was performed using BERT for disaster
tweet classification. The objective was to analyze the influence of the number of
hidden layers on the model performance. The original model, featuring one hidden layer,
exhibited impressive progress during training, consistently enhancing accuracy, precision,
recall, and F1-score on the validation dataset. This highlighted the proficiency of
the model in capturing the essential patterns from the text data.
In Variant 1, designed with two hidden layers, the performance of the model was competitive.
Despite an initial metric lag compared to the original model, rapid convergence led
to commendable evaluation scores. This suggests that the additional hidden layers
facilitated nuanced pattern recognition, contributing to either equivalent or improved
outcomes.
Variant 2, leveraging four hidden layers, demonstrated swift pattern discernment and
efficient convergence. Despite its increased complexity, this architecture achieved
notable precision, recall, and F1-score values, underscoring its capability to learn
intricate text features.
These observations underscore the interplay between the number of hidden layers and
model performance. Both simpler and deeper architectures yielded promising results,
potentially due to the enhanced feature extraction capabilities. On the other hand,
careful consideration of overfitting risks is essential when adjusting the model.
In summary, this analysis, conducted using BERT for disaster tweet classification,
sheds light on the impact of hidden layer variations, offering valuable insights for
architectural decisions in natural language processing tasks.
Fig. 4. Plot illustrating the validation and training accuracy of the BERT model over 20 epochs.
5. Conclusion and Future Work
This study analyzed the efficacy of BERT, GRU, and LSTM deep learning models in classifying
disaster-related tweets. The results showcased the superior performance of BERT in
precision, recall, and accuracy. This highlights the potential of BERT for improved disaster
management by analyzing tweets, identifying the disaster type, and formulating appropriate
response strategies. The study also highlighted the importance of location information
in disaster management and the varied word usage based on the type of disaster.
Although the study provides promising insights, future research should extend to
different disaster types, such as wildfires or pandemics, to explore the adaptability
of these models. In addition, how these models can integrate with current disaster
management systems for improved efficiency will also be a subject for future research.
Furthermore, as these models advance, ethical considerations of content filtration
and information prioritization should be evaluated to ensure responsible and transparent
utilization that does not infringe on ethical norms or human rights.
ACKNOWLEDGMENTS
This work was supported in part by the National Research Foundation of Korea (NRF)
grant funded by the Korean government (MSIT) (No.2022R1A2C1003549) and in part by
the 2023 Hongik University Innovation Support Program Fund.
REFERENCES
Zou, Lei, Danqing Liao, Nina SN Lam, Michelle A. Meyer, Nasir G. Gharaibeh, Heng Cai,
Bing Zhou, and Dongying Li. "Social media for emergency rescue: An analysis of rescue
requests on Twitter during Hurricane Harvey." International Journal of Disaster Risk
Reduction 85 (2023): 103513.
Karimiziarani, Mohammadsepehr, Keighobad Jafarzadegan, Peyman Abbaszadeh, Wanyun Shao,
and Hamid Moradkhani. "Hazard risk awareness and disaster management: Extracting the
information content of twitter data." Sustainable Cities and Society 77 (2022): 103577.
Samuel, Jim, G. G. Md. Nawaz Ali, Md. Mokhlesur Rahman, Ek Esawi, and Yana Samuel.
2020. "COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification"
Information 11, no. 6: 314.
Piyush Jain, Sean C.P. Coogan, Sriram Ganapathi Subramanian, Mark Crowley, Steve Taylor,
and Mike D. Flannigan. 2020. A review of machine learning applications in wildfire
science and management. Environmental Reviews. 28(4): 478-505.
Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M. (2016). The paradigm-shift
of social spambots: Evidence, theories, and tools for the arms race. IEEE Communications
Magazine, 54(3), 100-107.
Imran, M., Elbassuoni, S. M., Castillo, C., Diaz, F., Meier, P. (2016). Practical
extraction of disaster-relevant information from social media. Proceedings of the
39th International ACM SIGIR Conference on Research and Development in Information
Retrieval, 1023-1026.
R. Ni and H. Cao, "Sentiment Analysis based on GloVe and LSTM-GRU," 2020 39th Chinese
Control Conference (CCC), Shenyang, China, 2020, pp. 7492-7497.
Ashktorab, Zahra, Christopher Brown, Manojit Nandi, and Aron Culotta. "Tweedr: Mining
twitter to inform disaster response." In ISCRAM, pp. 269-272. 2014.
Nguyen, Dong, Rilana Gravel, Dolf Trieschnigg, and Theo Meder. "'How old do you think
I am?' A study of language and age in Twitter." In Proceedings of the International
AAAI Conference on Web and Social Media, vol. 7, no. 1, pp. 439-448. 2013.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-training
of deep bidirectional transformers for language understanding" (2018).
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., &
Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical
machine translation.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pretraining
of deep bidirectional transformers for language understanding, 2018.
Author
Ihsan Ullah received his B.S. in Computer Systems Engineering from the University
of Engineering and Technology Peshawar, Pakistan, and his M.S. in Computer and Wireless
Networks from COMSATS University, Islamabad, in 2021. He was a research assistant
in the Wireless and Communication lab for half a year. He is pursuing his Ph.D. in
Software and Communication Engineering at Hongik University, South Korea, under Prof.
Byung-Seo Kim. His research interests encompass NDN, Underwater Wireless Sensor Networks,
Cloud and Fog Computing, Vehicular Networks, and aspects of Machine Learning and Artificial
Intelligence.
Anum Jamil is a final-year B.S. student in Applied Physics at NED University of
Engineering and Technology, Karachi. She is currently interning at the university's
Smart City Lab. She is also engaged with the distinguished President's Initiative
of Artificial Intelligence, demonstrating her dedication to machine learning and AI.
Her research interests lie in Natural Language Processing (NLP), the Internet of Things
(IoT), and Artificial Intelligence.
Imtiaz ul Hassan holds a B.S. degree in Computer Systems Engineering from the
University of Engineering and Technology Peshawar, Pakistan. Currently, he is pursuing
his M.S. in Data Science from NEDUET Karachi. In addition to his studies, Imtiaz is
actively engaged as a research associate in the Smart City LAB at the National Center
for Artificial Intelligence. His research interests primarily involve computer vision,
natural language processing (NLP), autonomous vehicles, and robotics.
Byung-Seo Kim received his B.S. degree in Electrical Engineering from In-Ha University,
In-Chon, Korea in 1998 and his M.S. and Ph.D. degrees in Electrical and Computer Engineering
from the University of Florida in 2001 and 2004, respectively. His Ph.D. study was
supervised by Dr. Yuguang Fang. Between 1997 and 1999, he worked for Motorola Korea
Ltd., PaJu, Korea, as a CIM Engineer in ATR&D. From January 2005 to August 2007, he
worked for Motorola Inc., Schaumburg, Illinois, as a Senior Software Engineer in Networks
and Enterprises for designing the protocol and network architecture of wireless broadband
mission-critical communications. He is a professor in the Department of Software and
Communications Engineering at Hongik University, Korea. He is an IEEE Senior Member
and is an Associate Editor of IEEE Access, Telecommunication Systems, and Journal
of the Institute of Electronics and Information Engineers. His studies have appeared
in approximately 260 publications and 32 patents. His research interests include designing
and developing efficient wireless/wired networks, including link-adaptable/cross-layer-based
protocols, multi-protocol structures, wireless CCNs/NDNs, Mobile Edge Computing, physical
layer design for broadband PLC, and resource allocation algorithms for wireless networks.