Title |
Prediction of Infectious Disease Risk Index using Discriminative Keyword Extraction from Multi-lingual News Articles |
Authors |
김기후(Gi-Hu Kim) ; 장종원(Jongwon Jang) ; 장길진(Gil-Jin Jang) |
DOI |
https://doi.org/10.5573/ieie.2025.62.5.35 |
Keywords |
COVID-19; Infectious disease; Term frequency (TF); Keyword extraction; Risk index |
Abstract |
We propose a COVID-19 risk level prediction system that applies Discriminative Term Frequency-Inverse Document Frequency (D-TF-IDF) keyword extraction to multilingual news articles, complementing existing natural-language-based methods for predicting increases in confirmed COVID-19 cases. D-TF-IDF extracts keywords by excluding overlapping key terms that appear in the multilingual news, and the extracted keywords serve as inputs to models that predict the increase in confirmed cases and the risk level. The proposed keyword extraction method improves the micro F1-score, the classification performance metric, for both confirmed-case increase prediction and risk level prediction. Random Over Sampling, a data augmentation technique, is used to address class imbalance in the keyword data representing risk levels. To further improve classification performance, the system uses whichever of the Random Forest, Balanced Random Forest, Support Vector Machine, and Gaussian Process classifiers performs best at predicting COVID-19 risk levels. |
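The abstract names three generic building blocks: Random Over Sampling, the micro F1-score, and choosing the best-scoring classifier. The sketch below illustrates those ideas using only the Python standard library. It is not the authors' implementation: the function names, the toy keyword-count vectors, and the risk labels ("low", "high") are hypothetical, and the paper's D-TF-IDF step is not reproduced because the abstract does not define it.

```python
import random
from collections import Counter, defaultdict


def random_over_sample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class samples (with replacement)
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for cls in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == cls and t == cls:
                tp += 1
            elif p == cls:
                fp += 1
            elif t == cls:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_by_micro_f1(y_true, predictions):
    """Return the name of the model whose predictions score the highest micro F1.
    `predictions` maps a (hypothetical) model name to its predicted labels."""
    return max(predictions, key=lambda name: micro_f1(y_true, predictions[name]))


if __name__ == "__main__":
    # Toy keyword-count vectors with an imbalanced risk label (assumed data).
    X = [[1, 0], [2, 0], [3, 1], [0, 5]]
    y = ["low", "low", "low", "high"]
    X_bal, y_bal = random_over_sample(X, y, seed=1)
    print(Counter(y_bal))  # both classes now have the same count
```

Oversampling should be applied only to the training split, so that duplicated samples do not leak into the evaluation data. For single-label classification, the micro F1-score computed this way coincides with accuracy.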