Title |
COVID-19 Multilingual News Article Auto-indexing and Classification using ChatGPT and Multilingual BERT |
Authors |
강승태(Seungtae Kang) ; 장길진(Gil-Jin Jang) |
DOI |
https://doi.org/10.5573/ieie.2023.60.7.20 |
Keywords |
COVID-19; BERT; ChatGPT; Multilingual; News classification |
Abstract |
In this paper, we propose a method for automatically classifying multilingual news articles into predefined relevant events to help prevent the international spread of COVID-19. Conventional automatic classification methods require large amounts of training data and human labeling, which makes them ill-suited to COVID-19 and similar infectious diseases that mutate and spread rapidly. We propose a method for constructing a large-scale training dataset through automatic article classification by ChatGPT and multilingual data augmentation by Google Translate, both of which are paid services. Using the constructed multilingual training dataset, we propose a model based on multilingual BERT that automatically classifies news articles into the predefined categories. The proposed method enables early prediction of various types of COVID-19 events with decent accuracy. In our experiments, multilingual BERT trained on 5,898 news articles without the proposed automatic indexing and augmentation achieved an average accuracy of 85.85% and an average F1 score of 67.57%, which is insufficient for practical applications. With 47,183 news articles constructed using the proposed method, the average accuracy and F1 score improved to 98.21% and 95.71%, respectively. |
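
The dataset construction described in the abstract (automatic labeling followed by translation-based augmentation) can be illustrated with a minimal sketch. This is not the authors' implementation: `build_dataset`, `Example`, the stub functions, the category names, and the language codes are all hypothetical. The two callables stand in for a ChatGPT labeling call and a Google Translate call, neither of which is shown here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Example:
    text: str
    label: str
    lang: str


def build_dataset(
    articles: Sequence[str],
    source_lang: str,
    target_langs: Sequence[str],
    label_fn: Callable[[str], str],
    translate_fn: Callable[[str, str], str],
) -> List[Example]:
    """Label each article once, then copy that label to every translation."""
    dataset: List[Example] = []
    for text in articles:
        # Automatic indexing step (assumed to be an LLM call in practice).
        label = label_fn(text)
        dataset.append(Example(text, label, source_lang))
        # Augmentation step: each translation inherits the original label.
        for lang in target_langs:
            if lang == source_lang:
                continue
            dataset.append(Example(translate_fn(text, lang), label, lang))
    return dataset


# Hypothetical stubs so the sketch runs without any paid service.
def stub_label(text: str) -> str:
    return "outbreak" if "outbreak" in text.lower() else "other"


def stub_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


demo = build_dataset(
    ["New outbreak reported in the region.", "Markets closed higher today."],
    source_lang="en",
    target_langs=["ko", "es"],
    label_fn=stub_label,
    translate_fn=stub_translate,
)
```

Passing the labeling and translation steps in as callables keeps the sketch independent of any particular service. The resulting `Example` records could then be used to fine-tune a multilingual classifier such as multilingual BERT.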