Title |
Deep Learning Model based on Natural Language Processes for Multi-class Classification of R&D Documents: Focused on Climate Technology Classification |
Authors |
주경원(Kyungwon Joo) ; 이관수(Kwansu Lee) ; 이성만(Seung-Man Lee) ; 최안준(Anjoon Choi) ; 노건태(Geontae Noh) ; 천지영(Ji Young Chun) |
DOI |
https://doi.org/10.5573/ieie.2022.59.7.21 |
Keywords |
Multi-class classification; Natural language; Deep learning; Climate technology |
Abstract |
Following the Paris Climate Agreement, countries worldwide have declared carbon neutrality goals, and interest is growing in which climate technologies national R&D projects are investing in. In this study, a deep learning model was developed that automatically classifies national R&D projects into the 45 categories of the climate technology classification system using literature information on the projects. Of the 291,381 R&D projects registered in NTIS from 2016 to 2020, the 217,880 projects from 2016 to 2019 were assigned to the training dataset, and the 73,501 projects from 2020 were assigned to the test dataset. Kiwi and MeCab were used for morpheme analysis; the FC and EC models were built on a 1D-CNN, and the KoE model on ELECTRA. Because the dataset is extremely imbalanced across classes, the F1 score was used as the performance metric, and the performance of each individual model and each ensemble model was examined. Among the individual models, the FC model, which focuses on keyword frequency, achieved the best F1 score of 0.824; among the ensemble models, Ens4, which soft-votes all of the individual models, achieved the highest F1 score of 0.833. For automatic classification of documents that contain more technical terminology than a general corpus, this model is expected to provide a more efficient process than direct labeling by technical experts. |
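The soft-voting and F1 evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `soft_vote` and `macro_f1`, the toy probabilities, and the use of unweighted (macro) averaging of per-class F1 are all assumptions, since the abstract does not state how F1 was averaged or how the models' outputs were combined in detail.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities over models, then take the argmax per sample.

    prob_sets[m][i][c] is model m's probability that sample i belongs to class c.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    preds = []
    for i in range(n_samples):
        n_classes = len(prob_sets[0][i])
        mean = [sum(m[i][c] for m in prob_sets) / n_models for c in range(n_classes)]
        preds.append(max(range(n_classes), key=mean.__getitem__))
    return preds


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


# Hypothetical outputs of three models (standing in for FC, EC, KoE)
# on three samples and three classes; the study itself uses 45 classes.
fc = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]]
ec = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6], [0.2, 0.6, 0.2]]
koe = [[0.3, 0.6, 0.1], [0.2, 0.2, 0.6], [0.3, 0.5, 0.2]]

ensemble_pred = soft_vote([fc, ec, koe])
score = macro_f1([0, 2, 2], ensemble_pred, n_classes=3)
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh two uncertain ones, which is one plausible reason an ensemble of all individual models could edge out the best single model.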