Title |
BERT-based Transfer Learning Research for Financial Dataset Implementation |
DOI |
https://doi.org/10.5573/ieie.2023.60.9.75 |
Keywords |
Natural language processing; Named entity recognition; BERT; Transfer learning; Financial dataset |
Abstract |
In Natural Language Processing (NLP) for Korean, research has centered on the BERT language model introduced by Google. However, applying BERT to Korean still has limitations that stem from the nature of the language. Named Entity Recognition (NER) is an NLP task that detects entity names in large amounts of unstructured text and classifies them into predefined entity classes. As a lexical feature, entity information provides a clue to the domain-specific knowledge contained in a text. Extracting entity information from text typically requires preprocessing, including tokenization and part-of-speech tagging. In this study, we present a finance-specialized English corpus for Korean language processing. We extracted and integrated about 8,000 financial texts from the Financial_phrasebank dataset and the financial reports of about 83 global companies that are members of IIRC, building a dataset of about 12,700 sentences in total. We propose a language model that extends the named entity classification from 7 classes to 15 classes in the course of transfer learning. After labeling the data and training the model on top of BERT_base, we evaluate it by accuracy, recall, and F1 score and propose an optimal dataset for the financial field. |
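
The abstract reports recall and F1 score for NER. As a minimal sketch of how such scores are commonly computed at the entity level, the snippet below extracts spans from BIO-tagged sequences and compares gold and predicted spans. It rests on several assumptions: BIO tagging, exact-match span scoring, and the use of precision alongside recall are conventions assumed here and are not stated in the abstract, and the class names `ORG` and `MONEY` are hypothetical placeholders, not the paper's 15 classes.

```python
from typing import List, Set, Tuple

# (entity class, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((label, start, i))
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            else:
                # "O" or a stray "I-" tag: no span is open.
                start, label = None, None
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level scores; a span counts only if class and bounds agree."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the second entity is predicted with a wrong boundary.
gold_tags = ["B-ORG", "I-ORG", "O", "B-MONEY", "I-MONEY", "O"]
pred_tags = ["B-ORG", "I-ORG", "O", "B-MONEY", "O", "O"]
scores = precision_recall_f1(bio_spans(gold_tags), bio_spans(pred_tags))
print(scores)
```

Exact-match scoring is strict: a prediction that finds the right class but the wrong boundary earns no credit, which is why entity-level F1 is usually lower than token-level accuracy on the same output.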