Title: BERT-based Transfer Learning Research for Financial Dataset Implementation
Authors: 김학진 (Kim Hackjin)
DOI: https://doi.org/10.5573/ieie.2023.60.9.75
Pages: pp. 75-83
ISSN: 2287-5026
Keywords: Natural language processing; Named entity recognition; BERT; Transfer learning; Financial dataset
Abstract: In the field of Natural Language Processing (NLP) for Korean, research has been actively conducted around the BERT language model introduced by Google. However, its application to Korean still has limitations due to the nature of the language. Named Entity Recognition (NER) is an NLP task that detects entity names in large amounts of unstructured text and classifies them into predefined entity classes. As a lexical feature, entity information provides a clue to understanding domain-specific knowledge within a text. Extracting entity information from text typically requires preprocessing, including tokenization and part-of-speech tagging. In this study, we present an English corpus specialized in finance for Korean language processing. We extracted and integrated about 8,000 financial texts from the Financial_phrasebank dataset and the financial reports of about 83 global companies that are members of the IIRC, building a dataset of about 12.7 thousand sentences in total. We propose a language model that extends the entity classification from 7 classes to 15 classes during transfer learning. After labeling the data and training it with the BERT_base model, we propose an optimal dataset for the financial field, evaluated through accuracy, recall, and F1 score.
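The abstract evaluates the fine-tuned model with accuracy, recall, and F1 score over the predicted entity classes. As a minimal sketch of how such token-level NER metrics are commonly computed (the function name, the example labels, and the choice of treating the non-entity tag "O" as the negative class are illustrative assumptions, not taken from the paper):

```python
def ner_scores(gold, pred, outside="O"):
    """Token-level accuracy, plus precision/recall/F1 computed over
    entity-labeled tokens only, with `outside` tokens as negatives.
    gold, pred: parallel lists of entity-class labels per token.
    Note: this is an illustrative evaluation sketch, not the paper's code."""
    assert len(gold) == len(pred) and gold, "label sequences must align"
    # Accuracy counts every token, including non-entity ("O") tokens.
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    # True positive: correctly predicted entity token.
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != outside)
    # False positive: predicted an entity label that does not match gold.
    fp = sum(1 for g, p in zip(gold, pred) if p != outside and g != p)
    # False negative: a gold entity token that was not labeled correctly.
    fn = sum(1 for g, p in zip(gold, pred) if g != outside and g != p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical 5-token sentence with ORG and MNY entity classes:
gold = ["O", "B-ORG", "I-ORG", "O", "B-MNY"]
pred = ["O", "B-ORG", "O", "O", "B-MNY"]
acc, prec, rec, f1 = ner_scores(gold, pred)
```

Span-level scorers (e.g. the seqeval package) that credit only fully matched entity spans are also common for NER and typically report lower numbers than token-level scoring.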