Title Recursive Oversampling Method for Improving Classification Performance of Class Unbalanced Data in Patent Document Automatic Classification
Authors 김성훈(Sunghoon Kim) ; 김승천(Seungcheon Kim)
DOI https://doi.org/10.5573/ieie.2021.58.4.43
Page pp.43-49
ISSN 2287-5026
Keywords Patent classification; Class imbalance; Recursive oversampling; MLP
Abstract Class imbalance refers to a case in which the number of samples differs greatly between the defined classes, so that a classifier predicts most samples as the majority class, which has many samples, rather than the minority class, which has few. In this study, we propose a technique called recursive oversampling to address the class imbalance problem in classifiers trained on patent data. After a classifier is trained on the class-imbalanced data, random data is generated based on the patent documents and classified with that classifier. Recursive oversampling then samples, from the random data, the items that the classifier predicts as the minority class. Compared with the original classifier, the classifier built through recursive oversampling showed higher precision, recall, and F-score for the minority class. In particular, its minority-class accuracy was also higher than that of a classifier using the SMOTE oversampling technique.
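The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a small multinomial Naive Bayes model stands in for the MLP named in the keywords, "random data based on the patent document" is interpreted as random token sequences drawn from the training vocabulary, and the names `train_nb`, `predict_nb`, `recursive_oversample`, `rounds`, `n_random`, and `doc_len` are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a tiny multinomial Naive Bayes model (assumed stand-in for the MLP)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = sorted({w for tokens in docs for w in tokens})
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest Laplace-smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(class_counts[label] / total)
        for w in tokens:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def recursive_oversample(docs, labels, minority, rounds=3,
                         n_random=200, doc_len=5, seed=0):
    """Grow the minority class with random documents the classifier assigns to it.

    Each round: train on the current data, generate random documents from the
    vocabulary (an assumption about how random data is built), and keep only
    those predicted as the minority class.
    """
    rng = random.Random(seed)
    docs, labels = list(docs), list(labels)
    for _ in range(rounds):
        model = train_nb(docs, labels)
        vocab = model[2]
        for _ in range(n_random):
            candidate = [rng.choice(vocab) for _ in range(doc_len)]
            if predict_nb(model, candidate) == minority:
                docs.append(candidate)
                labels.append(minority)
    return docs, labels
```

Calling `recursive_oversample(docs, labels, "B")` on tokenized documents returns the original data followed by the accepted synthetic minority samples. The abstract does not state a stopping criterion or how many rounds are used, so the fixed `rounds` count here is only a placeholder.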