Title Performance Comparison of Deep Learning Individual Model Classifier and Ensemble Classifier for Automatic Classification of Patent Documents
Authors 김성훈(Sunghoon Kim) ; 김승천(Seungcheon Kim)
DOI https://doi.org/10.5573/ieie.2021.58.9.34
Page pp.34-41
ISSN 2287-5026
Keywords Patent classification; Ensemble; Deep learning
Abstract Due to the rapid increase in technological innovation and the corresponding increase in patent applications, an automatic patent document classifier is very useful to both individual inventors and patent attorneys when classifying patents. In this study, models for classifying patent documents were selected from a viewpoint similar to that of a patent expert. The selected models were MLP, which can indicate the existence and frequency of major keywords; CNN, which can indicate the existence and frequency of keywords in close proximity; LSTM, which can indicate sentence structure and keyword order; Attention; and Transformer. They were chosen so that classification would be as similar as possible to that of patent experts. Three ensemble methods were used. The ensemble is based on bagging [8], which processes the results of each classifier independently, and performance was measured for three combination methods: voting, summation, and weighting. Among the three datasets used in this study, ensemble accuracy was high for two datasets (Dataset #2, Dataset #3), whereas for one dataset (Dataset #1) the accuracy of the Attention model, an individual model, was high. This result can be attributed to the classification viewpoint (classification by keyword or classification by sentence structure) of each dataset. If an ensemble of existing models is used instead of building a new model for each user's patent classification, it is expected to achieve the optimal accuracy for each dataset, or accuracy close to that of the single model with optimal accuracy. Such an ensemble would be suitable for commercializing a document classifier.
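The three combination methods named in the abstract (voting, summation, weighting) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption: each classifier is taken to output a probability vector over classes, the function names `vote`, `summation`, and `weighted` are hypothetical, and the example weights are illustrative rather than values from the paper.

```python
from collections import Counter


def argmax(values):
    """Index of the largest value (first index wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def vote(probs):
    """Voting: each classifier casts one vote for its top class."""
    votes = Counter(argmax(p) for p in probs)
    best = max(votes.values())
    # Assumption: ties are broken in favor of the lowest class index.
    return min(c for c, n in votes.items() if n == best)


def summation(probs):
    """Summation: add the class probabilities of all classifiers."""
    n_classes = len(probs[0])
    totals = [sum(p[c] for p in probs) for c in range(n_classes)]
    return argmax(totals)


def weighted(probs, weights):
    """Weighting: like summation, but each classifier has its own weight."""
    n_classes = len(probs[0])
    totals = [
        sum(w * p[c] for p, w in zip(probs, weights))
        for c in range(n_classes)
    ]
    return argmax(totals)


# Hypothetical outputs of three individual classifiers for one document.
probs = [
    [0.60, 0.30, 0.10],  # classifier A prefers class 0
    [0.55, 0.35, 0.10],  # classifier B prefers class 0
    [0.05, 0.90, 0.05],  # classifier C strongly prefers class 1
]

print(vote(probs))                       # majority of top-1 votes
print(summation(probs))                  # confidence of C can outweigh A and B
print(weighted(probs, [0.5, 0.4, 0.1]))  # illustrative weights only
```

In this toy case voting and summation disagree: two classifiers vote for class 0, but the third is confident enough in class 1 to dominate the summed scores. That difference is one reason the combination methods can rank differently across datasets.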