Title |
Programming Language Identification via Selection of Target Categories and Machine Learning for Unknown Category |
Authors |
유희정(Heejeong Yoo) ; 이은수(Eunsu Lee) ; 유훈(Hoon Yoo) |
DOI |
https://doi.org/10.5370/KIEE.2025.74.5.950 |
Keywords |
Programming Language Identification; Source Code Classification; Machine Learning; Unknown Category; Target Category |
Abstract |
This paper presents an efficient programming language identification technique that combines the selection of target categories with machine learning for an unknown category. As software development grows in scale and complexity, the source code in use is becoming larger and more diverse, so programming language identification is essential for managing source code effectively. However, existing research does not give a clear rationale for its criteria for selecting target categories and does not address the issue of unknown categories. In this paper, we propose a method for selecting target categories based on frequency of use and importance, together with machine learning techniques for handling unknown categories. We present three methods for classifying the unknown categories. Experimental results demonstrate that the proposed methods classify the unknown categories effectively. |