The Transactions P of the Korean Institute of Electrical Engineers


ISO Journal Title Trans. P of KIEE
Indexed by Korea Citation Index (KCI)
Title Image-based Approaches for Identifying Harmful Sites using OCR and Average Hash Methods
Authors 박시현 (Si-Hyeon Park); 유성민 (Seong-Min You); 송동호 (Dong-Ho Song); 이광재 (Kwangjae Lee)
DOI https://doi.org/10.5370/KIEEP.2023.72.2.112
Pages 112-119
ISSN 1229-800X
Keywords Web crawling; OCR; Average Hash; Harmful advertisements identification; Harmful site identification
Abstract Websites containing harmful content such as gambling, illegal drugs, pornography, and prostitution have recently become widely accessible to the public. These sites damage copyright holders and related service industries and cause various social problems. In this paper, we propose an image-based system that identifies and classifies harmful sites using OCR and Average Hash techniques. The system exploits the fact that most gambling banner advertisements reuse similar images: it measures the similarity between banner advertisement images by comparing their average hash values. It also uses EasyOCR to extract the text written in each banner advertisement and determine whether that text is harmful. To evaluate the proposed approach, we built a program that, given a site name, collects and analyzes the site's banner advertisement images to judge whether the site is harmful; it achieved an identification accuracy of 84%. In addition, because the information collected while the program runs is stored in a database, trends in harmful sites can be tracked. This information can be used effectively to search for harmful sites that are expected to emerge.
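The abstract names two checks: an average-hash (aHash) image comparison and an OCR-based phrase check. The sketch below illustrates both in plain Python under several assumptions that are not stated in the paper. Images arrive as 2-D lists of grayscale values, where a real pipeline would decode and resize them with an imaging library. The 8x8 thumbnail is produced by simple block averaging. OCR output is already available as a list of strings, for example from EasyOCR. The keyword list, the Hamming-distance threshold of 10, and the rule that either check alone flags a banner are all hypothetical choices, not the authors' parameters.

```python
from typing import Iterable, List, Sequence

HASH_SIZE = 8  # aHash conventionally uses an 8x8 thumbnail, giving a 64-bit hash

# Hypothetical keyword list; the paper's actual harmful-phrase criteria are not given here.
HARMFUL_KEYWORDS = {"casino", "betting", "jackpot"}


def downscale(gray: Sequence[Sequence[float]], size: int = HASH_SIZE) -> List[List[float]]:
    """Shrink a grayscale image (rows of 0-255 values) to size x size by block averaging."""
    h, w = len(gray), len(gray[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    small = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(gray: Sequence[Sequence[float]], size: int = HASH_SIZE) -> int:
    """Return the aHash: one bit per thumbnail cell, 1 if the cell is brighter than the mean."""
    pixels = [p for row in downscale(gray, size) for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:  # row-major order; the first cell becomes the most significant bit
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def hash_similarity(a: int, b: int, size: int = HASH_SIZE) -> float:
    """Similarity in [0, 1]; 1.0 means identical hashes."""
    return 1.0 - hamming(a, b) / (size * size)


def text_is_harmful(ocr_lines: Iterable[str], keywords=HARMFUL_KEYWORDS) -> bool:
    """Flag OCR output (assumed to be a list of recognized strings) that contains any keyword."""
    text = " ".join(ocr_lines).lower()
    return any(k in text for k in keywords)


def banner_is_harmful(gray, known_hashes: Iterable[int], ocr_lines: Iterable[str],
                      max_distance: int = 10) -> bool:
    """Assumed rule: harmful if the image is near a known harmful banner OR its text matches."""
    h = average_hash(gray)
    if any(hamming(h, k) <= max_distance for k in known_hashes):
        return True
    return text_is_harmful(ocr_lines)
```

For example, a 16x16 horizontal gradient and a uniformly brightened copy produce identical hashes (similarity 1.0), because aHash compares each cell with the image's own mean rather than with a fixed brightness; an inverted copy differs in all 64 bits. This insensitivity to global brightness is what makes aHash suitable for spotting reused banner images.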