| Title |
Automated Label Structure Validation for Multi-source Iris Detection Datasets |
| Authors |
김승진(Seung-Jin Kim); 김명주(Myuhng-Joo Kim)
| DOI |
https://doi.org/10.5573/ieie.2025.62.12.106 |
| Keywords |
Iris detection; Dataset validation; Label consistency; Multi-source integration; Bounding-box analysis |
| Abstract |
Hidden label inconsistencies arising from the integration of multi-source object detection datasets pose a critical challenge to data reusability[1]. We propose an automated label structure validation framework for a merged iris detection dataset comprising 12,027 images with 75,904 annotated objects from two heterogeneous sources on Roboflow Universe. The framework employs a four-stage pipeline of file validation, format verification, geometric clustering, and statistical testing, relying only on bounding-box geometric features (normalized area and aspect ratio) rather than label text. K-means clustering and silhouette analysis of 84 nominal class labels revealed two distinct semantic groups: pupil-like objects (n=45,725, mean area=0.0270±0.0189) and iris-like objects (n=15,575, mean area=0.0792±0.0467). A two-sample t-test confirmed a statistically significant difference between the groups (t=196.44, p<0.001) with a large effect size (Cohen's d=1.82)[2]. Quality assessment found zero missing files, zero format errors, and 100% coordinate validity, with cross-split class distribution variation below 1% (maximum 0.8%). In comparative evaluation, the proposed method achieved F1=0.92, outperforming manual sampling by 4.4×, statistical outlier detection by 1.6×, and Confident Learning by 1.2× in F1 score. The framework is domain-independent and fully automated, enabling extension to diverse object detection applications.
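The geometric clustering and statistical testing stages summarized above can be sketched briefly. The snippet below is a minimal illustration only: it assumes YOLO-style normalized boxes (x_center, y_center, width, height) and standard scikit-learn/scipy APIs, and all function names, parameters, and thresholds are hypothetical rather than taken from the authors' implementation.

```python
# Minimal sketch (illustrative, not the authors' code): cluster bounding boxes
# by geometry, measure cluster separation, and test group differences in area.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from scipy.stats import ttest_ind

def geometric_features(boxes: np.ndarray) -> np.ndarray:
    """Map YOLO-style normalized (x, y, w, h) boxes to (area, aspect ratio)."""
    w, h = boxes[:, 2], boxes[:, 3]
    area = w * h                          # normalized area in [0, 1]
    aspect = w / np.clip(h, 1e-6, None)   # width/height ratio, guarded against 0
    return np.column_stack([area, aspect])

def cluster_and_test(boxes: np.ndarray, k: int = 2, seed: int = 0) -> dict:
    """Cluster boxes into k geometric groups and test whether areas differ."""
    feats = geometric_features(boxes)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(feats)
    sil = silhouette_score(feats, km.labels_)

    # Welch's two-sample t-test and Cohen's d on normalized area
    # between the two geometric clusters (e.g., pupil-like vs. iris-like).
    a = feats[km.labels_ == 0, 0]
    b = feats[km.labels_ == 1, 0]
    t_stat, p_val = ttest_ind(a, b, equal_var=False)
    n1, n2 = len(a), len(b)
    pooled_sd = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1))
                        / (n1 + n2 - 2))
    cohens_d = abs(a.mean() - b.mean()) / pooled_sd
    return {"silhouette": sil, "t": t_stat, "p": p_val, "cohens_d": cohens_d}
```

Under these assumptions, a large positive silhouette score and a large Cohen's d between the two clusters would indicate, as in the reported results, that the geometric features alone separate the annotations into semantically distinct groups regardless of their nominal label text.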