| Title |
Automated Label Structure Validation for Multi-source Iris Detection Datasets |
| Authors |
김승진(Seung-Jin Kim); 김명주(Myuhng-Joo Kim)
| DOI |
https://doi.org/10.5573/ieie.2025.62.12.106 |
| Keywords |
Iris detection; Dataset validation; Label consistency; Multi-source integration; Bounding-box analysis |
| Abstract |
Hidden label inconsistencies arising from the integration of multi-source object detection datasets pose a critical challenge to data reusability[1]. We propose an automated label structure validation framework for a merged iris detection dataset comprising 12,027 images with 75,904 annotated objects from two heterogeneous sources on Roboflow Universe. The framework employs a four-stage pipeline of file validation, format verification, geometric clustering, and statistical testing, relying only on bounding-box geometric features (normalized area and aspect ratio) rather than label text. K-means clustering and silhouette analysis of 84 nominal class labels revealed two distinct semantic groups: pupil-like objects (n=45,725, mean area=0.0270±0.0189) and iris-like objects (n=15,575, mean area=0.0792±0.0467). A two-sample t-test confirmed a statistically significant difference between the groups (t=196.44, p<0.001) with a large effect size (Cohen's d=1.82)[2]. Quality assessment found zero missing files, zero format errors, and 100% coordinate validity, with cross-split class distribution variation below 1% (maximum 0.8%). In comparative evaluation, the proposed method achieved F1=0.92, outperforming manual sampling by 4.4×, statistical outlier detection by 1.6×, and Confident Learning by 1.2× in F1 score. The framework is domain-independent and fully automated, enabling extension to diverse object detection applications.
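The geometric clustering and statistical testing stages summarized above can be sketched briefly. The snippet below is a minimal illustration only: it assumes YOLO-style normalized boxes (x_center, y_center, width, height) and standard scikit-learn/scipy APIs, and all function names, parameters, and thresholds are hypothetical rather than taken from the authors' implementation.

```python
# Minimal sketch (illustrative, not the authors' code): cluster bounding boxes
# by geometry, measure cluster separation, and test group differences in area.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from scipy.stats import ttest_ind

def geometric_features(boxes: np.ndarray) -> np.ndarray:
    """Map YOLO-style normalized (x, y, w, h) boxes to (area, aspect ratio)."""
    w, h = boxes[:, 2], boxes[:, 3]
    area = w * h                          # normalized area in [0, 1]
    aspect = w / np.clip(h, 1e-6, None)   # width/height ratio, guarded against 0
    return np.column_stack([area, aspect])

def cluster_and_test(boxes: np.ndarray, k: int = 2, seed: int = 0) -> dict:
    """Cluster boxes into k geometric groups and test whether areas differ."""
    feats = geometric_features(boxes)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(feats)
    sil = silhouette_score(feats, km.labels_)

    # Welch's two-sample t-test and Cohen's d on normalized area
    # between the two geometric clusters (e.g., pupil-like vs. iris-like).
    a = feats[km.labels_ == 0, 0]
    b = feats[km.labels_ == 1, 0]
    t_stat, p_val = ttest_ind(a, b, equal_var=False)
    n1, n2 = len(a), len(b)
    pooled_sd = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1))
                        / (n1 + n2 - 2))
    cohens_d = abs(a.mean() - b.mean()) / pooled_sd
    return {"silhouette": sil, "t": t_stat, "p": p_val, "cohens_d": cohens_d}
```

Under these assumptions, a large positive silhouette score and a large Cohen's d between the two clusters would indicate, as in the reported results, that the geometric features alone separate the annotations into semantically distinct groups regardless of their nominal label text.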