Title A Study on Improving Minority Class Classification using CVAE-based Synthetic Data
Authors 김영태(YoungTae Kim) ; 서병석(ByungSuk Seo)
DOI https://doi.org/10.5573/ieie.2025.62.11.131
Page pp.131-137
ISSN 2287-5026
Keywords Synthetic data; Conditional VAE; Data imbalance; Obstructive sleep apnea; Public health dataset
Abstract Medical datasets, particularly in disease prediction tasks, often suffer from severe class imbalance, which significantly hinders model performance on minority classes. This study proposes a data augmentation method based on Conditional Variational Autoencoder (CVAE) to mitigate this problem. We used the Korea National Health and Nutrition Examination Survey (KNHANES IX, 2022–2023) to predict the presence of obstructive sleep apnea (OSA), where the minority-to-majority class ratio exceeds 1:100. Synthetic samples for the minority class were generated using CVAE, while majority samples were undersampled before training an XGBoost classifier. The results show consistent improvements in F1-score for the minority class as well as F1-Macro and F1-Weighted metrics. These findings suggest that CVAE-based augmentation is an effective strategy for improving the reliability and practicality of predictive models for rare diseases.
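The rebalancing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the helper names `undersample_majority` and `f1_report`, the toy data, and the hard-coded `synthetic_minority` rows (a stand-in for samples that a trained CVAE would generate) are all assumptions made for illustration. The CVAE and the XGBoost classifier themselves are omitted.

```python
import random
from collections import Counter

def undersample_majority(rows, labels, majority_label, keep, seed=0):
    """Keep every non-majority row plus `keep` randomly chosen majority rows."""
    rng = random.Random(seed)
    maj_idx = [i for i, y in enumerate(labels) if y == majority_label]
    other_idx = [i for i, y in enumerate(labels) if y != majority_label]
    kept = sorted(other_idx + rng.sample(maj_idx, min(keep, len(maj_idx))))
    return [rows[i] for i in kept], [labels[i] for i in kept]

def f1_report(y_true, y_pred):
    """Per-class F1, macro F1 (plain mean) and weighted F1 (mean by true support)."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(classes)
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return per_class, macro, weighted

# Toy data with a 1:100 minority-to-majority ratio (hypothetical, not KNHANES).
rows = [[float(i)] for i in range(202)]
labels = [0] * 200 + [1] * 2

# Placeholder for CVAE output: in the study these rows would be sampled
# from a trained conditional decoder; here they are hard-coded assumptions.
synthetic_minority = [[200.5], [201.5], [200.2]]

# Undersample the majority class, then append the synthetic minority rows.
bal_rows, bal_labels = undersample_majority(rows, labels, majority_label=0, keep=5)
bal_rows = bal_rows + synthetic_minority
bal_labels = bal_labels + [1] * len(synthetic_minority)
class_counts = Counter(bal_labels)

# Evaluate hypothetical predictions with the three metrics from the abstract.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
per_class, macro_f1, weighted_f1 = f1_report(y_true, y_pred)
```

In this toy evaluation the minority-class F1 (0.5) pulls macro F1 (0.625) below weighted F1 (about 0.667). This shows why reporting the minority-class and macro scores alongside the weighted score matters when classes are heavily imbalanced.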