| Title |
A Study on Improving Minority Class Classification using CVAE-based Synthetic Data |
| Authors |
김영태 (YoungTae Kim); 서병석 (ByungSuk Seo) |
| DOI |
https://doi.org/10.5573/ieie.2025.62.11.131 |
| Keywords |
Synthetic data; Conditional VAE; Data imbalance; Obstructive sleep apnea; Public health dataset |
| Abstract |
Medical datasets, particularly those used for disease prediction, often suffer from severe class imbalance, which substantially degrades model performance on minority classes. This study proposes a data augmentation method based on a Conditional Variational Autoencoder (CVAE) to mitigate this problem. Using the Korea National Health and Nutrition Examination Survey (KNHANES IX, 2022–2023), we predict the presence of obstructive sleep apnea (OSA), a task in which the class imbalance exceeds 1:100 (minority to majority). Synthetic minority-class samples were generated with the CVAE, and majority-class samples were undersampled before training an XGBoost classifier. The results show consistent improvements in the minority-class F1-score as well as in the F1-Macro and F1-Weighted metrics. These findings suggest that CVAE-based augmentation is an effective strategy for improving the reliability and practicality of predictive models for rare diseases. |
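For reference, the standard CVAE training objective (the conditional evidence lower bound) is shown below, together with the step that turns a trained model into a generator of minority-class samples. The abstract does not specify the paper's encoder/decoder architecture, prior, or loss weighting, so the standard-normal prior $p(z)$ and the label value $y = 1$ for the OSA class are assumptions made for illustration.

```latex
% Training: maximize the conditional ELBO over encoder (phi) and decoder (theta)
\mathcal{L}(\theta, \phi; x, y)
  = \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\log p_\theta(x \mid z, y)\right]
  - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x, y) \,\|\, p(z)\right),
  \qquad p(z) = \mathcal{N}(0, I)

% Reparameterized latent sample used during training
z = \mu_\phi(x, y) + \sigma_\phi(x, y) \odot \varepsilon,
  \qquad \varepsilon \sim \mathcal{N}(0, I)

% Generation: draw z from the prior and condition the decoder on the minority label
\tilde{x} \sim p_\theta(x \mid z, y = 1), \qquad z \sim \mathcal{N}(0, I)
```

Because the label $y$ is an input to the decoder, fixing it to the minority class at sampling time yields synthetic rows for that class only, which is what makes a conditional VAE suitable for targeted oversampling.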
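The abstract combines majority-class undersampling with synthetic minority rows and reports per-class, macro-averaged, and weighted-averaged F1. The following standard-library sketch illustrates those data-handling and evaluation steps under stated assumptions: the function names and the `keep` count are hypothetical (the paper's sampling ratio is not given here), the synthetic rows are assumed to come from an already-trained CVAE that is not shown, and the XGBoost training step is omitted. The F1 helpers follow the usual definitions and should agree with scikit-learn's `f1_score` with `average="macro"` or `average="weighted"` when every predicted label also appears in the ground truth.

```python
import random
from collections import Counter


def undersample_majority(X, y, majority_label, keep, seed=0):
    """Randomly keep `keep` majority-class rows (hypothetical helper); all other rows are retained."""
    rng = random.Random(seed)
    maj_idx = [i for i, label in enumerate(y) if label == majority_label]
    other_idx = [i for i, label in enumerate(y) if label != majority_label]
    kept = rng.sample(maj_idx, min(keep, len(maj_idx)))
    idx = sorted(other_idx + kept)  # preserve the original row order
    return [X[i] for i in idx], [y[i] for i in idx]


def add_synthetic(X, y, X_syn, minority_label):
    """Append synthetic rows (assumed to be sampled from a trained CVAE) labeled as the minority class."""
    return list(X) + list(X_syn), list(y) + [minority_label] * len(X_syn)


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, computed as 2TP / (2TP + FP + FN); returns 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_summary(y_true, y_pred):
    """Return (per-class F1, macro F1, weighted F1) over the labels present in y_true."""
    support = Counter(y_true)
    labels = sorted(support)
    per_class = {c: per_class_f1(y_true, y_pred, c) for c in labels}
    macro = sum(per_class.values()) / len(labels)  # unweighted mean over classes
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)  # support-weighted mean
    return per_class, macro, weighted


# Toy example: 8 majority (0) and 2 minority (1) labels.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 7 + [1, 1, 0]
print(*f1_summary(y_true, y_pred))  # {0: 0.875, 1: 0.5} 0.6875 0.8
```

The toy example shows why the abstract reports all three scores: the weighted F1 (0.8) is dominated by the majority class, while the minority-class F1 (0.5) and the macro average (0.6875) expose how poorly the rare class is handled.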