IEIE - Journal of the Institute of Electronics and Information Engineers


Title	A Study on Improving Predictive Performance and Reliability in Imbalanced Medical Data via Synthetic Data Augmentation
Authors	김영태(YoungTae Kim) ; 황상원(SangWon Hwang) ; 고상백(SangBaek Koh) ; 서병석(ByungSuk Seo)
Page	pp.151-160
ISSN	2287-5026
Keywords	Synthetic data augmentation; Class imbalance; Medical data analysis; Probability calibration; Machine learning
Abstract	Class imbalance, characterized by the extreme scarcity of positive cases, is a common challenge in medical data analysis. This study investigates the impact of synthetic data augmentation on classification performance and predictive reliability in highly imbalanced clinical datasets. Using the KoGES-ARIRANG cohort, we generated synthetic samples for various disease combinations based on a variational autoencoder (VAE) framework and systematically evaluated performance changes according to different augmentation levels. The results show that synthetic data augmentation consistently improves classification performance metrics, including F2-score, Matthews Correlation Coefficient (MCC), AUROC, and AUPRC. In particular, for rare disease combinations with only 30 positive samples, the F2-score increased by up to +0.189 and MCC by up to +0.221, while AUROC and AUPRC improved by up to +0.093 and +0.095, respectively. In addition, analyses of probability calibration metrics, including the Brier score and calibration measures, demonstrated concurrent improvements in predictive reliability. These findings suggest that synthetic data augmentation serves not only to enhance predictive performance but also to improve model robustness and reliability in highly imbalanced clinical settings.
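The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and probabilities below are assumptions chosen only for illustration. The sketch computes the F2-score, MCC, and Brier score from their standard definitions in plain Python.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def fbeta_score(tp, fp, fn, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews Correlation Coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

# Toy imbalanced example (hypothetical data): 2 positives out of 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("F2   :", fbeta_score(tp, fp, fn, beta=2.0))
print("MCC  :", matthews_corrcoef(tp, fp, fn, tn))
print("Brier:", brier_score(y_true, y_prob))
```

The F2-score and MCC depend on the thresholded predictions, while the Brier score is computed directly on the predicted probabilities, which is why it is used as a measure of predictive reliability rather than of classification performance.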

Copyright © IEIE. All rights reserved.

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.