Title |
A Study on the Development of Deep Learning-Based Deep Voice Detection System Using Mel-Spectrogram and MFCC |
Authors |
한승우(Seung-Woo Han) ; 한성훈(Seong-Hun Han) ; 유성민(Seong-Min You) ; 송동호(Dong-Ho Song) ; 서창진(Chang-Jin Seo) |
DOI |
https://doi.org/10.5370/KIEEP.2023.72.3.186 |
Keywords |
Deep Voice; Mel-Spectrogram; Bi-LSTM; CNN; MFCC; Voice Synthesis |
Abstract |
Deep voice refers to a fake voice produced with deep learning and voice synthesis technology. In this paper, we propose a deep-learning-based deep voice detection system that uses MFCC and Mel-Spectrogram features. For the detection model, we propose an ensemble of a CNN (Convolutional Neural Network) and a BiLSTM (Bidirectional Long Short-Term Memory) network. In the experiment, the training dataset consisted of about 50,000 voice samples provided by AI-HUB, 25,000 each of deep voice and general voice. The test dataset consisted of 370 deep voices generated with NAVER CLOVA and 329 directly recorded voices. Using the soft-voting method, we constructed a model with 92.27% accuracy. In the proposed system, the user records a voice and transmits it to the server, which runs the ensemble model to detect deep voices and returns the result. The deep voice detection system proposed in this paper is expected to improve stability and reliability in areas vulnerable to deep voice-based crime. |
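Soft voting, as named in the abstract, combines models by averaging their predicted class probabilities rather than their hard labels. The following Python sketch illustrates the idea for the two-class case. The class order (general, deep), the equal default weights, and the function names `soft_vote` and `predict_label` are illustrative assumptions, not the authors' implementation; the abstract also does not state which features (MFCC or Mel-Spectrogram) feed which model.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed class order for illustration only: index 0 = general voice, index 1 = deep voice.
LABELS = ("general", "deep")


def soft_vote(prob_sets: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Return the (weighted) average of per-class probabilities from several models."""
    if not prob_sets:
        raise ValueError("need at least one model's probabilities")
    n_classes = len(prob_sets[0])
    if any(len(p) != n_classes for p in prob_sets):
        raise ValueError("all models must output the same number of classes")
    if weights is None:
        weights = [1.0] * len(prob_sets)  # assumption: equal weight per model
    if len(weights) != len(prob_sets):
        raise ValueError("need one weight per model")
    total = float(sum(weights))
    # Average each class column across models, weighting each model's vote.
    return [sum(w * p[c] for w, p in zip(weights, prob_sets)) / total
            for c in range(n_classes)]


def predict_label(prob_sets: Sequence[Sequence[float]],
                  labels: Sequence[str] = LABELS) -> Tuple[str, List[float]]:
    """Pick the class with the highest averaged probability."""
    avg = soft_vote(prob_sets)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best], avg


if __name__ == "__main__":
    cnn_probs = [0.30, 0.70]     # hypothetical CNN output for one clip
    bilstm_probs = [0.60, 0.40]  # hypothetical BiLSTM output for the same clip
    label, avg = predict_label([cnn_probs, bilstm_probs])
    print(label, [round(x, 2) for x in avg])
```

In this example the two models disagree on the hard label, but the averaged probabilities are about [0.45, 0.55], so the ensemble predicts "deep". This is the usual motivation for soft voting: a confident model can outweigh a hesitant one, whereas hard voting between two models can only produce a tie.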