Title Synthetic Speech Classification based on Cascade Connection of CNN and MKDE Models
Authors 심영준(Youngjun Sim) ; 최준규(Jungyu Choi) ; 임성빈(Sungbin Im)
DOI https://doi.org/10.5573/ieie.2023.60.2.94
Page pp.94-101
ISSN 2287-5026
Keywords Speech synthesis; Classification; Melspectrogram; CNN; MKDE
Abstract Speech synthesis algorithms developed over the past few years perform well and are readily available to the general public. Used maliciously, these technologies can facilitate crimes such as impersonation and the spread of fake news. Many recent studies have therefore addressed this problem, and a variety of synthetic speech detectors have been developed. In this paper, we propose a synthetic speech classification model that identifies which algorithm was used to synthesize a speech sample, even when the sample has been affected by noise, reverberation, or compression. The proposed model is a cascade of a CNN (Convolutional Neural Network) and an MKDE (Multivariate Kernel Density Estimation) model. The CNN uses the melspectrogram of the audio signal as its feature, and the MKDE model uses the logit values of the CNN as features to estimate the PDF (Probability Density Function) of each training class. The data used for model training and evaluation were provided by the 2022 IEEE Signal Processing Cup. The final model achieves 96.5% accuracy on noise-free data and 95.5% accuracy on noisy data.
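The second stage of the cascade, estimating a per-class PDF from CNN logits and assigning a sample to the class with the highest density, can be illustrated with a minimal sketch. This is not the authors' implementation: the isotropic Gaussian kernel, the fixed bandwidth of 0.5, the function names, and the randomly generated stand-in "logits" are all assumptions made for illustration, and the abstract does not specify the kernel, the bandwidth selection, or the decision rule.

```python
import numpy as np

def fit_class_kdes(logits, labels):
    """Group training logit vectors by class label (one sample set per class)."""
    return {c: logits[labels == c] for c in np.unique(labels)}

def log_density(x, samples, bandwidth):
    """Log of an isotropic Gaussian-kernel density estimate at point x."""
    n, d = samples.shape
    sq_dist = np.sum((samples - x) ** 2, axis=1)
    log_kernels = -0.5 * sq_dist / bandwidth**2
    # Normalisation of a d-dimensional Gaussian, averaged over n samples.
    log_norm = -0.5 * d * np.log(2 * np.pi * bandwidth**2) - np.log(n)
    # Log-sum-exp keeps the sum stable when every kernel value is tiny.
    m = log_kernels.max()
    return m + np.log(np.sum(np.exp(log_kernels - m))) + log_norm

def classify(x, class_samples, bandwidth=0.5):
    """Return the class whose estimated density at x is highest."""
    scores = {c: log_density(x, s, bandwidth) for c, s in class_samples.items()}
    return max(scores, key=scores.get)

# Hypothetical stand-in for CNN logits: two classes, three logits each.
rng = np.random.default_rng(0)
train_logits = np.vstack([
    rng.normal(loc=[4.0, 0.0, 0.0], scale=0.5, size=(50, 3)),
    rng.normal(loc=[0.0, 4.0, 0.0], scale=0.5, size=(50, 3)),
])
train_labels = np.repeat([0, 1], 50)
class_samples = fit_class_kdes(train_logits, train_labels)
predicted = classify(np.array([4.0, 0.0, 0.0]), class_samples)
```

In a real pipeline, `train_logits` would be the logit vectors produced by the trained CNN for the training melspectrograms. A density-based second stage of this kind also exposes the density value itself, which could in principle be thresholded to flag samples that fit no training class, although the abstract does not say whether the authors do this.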