IEIE - Journal of the Institute of Electronics and Information Engineers

Title	Synthetic Speech Classification based on Cascade Connection of CNN and MKDE Models
Authors	심영준(Youngjun Sim) ; 최준규(Jungyu Choi) ; 임성빈(Sungbin Im)
DOI	https://doi.org/10.5573/ieie.2023.60.2.94
Page	pp.94-101
ISSN	2287-5026
Keywords	Speech synthesis; Classification; Melspectrogram; CNN; MKDE
Abstract	Speech synthesis algorithms developed over the past few years perform well and are readily accessible to the general public. If used maliciously, these technologies can enable crimes such as impersonation and the spread of fake news. Many recent studies have therefore addressed this problem and developed various synthetic speech detectors. In this paper, we propose a synthetic speech classification model that identifies which algorithm was used to synthesize a given speech sample, even when the synthetic speech is affected by noise, reverberation, or compression. The proposed model is a cascade of a CNN (Convolutional Neural Network) and an MKDE (Multivariate Kernel Density Estimation) model. The CNN uses the mel-spectrogram of the audio signal as its input feature, and the MKDE model uses the CNN logits as features to estimate the PDF (Probability Density Function) of each training class. The data used for model training and evaluation were provided by the 2022 IEEE Signal Processing Cup. The final model achieves 96.5% and 95.5% accuracy under noise-free and noisy conditions, respectively.
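The abstract describes the second stage only at a high level: a PDF is estimated for each training class from the CNN logit vectors. The sketch below illustrates one way such a stage could work. It assumes a Gaussian kernel with an isotropic per-class bandwidth chosen by Scott's rule and a maximum-density decision rule. The class name `MKDEClassifier`, the bandwidth choice, and the decision rule are all hypothetical and are not taken from the paper. The toy "logits" stand in for real CNN outputs.

```python
import numpy as np


class MKDEClassifier:
    """Hypothetical per-class multivariate Gaussian KDE over CNN logit vectors."""

    def __init__(self, bandwidth=None):
        # Assumed scalar bandwidth; None -> Scott's rule computed per class.
        self.bandwidth = bandwidth
        self.classes_ = None
        self.samples_ = {}
        self.h_ = {}

    def fit(self, logits, labels):
        logits = np.asarray(logits, dtype=float)
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        d = logits.shape[1]
        for c in self.classes_:
            x = logits[labels == c]
            n = len(x)
            if self.bandwidth is None:
                # Scott's rule: h = sigma * n^(-1/(d+4)), sigma = mean per-dimension std.
                sigma = x.std(axis=0, ddof=1).mean() if n > 1 else 1.0
                h = sigma * n ** (-1.0 / (d + 4))
            else:
                h = self.bandwidth
            self.samples_[c] = x
            self.h_[c] = max(h, 1e-6)  # guard against a zero bandwidth
        return self

    def log_density(self, logits):
        """Return log p(logit | class) for every query row and every class."""
        logits = np.atleast_2d(np.asarray(logits, dtype=float))
        d = logits.shape[1]
        out = np.empty((len(logits), len(self.classes_)))
        for j, c in enumerate(self.classes_):
            x, h = self.samples_[c], self.h_[c]
            # Squared Euclidean distances between queries and class-c training points.
            sq = ((logits[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
            # Log of the isotropic Gaussian kernel with bandwidth h.
            log_k = -sq / (2 * h * h) - d * np.log(h) - 0.5 * d * np.log(2 * np.pi)
            # Log of the mean kernel value, computed with log-sum-exp for stability.
            m = log_k.max(axis=1, keepdims=True)
            out[:, j] = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(len(x))
        return out

    def predict(self, logits):
        # Assign each query to the class whose estimated density is highest.
        return self.classes_[np.argmax(self.log_density(logits), axis=1)]


# Toy usage: three well-separated clusters of synthetic "logit" vectors.
rng = np.random.default_rng(0)
centers = 5.0 * np.eye(3)
labels = np.repeat(np.arange(3), 50)
train = centers[labels] + rng.normal(scale=1.0, size=(150, 3))
clf = MKDEClassifier().fit(train, labels)
print(clf.predict(centers))  # expected: [0 1 2]
```

Working in log space keeps the densities comparable across classes even when logit vectors lie far from a class's training points, where raw kernel values would underflow to zero.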

Copyright © IEIE. All rights reserved.

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.