The Transactions P of the Korean Institute of Electrical Engineers


ISO Journal Title: Trans. P of KIEE
  • Indexed by
    Korea Citation Index (KCI)
Title Speech Emotion Recognition Based on Deep Learning Using Multi-Feature Level Fusion of Canonical Correlation Analysis Method
Authors 조이현(A-Hyeon Jo) ; 곽근창(Keun-Chang Kwak)
DOI https://doi.org/10.5370/KIEEP.2023.72.3.214
Page pp.214-222
ISSN 1229-800X
Keywords Speech emotion recognition; human-computer interactions; deep learning; canonical correlation analysis; feature level fusion
Abstract Speech emotion recognition is a technology that identifies emotional states in human speech and plays a crucial role in making Human-Computer Interaction (HCI) more natural and effective. This technology enables AI to understand human emotions accurately and respond to them appropriately. In this study, we compare and analyze the performance of a deep-learning-based speech emotion recognition model that fuses multiple features extracted from speech signals. Various features, such as the Bark spectrum, mel spectrum, Mel-Frequency Cepstral Coefficients (MFCC), and Gammatone Cepstral Coefficients (GTCC), are extracted from the speech signal. Two of these features are fused using the Canonical Correlation Analysis (CCA) method to obtain a new single feature vector, which is then used as the input to a one-dimensional Convolutional Neural Network (1D-CNN) emotion recognition model. The fused feature improves the efficiency and accuracy of emotion recognition, and its performance is compared against that of each individual feature. The model is evaluated on the AI-Hub emotion classification dataset and on the Korean speech emotion state classification dataset constructed by Chosun University. On both datasets, multi-feature level fusion via the CCA method improved speech emotion recognition performance compared to using single features.