Title End-to-end based on CRNN-GLU-ATT Model for Robust Emotional Features Extraction
Authors 이상현(Sang-Hyun Lee) ; 김재동(Jae-Dong Kim) ; 고한석(Han-Seok Ko)
DOI https://doi.org/10.5573/ieie.2020.57.10.45
Page pp.45-55
ISSN 2287-5026
Keywords End-to-end; Gated linear units; Attention mechanism
Abstract In this paper, we propose an end-to-end convolutional recurrent neural network (CRNN) with gated linear units (GLUs) and attention modules for effective emotion recognition from audio data. Previous studies on acoustic emotion recognition relied on hand-crafted features reflecting voice tone, energy, and intensity, a pre-processing step that demands significant manual effort from domain experts. In contrast, the end-to-end approach originally proposed for speech recognition is used here to build a model suited to emotion recognition by training directly on the audio, without this separate feature-extraction step. Furthermore, we use a CRNN that captures local and global features simultaneously, combining the advantages of CNNs and RNNs. GLUs and an attention network are integrated into the CRNN model to weight the important factors of the emotional data. Additionally, we visually analyze the emotion-related features learned in the convolutional layers. Two data sets are used in the experiments. The first is the Interactive Emotional Dyadic Motion Capture database (IEMOCAP), a widely used benchmark, on a four-emotion recognition task (anger, happiness, neutrality, and sadness). The second is the German EmoDB, on which seven emotions are recognized. Our proposed method improves accuracy by 16% over existing methods on IEMOCAP and achieves a 3% gain over state-of-the-art performance on EmoDB.
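Note: the abstract describes a pipeline in which convolutional features are gated by GLUs, summarized over time by a recurrent layer, and pooled with attention before classification. The paper's exact layer sizes and configuration are not given here, so the following PyTorch sketch is only illustrative; the channel counts, GRU width, mel-band input, and four-class output are assumptions, not the authors' published architecture.

# Minimal sketch of a CRNN-GLU-attention classifier (hypothetical configuration).
import torch
import torch.nn as nn

class CRNNGLUAtt(nn.Module):
    def __init__(self, n_mels=64, n_classes=4, hidden=128):
        super().__init__()
        # Convolutional front end over (batch, 1, time, n_mels) spectrogram-like input.
        # Each block doubles the channel count so the GLU halves it back.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 2 * 32, kernel_size=3, padding=1),
            nn.GLU(dim=1),                       # gated linear unit over channels
            nn.MaxPool2d((2, 2)),
            nn.Conv2d(32, 2 * 64, kernel_size=3, padding=1),
            nn.GLU(dim=1),
            nn.MaxPool2d((2, 2)),
        )
        # Recurrent layer models global temporal context of the local conv features.
        self.rnn = nn.GRU(64 * (n_mels // 4), hidden, batch_first=True,
                          bidirectional=True)
        # Attention produces one score per time step for weighted pooling.
        self.att = nn.Linear(2 * hidden, 1)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                        # x: (batch, 1, time, n_mels)
        h = self.conv(x)                         # (batch, 64, time/4, n_mels/4)
        b, c, t, f = h.shape
        h = h.permute(0, 2, 1, 3).reshape(b, t, c * f)
        h, _ = self.rnn(h)                       # (batch, time/4, 2*hidden)
        w = torch.softmax(self.att(h), dim=1)    # attention weights over time
        ctx = (w * h).sum(dim=1)                 # weighted temporal pooling
        return self.fc(ctx)                      # emotion logits

# Usage example: a batch of 8 two-second log-mel inputs with 64 mel bands.
logits = CRNNGLUAtt()(torch.randn(8, 1, 200, 64))
print(logits.shape)                              # torch.Size([8, 4])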