The Transactions P of the Korean Institute of Electrical Engineers
  1. (K-THEBOM Research Institute, Ltd., Korea)
  2. (Chosun University IT Research Center)
  3. (Dept. of Control and Instrumentation Engineering, Chosun University, Korea)



Keywords: Ensemble Deep Models, Emotion Recognition, ASCERTAIN Sentiment Database, Electrocardiogram, Two-Dimensional Time-Scale Representation

1. Introduction

Social robots such as Sophia and Pepper have recently become a hot topic. A social robot is an emotion-oriented robot that can communicate with humans and operates autonomously to perform social actions, unlike conventional robots that have replaced human physical labor through mechanical movements. One of the main functions of a social robot is emotional interaction, which allows the robot to identify a person's emotional state through natural conversation and convey its own emotions. With the development of sensing technology and machine learning, technologies capable of recognizing human emotions and inner states through images, audio, and bio-signals are being developed and applied (1-5).

First, regarding image-based emotion recognition, Li (1) proposed an automatic facial emotion recognition method, a topic that has long been an active subject in the computer vision community. Schurgin (2) studied eye movements during facial emotion recognition and noted that some areas of the face may contain more useful information than others. Halder (3) proposed a fuzzy face-space approach based on the idea that the facial expressions of people with similar feelings are not always unique. Adeyanju (4) evaluated facial emotion recognition systems that identify the emotions expressed on a face without necessarily identifying the person involved, similar to face recognition. In addition, Anderson (5) found that women in a control group were more accurate in recognizing emotions than men owing to a greater discriminative ability. Computer vision technology is therefore essential for allowing social robots to recognize human emotions.

In contrast, for voice-based emotion recognition, techniques for identifying emotional states by analyzing patterns such as tremors in the human voice are being studied. Schuller (6) proposed a speech emotion recognition system using continuous hidden Markov models (HMMs). Schuller (7) also proposed a new approach that combines acoustic features and linguistic information to automatically recognize the speaker's emotion. Han (8) applied deep neural networks to speech emotion recognition because it is unclear which features are effective for a particular task. Wang (9) recently studied Fourier parameters for speech emotion recognition. In addition, Jain (10) proposed classifying speech into one of four emotions, namely, sadness, anger, fear, or happiness.

With the development of wearable device technology, research on bio-signals is active. Such bio-signals have been studied in the field of user recognition. Pal (11,12) presented a human recognition system based on single-lead recordings and used electrical signals recorded as an electrocardiogram (ECG) and an electroencephalogram (EEG) as unique individual characteristics for biometric recognition. Su (13) fused finger-vein features with ECG signals using a multi-modal biometric method based on discriminant correlation. Rezgui (14) used ECG signals for personal identification based on a support vector machine (SVM). Barra (15) fused simple features extracted simultaneously from EEG and ECG signals and used them for user recognition. Dong (16) proposed modeling the dynamics of ECG signals with deterministic learning, a recently developed machine learning approach. Aziz (17) extracted the region of interest of the ECG signals containing the maximum characteristic information related to subject recognition through empirical mode decomposition (EMD). Pinto (18,19) argues that faces and fingerprints are currently the most thoroughly investigated biometric characteristics and promise reliable recognition in various applications. Maiorana (20-22) conducted a variety of experiments on the longitudinal behavior of EEG features over a long period of time, evaluating them through statistical and performance-related analyses using different EEG characteristics and hidden Markov models as classifiers.

In addition, bio-signals are being actively researched for emotion recognition. Bio-signals such as the EEG, electromyogram (EMG), and ECG are used as important information in determining the psychological state of humans. Among them, we focus on the use of ECG signals. Nikolova (23) treated ECG-based affective computing as a new research field that aims to find correlations between human emotions and recorded ECG signals, and studied the potential of two machine learning methods (logistic regression and artificial neural networks) to discriminate human emotional states across multiple subjects. Ferdinando (24) proposed new features for emotion recognition from short ECG signals; these features showed better performance than features based on the statistical distribution of instantaneous frequency, calculated using the Hilbert transform of intrinsic mode functions after applying standard and bivariate empirical mode decomposition to the ECG. Goshvarpour (25) used ECG signals together with other biological signals and examined the effectiveness of the matching pursuit algorithm in emotion recognition. Using heart rate variability, empirical mode decomposition, within-beat analysis, and frequency spectrum analysis, Dissanayake (26) performed ensemble learning after feature extraction. Sarkar (27) proposed an ECG-based emotion recognition system using self-supervised learning, in which unlabelled data are used to train a network to detect specific pre-determined signal transformations; the weights of the convolutional layers of this network are then transferred to the emotion recognition network, and two dense layers are trained to classify arousal and valence scores. Xu (28) noted that emotion recognition from ECG signals has become an important research topic in the field of affective computing. Moreover, Hsu (29) proposed an automatic ECG-based emotion recognition algorithm for music listening. Agrafioti (30) noted that emotion modeling and recognition have attracted extensive attention in areas such as psychology, cognitive science, and engineering. Tivatansakul (31) proposed an emotion-focused healthcare system to cope with negative emotions in everyday life. In addition, Jerritta (32) noted that emotion recognition using physiological signals is one of the key areas of human-computer interaction.

Emotion recognition studies using facial and voice information have recently been conducted on large datasets. Deep learning is effective for handling large datasets, and significant research has been ongoing in recent years. A video-based emotion recognition system submitted to the Emotion Recognition in the Wild 2016 challenge was proposed by Fan (33); the system's core module is a hybrid network that combines a recurrent neural network and three-dimensional convolutional networks through a late-fusion method. Ng (34) classified emotions expressed by the main human subject in static images extracted from movies using transfer learning on small datasets. Fayek (35) noted that speech emotion recognition can be considered a static or dynamic classification problem, making it a good testbed for investigating and comparing various deep learning architectures.

2. Basic Characteristics of ECG and 2D Time-Scale Representations

Physiological signals are often nonstationary, i.e., their frequency components change over time. The wavelet transform decomposes a signal into time-varying frequency (scale) components. Because signal characteristics are often localized in time and frequency, analysis and estimation are easier when working with sparse representations.

2.1 ECG Signal Specificity

Electrocardiography amplifies and records the electrical rhythm of the heart. Fig. 1 shows a voltage-time graph of the electrical activity of the heart obtained using electrodes placed on the skin. These electrodes detect small electrical changes that result from cardiac muscle depolarization followed by repolarization during each cardiac cycle. Changes in the normal ECG pattern occur in numerous cardiac abnormalities, including arrhythmia, ischemic heart disease, and cardiomegaly.

The P wave indicates the contraction of the atria, which contract in the order of the right and then the left atrium. The moment the electrical impulse generated in the sinoatrial node reaches the first cell of the right atrium, the P wave begins, and the time at which the impulse reaches the last cell of the left atrium corresponds to the end of the P wave. This interval is called the width of the P wave and lasts approximately 100 milliseconds. Owing to its very small amplitude, atrial repolarization is usually not visible in the ECG signal. Ventricular depolarization proceeds from the endocardium toward the epicardium and corresponds to the QRS complex on the electrocardiogram. Ventricular depolarization occurs quickly and takes approximately 80 milliseconds. The T wave is a gentle, small wave that follows the QRS complex and is generated by repolarization of the ventricular muscle. In normal subjects, the P wave, QRS complex, and T wave are deflected in the same direction. There is no specific waveform between the QRS complex and the T wave; this part corresponds to the ST segment, and its elevation or depression affects the diagnosis of ischemic heart disease. Each part of the ECG signal can be analyzed to determine whether it is normal.

Fig. 1. ECG signal with P, QRS, and T waves

../../Resources/kiee/KIEEP.2021.70.3.163/fig1.png

2.2 2-D Time-Frequency Transformation

2.2.1 2-D Wavelet Scalogram

The scalogram is the absolute value of the signal's continuous wavelet transform (CWT) as a function of time and frequency. Scalograms can be more useful than spectrograms for analyzing real-world signals with features occurring at different scales, for example, slowly varying signals punctuated by sudden transient events. We use a scalogram to obtain better time localization for short-duration, high-frequency events and better frequency localization for low-frequency, longer-duration events. The spectrogram is obtained by windowing the input signal with a window of constant length (duration) that is shifted in time and frequency. The window used in the spectrogram is even, real-valued, and does not oscillate. Because the spectrogram uses a constant window, its time-frequency resolution is fixed.

The Meyer wavelet and scaling functions are defined within the frequency domain:

For the wavelet function,

(1)
$\hat\psi(\omega)=(2\pi)^{-1/2}e^{i\omega/2}\sin\left(\dfrac{\pi}{2}\nu\left(\dfrac{3}{2\pi}|\omega| -1\right)\right)\quad\text{if }\dfrac{2\pi}{3}\le |\omega|\le\dfrac{4\pi}{3}$

(2)
$\hat\psi(\omega)=(2\pi)^{-1/2}e^{i\omega/2}\cos\left(\dfrac{\pi}{2}\nu\left(\dfrac{3}{4\pi}|\omega| -1\right)\right)\quad\text{if }\dfrac{4\pi}{3}\le |\omega|\le\dfrac{8\pi}{3}$

and $\hat\psi(\omega)= 0$ if $|\omega|\notin\left[\dfrac{2\pi}{3},\:\dfrac{8\pi}{3}\right]$,

where $\nu(a)= a^{4}\left(35 - 84a + 70a^{2}- 20a^{3}\right),\: a\in[0,\: 1]$. For the scaling function,

(3)
$\hat\phi(\omega)=(2\pi)^{-1/2}\quad\text{if }|\omega|\le\dfrac{2\pi}{3}$

(4)
$\hat\phi(\omega)=(2\pi)^{-1/2}\cos\left(\dfrac{\pi}{2}\nu\left(\dfrac{3}{2\pi}|\omega| - 1\right)\right)\quad\text{if }\dfrac{2\pi}{3}\le |\omega|\le\dfrac{4\pi}{3}$

(5)
$\hat\phi(\omega)= 0\quad\text{if }|\omega| >\dfrac{4\pi}{3}$

Changing the auxiliary function yields a different family of wavelets. This wavelet enables orthogonal analysis.

The function $\psi$ does not have finite support, but $\psi$ decays to 0 as $x\to\infty$ faster than any inverse polynomial: for all $n\in\mathbb{N}$ there exists $C_{n}$ such that $|\psi(x)|\le C_{n}\left(1 + | x |^{2}\right)^{-n}$.

This property also holds for the derivatives: for all $k,\,n\in\mathbb{N}$ there exists $C_{k,\:n}$ such that $\left|\psi^{(k)}(x)\right|\le C_{k,\:n}\left(1 + | x |^{2}\right)^{-n}$. The wavelet is infinitely differentiable.
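To illustrate the piecewise definition above, the following minimal numerical sketch (not part of the original work) evaluates the auxiliary polynomial $\nu(a)$ and the magnitude of $\hat\psi(\omega)$ from Eqs. (1) and (2); the complex phase factor $e^{i\omega/2}$ is omitted, and NumPy is assumed to be available.

```python
import numpy as np

def nu(a):
    # Auxiliary polynomial of the Meyer construction, defined for a in [0, 1]
    return a**4 * (35 - 84*a + 70*a**2 - 20*a**3)

def meyer_psi_hat_abs(omega):
    """Magnitude of the Meyer wavelet in the frequency domain (Eqs. (1)-(2)).

    The phase factor e^{i*omega/2} is dropped; outside the support
    2*pi/3 <= |omega| <= 8*pi/3 the function is zero.
    """
    w = np.abs(np.asarray(omega, dtype=float))
    out = np.zeros_like(w)
    band1 = (w >= 2*np.pi/3) & (w <= 4*np.pi/3)
    band2 = (w > 4*np.pi/3) & (w <= 8*np.pi/3)
    out[band1] = np.sin(np.pi/2 * nu(3/(2*np.pi) * w[band1] - 1))
    out[band2] = np.cos(np.pi/2 * nu(3/(4*np.pi) * w[band2] - 1))
    return (2*np.pi)**-0.5 * out

# Example: sample the magnitude response on a frequency grid
omega = np.linspace(0, 3*np.pi, 1024)
psi_hat = meyer_psi_hat_abs(omega)
```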

2.2.2 Complex Continuous Wavelet Transform

In contrast, the CWT is obtained by windowing the signal with a wavelet that is scaled and shifted in time. The wavelet oscillates and can be complex-valued. The scaling and shifting operations are applied to a prototype wavelet. The scaling used in the CWT both shrinks and stretches the prototype wavelet: shrinking it yields short-duration, high-frequency wavelets that are good at detecting transient events, whereas stretching it yields long-duration, low-frequency wavelets that are good at isolating long-duration, low-frequency events. To compute the scalogram, the following steps are applied:

1. If the signal has more than one million samples, the signal is divided into overlapping segments.

2. Compute the CWT of each segment to obtain its scalogram.

3. Display the scalogram segment by segment.

As implemented, the CWT uses L1 normalization. Therefore, the amplitude of the oscillation component in the signal corresponds to the amplitude of the corresponding wavelet coefficient.
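As a rough illustration of this procedure, the sketch below computes a CWT scalogram of a single ECG segment with PyWavelets and a Morlet wavelet. This is an assumption for illustration only (the paper appears to use a MATLAB toolbox), the variables ecg and fs are hypothetical placeholders, and PyWavelets' normalization conventions may differ from the L1 normalization described above.

```python
import numpy as np
import pywt

def ecg_scalogram(ecg, fs, num_scales=64, wavelet="morl"):
    """Return |CWT| of a 1-D ECG segment as a (num_scales, len(ecg)) array."""
    scales = np.arange(1, num_scales + 1)
    coeffs, freqs = pywt.cwt(ecg, scales, wavelet, sampling_period=1.0 / fs)
    return np.abs(coeffs), freqs

# Hypothetical usage: a 10-second lead-II ECG segment sampled at 256 Hz
fs = 256
ecg = np.random.randn(10 * fs)       # placeholder for a real ECG recording
scalogram, freqs = ecg_scalogram(ecg, fs)
```

For very long recordings, the same function would simply be applied to overlapping segments, as described in the steps above.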


The wavelet family includes several types of wavelets. Fig. 2 shows images of the four wavelets used in this study. First, as shown in Fig. 2(a), the Daubechies wavelet (47), developed by Daubechies, a renowned scholar in the field of wavelet research, provides compactly supported orthogonal wavelets, enabling discrete wavelet analysis. The Daubechies family of wavelets is written as dbN, where db is the name of the wavelet and N is the order; we use db4 here. These wavelets have no explicit expression except for the Haar wavelet (db1). However, the squared modulus of the transfer function h is explicit and very simple. The support length of $\psi$ and $\phi$ is $2N-1$, and the number of vanishing moments of $\psi$ is N. Most dbN wavelets are not symmetric. As shown in Fig. 2(b), the Haar wavelet (48) is discontinuous and resembles a step function; it is identical to the Daubechies wavelet db1. As shown in Fig. 2(c), the Meyer wavelet (49) and its scaling function are defined in the frequency domain. Starting from the explicit form of the Fourier transform $\hat\phi$, the values of $\hat\phi$ are computed on a regular grid, and then the value of $\phi$ is computed using the inverse non-standard discrete FFT, instdfft. As shown in Fig. 2(d), the Morlet wavelet (50) has no scaling function but has an explicit expression. The constant C is used for normalization with a view to reconstruction. The analysis is not orthogonal because the scaling function $\phi$ does not exist.

Fig. 2. (a) Feature extraction through a complex continuous Daubechies wavelet

../../Resources/kiee/KIEEP.2021.70.3.163/fig2_a.png

Fig. 2. (b) Feature extraction through a complex continuous Haar wavelet

../../Resources/kiee/KIEEP.2021.70.3.163/fig2_b.png

Fig. 2. (c) Feature extraction through a complex continuous Meyer wavelet

../../Resources/kiee/KIEEP.2021.70.3.163/fig2_c.png

Fig. 2. (d) Feature extraction through a complex continuous Morlet wavelet

../../Resources/kiee/KIEEP.2021.70.3.163/fig2_d.png

3. Transfer Learning

Transfer learning is a popular methodology in computer vision because it can achieve a high level of accuracy within a relatively short period of time (49). With transfer learning, even when solving a problem different from the one previously learned, the already learned patterns can be reused instead of building the model from scratch. In computer vision, transfer learning mainly involves using pre-trained models. A pre-trained model is one that has already been trained on large-scale data similar to the problem to be solved. Training a model on a large amount of data requires a long time and significant computational power, and thus it is customary to import and use models that have already been released. Some pre-trained models used for transfer learning have a large CNN structure, and CNNs have been shown to perform well on a variety of computer vision problems. The enthusiasm for CNNs in recent years has been driven by two main factors: good performance and easy training.

A typical CNN consists of two parts. The first part is the convolutional base, in which convolutional and pooling layers are stacked in multiple layers; its goal is to effectively extract features from the image. The second part is the classifier, which consists mainly of fully connected layers; a fully connected layer is one in which all neurons are connected to the outputs of the previous layer. The goal of the classifier is to classify images into appropriate categories based on the extracted features.

One of the important characteristics of deep learning models is that they learn hierarchical features by themselves. The first layers of the model are trained to extract general features, whereas layers closer to the end of the model extract features that appear only in a particular dataset or problem; that is, the later layers learn to extract more specific features. Therefore, the layers at the front can be reused when learning images from different datasets, but the layers at the back need to be retrained whenever a new problem is encountered. In conclusion, the convolutional base of a CNN, particularly its lower layers, extracts general features, whereas the higher layers of the convolutional base and the classifier extract more specific and unique features.
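To make this idea concrete, here is a minimal PyTorch sketch (an assumption, since the paper does not publish code) that reuses a pre-trained convolutional base and replaces only the classifier. The six-class output matches the emotion set used later in the paper, and a recent torchvision version is assumed.

```python
import torch.nn as nn
from torchvision import models

def build_transfer_model(num_classes=6):
    """Load an ImageNet-pre-trained ResNet18, freeze the convolutional base,
    and attach a new fully connected classifier for the target task."""
    model = models.resnet18(weights="IMAGENET1K_V1")
    for param in model.parameters():
        param.requires_grad = False                    # keep the general features
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # new classifier head
    return model
```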

3.1 ResNet

In a deep neural network, gradients can vanish or explode, and performance can degrade. A vanishing gradient means that the backpropagated gradient becomes too small to drive learning, and an exploding gradient means that the propagated gradient becomes too large. Performance degradation means that a deep neural network performs worse than a shallow one, even though overfitting is not the cause. ResNet attempts to resolve these issues by reusing the input of each block through shortcut connections.

Let H(x) denote the underlying mapping to be fit by a few stacked layers, with x denoting the input to the first of these layers. If one hypothesizes that multiple nonlinear layers can asymptotically approximate complicated functions, this is equivalent to hypothesizing that they can asymptotically approximate the residual function H(x) − x. Therefore, instead of expecting the stacked layers to approximate H(x), we explicitly let them approximate the residual function F(x) := H(x) − x, so that the original function becomes F(x) + x. Both forms should be able to approximate the desired function asymptotically, but the ease of learning may differ (51).
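The residual formulation F(x) + x can be written as a small building block. The sketch below is a generic PyTorch residual block under the usual assumptions (identity shortcut, equal input and output channels), not the exact block configuration used in the paper.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes y = F(x) + x, where F is two 3x3 convolutions with batch norm."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(residual + x)   # the shortcut adds the block input back
```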

3.2 Ensemble Model

An ensemble model is a learning model that improves prediction performance by generating several classifiers and combining their predictions. Rather than using a single strong model, weaker models are combined to produce more accurate predictions. Generally, bagging and boosting methods are used. Bagging trains each model on repeatedly drawn bootstrap samples, for example with decision tree or random forest learners, and aggregates the results. Boosting also resamples the data, but it learns sequentially and focuses more on previously misclassified samples by assigning them higher weights and assigning lower weights to correctly classified samples. The method used in this paper aims to improve performance by simply combining models based on the same algorithm.

In an existing ensemble network, the same data or different types of data are passed through different networks. In contrast, our proposed ensemble network uses the same network architecture, transforms the same data into four types of scalograms, and uses these images as input. The key is to transform the same data in various ways; here, we transform it into scalograms using different wavelets. Different wavelets have different strengths: some capture global characteristics well, while others capture local characteristics well. If the various inputs created through these different wavelets are fed into the same network, both global and local aspects can be covered. Therefore, as shown in Fig. 3, only lead II of the ECG data in the ASCERTAIN dataset is used to create the scalograms. Each ECG signal is converted into a scalogram using the Daubechies wavelet (db4), the Haar wavelet (haar), the Meyer wavelet (meyr), and the Morlet wavelet (morl), and each scalogram is fed into the same network. In each learning model, the output of the final fully connected layer with a softmax function is obtained, and these outputs are fused to obtain the final result, as sketched after Fig. 3.

Fig. 3. Proposed ensemble deep models

../../Resources/kiee/KIEEP.2021.70.3.163/fig3.png
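A minimal sketch of the fusion step described above is shown below: the softmax outputs of the four wavelet-specific models are combined by element-wise addition or multiplication before the final class decision. The function and variable names are illustrative, not from the original implementation.

```python
import torch

def fuse_softmax_outputs(prob_list, mode="multiplication"):
    """Fuse per-model class probabilities of shape (batch, num_classes).

    prob_list holds the softmax outputs of the db4, haar, meyr, and morl models.
    """
    stacked = torch.stack(prob_list, dim=0)          # (num_models, batch, classes)
    fused = stacked.sum(dim=0) if mode == "addition" else stacked.prod(dim=0)
    return fused.argmax(dim=1)                       # predicted emotion index

# Hypothetical usage with four trained models and a batch of scalogram images:
# preds = fuse_softmax_outputs([model(x).softmax(dim=1) for model, x in zip(models, inputs)])
```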

3.3 ASCERTAIN Database

The database used in this study is ASCERTAIN, a multi-modal database for implicit personality and affect recognition using commercial physiological sensors. ASCERTAIN includes big-five personality scales and emotional self-assessments of 58 users, along with electroencephalogram (EEG), electrocardiogram (ECG), galvanic skin response (GSR), and facial activity data, recorded using off-the-shelf sensors while the users watched affective movie clips.

A total of 58 university students participated in the study. All subjects were fluent in English and habitually watched Hollywood movies. One PC with two monitors was used in the experiment. One monitor was used to display video clips with a resolution of 1,024 × 768 pixels at a 60 Hz refresh rate and was placed approximately 1 m in front of the user. On the other monitor, the experimenter could check the recorded sensor data. Physiological sensors were placed on the user's body with prior consent. The GSR sensor was strapped to the left wrist, and two electrodes were attached to the phalanges of the index and middle fingers. Two ECG measurement electrodes were placed at the bend of each arm, and the reference electrode was placed at the left foot. A single-dry-electrode EEG device was worn on the head like a normal headset, with the EEG sensor touching the forehead and the reference electrode clipped to the left ear. EEG data samples were recorded using the Lucid Scribe software, and all sensor data were transmitted via Bluetooth. A webcam was used to record facial activity. The database was built in this experimental environment.

Fig. 4. Data structure diagram

../../Resources/kiee/KIEEP.2021.70.3.163/fig4.png

4. Experimental Results

The computer used in the experiment had the following specifications: an Intel i7-8700K central processing unit at 3.70 GHz, an Nvidia GeForce GTX 1080, 64 GB of random-access memory, and a Windows 10 64-bit operating system.

The CNN model used in this study for emotion recognition is ResNet18, a pre-trained CNN. When new data need to be classified efficiently, pre-trained CNNs can be used for transfer learning. Most of the parameters used to extract feature maps are fixed, and only the final fully connected (FC) layer and softmax are trained, as sketched below.
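Under the same assumptions as the earlier transfer-learning sketch, a minimal fine-tuning loop that updates only the new FC layer could look as follows; the data loader and label format are hypothetical placeholders.

```python
import torch
import torch.nn as nn

def finetune_fc(model, train_loader, epochs=10, lr=1e-3, device="cuda"):
    """Train only the replaced FC layer; all other parameters stay frozen."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:   # scalogram images and emotion labels
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```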

For a performance comparison, a simple CNN was added to the experiment. Here, the number of classes equals the number of emotions. The emotions defined were surprise, happiness, anger, disgust, fear, and sadness, giving a total of six classes. The training, validation, and test datasets for the six classes consisted of 1260 data points each. Fig. 6 shows the confusion matrices of ResNet18 with scalogram preprocessing for the four transformed inputs, and Fig. 5 shows the results obtained after the ensemble operation: Fig. 5(a) corresponds to the addition operation and Fig. 5(b) to the multiplication operation. As mentioned earlier, conventional ensemble networks apply different models to the same data, whereas here the same data are transformed in different ways and fed into the same learning model. Through the four wavelet functions, complementary representations of the same ECG signal are obtained, and only the outputs of the final FC layer and softmax of the same learning model are fused.

Tables 1 and 2 show the recognition rates obtained with the VGG19 and ResNet18 models, respectively, when the 1-D ECG signals are transformed with the Daubechies4, Haar, Meyer, and Morlet wavelets, together with the rates of Jun's method (52), which converts ECG signals into 2-D images, and of the proposed method. As shown in Table 1, instead of feeding a single wavelet transform or a plain 2-D ECG image into the VGG19 model, the proposed method applies the Daubechies4, Haar, Meyer, and Morlet wavelet transforms and feeds the resulting 2-D images into the ensemble model. The method of feeding these images into the ensemble model and multiplying the outputs to obtain the final result shows the best recognition rate. The experiments confirmed that it is more effective to use the different inputs obtained through the wavelet transforms jointly as the input of the ensemble model than to use each wavelet transform individually.

As listed in Tables 1 and 2, the proposed method showed good performance compared with each single model and with Jun's method (52), which does not use a wavelet transformation. Jun (52) proposed a 2-D CNN that converts the ECG signal into a 2-D ECG image without any transformation. Although Jun's method exploits the spatial locality of the ECG images, it is advantageous to use the ensemble computing technique synergistically rather than exclusively for ECG-based emotion recognition problems. Nevertheless, a disadvantage of this study is the increased model complexity and training time resulting from the design of the ensemble model.

Fig. 5. Confusion matrix for ResNet18 with scalogram: (a) addition and (b) multiplication

../../Resources/kiee/KIEEP.2021.70.3.163/fig5.png

Fig. 6. Confusion matrix for ResNet18 with scalogram: (a) Daubechies, (b) Haar, (c) Meyer, and (d) Morlet

../../Resources/kiee/KIEEP.2021.70.3.163/fig6_1.png../../Resources/kiee/KIEEP.2021.70.3.163/fig6_2.png

Table 1. Performance comparison of VGG19 and the proposed method

Models | Methods | Recognition ratio (%)
VGG19 | Jun (52) | 91.11
VGG19 | Daubechies4 | 87.62
VGG19 | Haar | 95.24
VGG19 | Meyer | 78.57
VGG19 | Morlet | 92.06
Proposed Model | Addition | 97.78
Proposed Model | Multiplication | 98.10

Table 2. Performance comparison of ResNet18 and the proposed method

Models | Methods | Recognition ratio (%)
ResNet18 | Jun (52) | 94.13
ResNet18 | Daubechies4 | 88.21
ResNet18 | Haar | 95.92
ResNet18 | Meyer | 95.71
ResNet18 | Morlet | 95.40
Proposed Model | Addition | 97.14
Proposed Model | Multiplication | 98.41

5. Conclusion

We present an ensemble deep model designed using pre-trained CNNs and various types of preprocessing for emotion recognition in various environments. The CNN used in this study was ResNet18, which is frequently utilized in deep learning. The experiment was conducted using the ASCERTAIN sentiment database, which consists of data recorded from 58 people while viewing 36 emotion-eliciting movie clips. Of these, only data corresponding to six emotions were used in the experiment. The first main aspect of this study is transforming the data into four types of representations to improve the performance of emotion classification with a small dataset. Second, as shown in Fig. 3, all data obtained through the first step were trained on the same network and ensembled. The experimental results show that the system, which converts the electrocardiogram signal into various wavelet-based representations and applies them to the ensemble deep model, performs better than the basic CNN or single ResNet18 method.

In this study, a scale-based transformation was attempted; in future studies, a frequency-based transformation will be attempted, and the size of the data and a wider variety of emotions will be considered and evaluated experimentally.

Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2017R1A6A1A03015496). This research was also supported by the Healthcare AI Convergence Research & Development Program through the National IT Industry Promotion Agency of Korea (NIPA) funded by the Ministry of Science and ICT (No. S1601-20-1041).

References

1. J. Li, M. Oussalah, 2010, Automatic face emotion recognition system, 2010 IEEE 9th International Conference on Cybernetic Intelligent Systems, pp. 1-6.
2. M. W. Schurgin, J. Nelson, S. Iida, H. Ohira, J. Y. Chiao, S. L. Franconeri, 2014, Eye movements during emotion recognition in faces, Journal of Vision, Vol. 14, pp. 1-16.
3. A. Halder, A. Konar, R. Mandal, A. Chakraborty, P. Bhowmik, N. R. Pal, A. K. Nagar, 2013, General and interval type-2 fuzzy face-space approach to emotion recognition, IEEE Transactions on Systems, Vol. 43, pp. 587-605.
4. I. A. Adeyanju, E. O. Omidiora, O. F. Oyedokun, 2015, Performance evaluation of different support vector machine kernels for face emotion recognition, SAI Intelligent Systems Conference, pp. 804-806.
5. I. M. Anderson, C. Shippen, G. Juhasz, D. Chase, E. Thomas, D. Downey, Z. G. Toth, K. Lloyd-Williams, R. Elliott, J. F. W. Deakin, 2011, State-dependent alteration in face emotion recognition in depression, The British Journal of Psychiatry, Vol. 198, pp. 302-308.
6. B. Schuller, G. Rigoll, M. Lang, 2003, Hidden Markov model-based speech emotion recognition, 2003 IEEE International Conference on Acoustics, pp. 1-4.
7. B. Schuller, G. Rigoll, M. Lang, 2004, Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture, ICASSP, pp. 557-580.
8. K. Han, D. Yu, I. Tashev, 2014, Speech emotion recognition using deep neural network and extreme learning machine, INTERSPEECH, pp. 223-227.
9. K. Wang, N. An, B. N. Li, Y. Zhang, 2015, Speech emotion recognition using Fourier parameters, IEEE Transactions on Affective Computing, Vol. 6, pp. 69-75.
10. M. Jain, S. Narayan, P. Balaji, B. K. P, A. Bhowmick, K. R, 2002, Speech emotion recognition using support vector machine, Electrical Engineering and Systems Science.
11. A. Pal, Y. N. Singh, 2018, ECG biometric recognition, ICMC 2018 Mathematics and Computing, pp. 61-73.
12. A. Pal, A. K. Gautam, N. S. Yogendra, 2015, Evaluation of bioelectric signals for human recognition, Procedia Computer Science, pp. 746-752.
13. S. Kun, G. Yang, B. Wu, L. Yang, D. Li, P. Su, Y. Yin, 2019, Human identification using finger vein and ECG signals, Neurocomputing, pp. 111-118.
14. D. Rezqui, Z. Lachiri, 2016, ECG biometric recognition using SVM-based approach, Transactions on Electrical and Electronic Engineering, pp. 94-100.
15. S. Barra, A. Casanova, M. Frashini, M. Nappi, 2015, EEG/ECG signal fusion aimed at biometric recognition, ICIAP 2015: New Trends in Image Analysis and Processing, pp. 35-42.
16. X. Dong, W. Si, W. Huang, 2018, ECG-based identity recognition via deterministic learning, Biotechnology & Biotechnological Equipment, pp. 769-777.
17. S. Aziz, M. U. Khan, Z. A. Choudhry, A. Aymin, A. Usman, 2019, ECG-based biometric authentication using empirical mode decomposition and support vector machines, 2019 IEEE 10th Annual Information Technology, pp. 906-912.
18. J. R. Pinto, J. S. Cardoso, A. Lourenco, 2018, Evolution, current challenges, and future possibilities in ECG biometrics, IEEE Access, pp. 34746-34776.
19. J. R. Pinto, J. S. Cardoso, A. Lourenco, C. Carreiras, 2017, Towards a continuous biometric system based on ECG signals acquired on the steering wheel, Sensors, pp. 1-14.
20. E. Maiorana, P. Campisi, 2017, Longitudinal evaluation of EEG-based biometric recognition, IEEE Transactions on Information Forensics and Security, pp. 1123-1138.
21. E. Maiorana, J. Sole-Casals, P. Campisi, 2016, EEG signal preprocessing for biometric recognition, Machine Vision and Applications, pp. 1351-1360.
22. E. Maiorana, D. L. Rocca, P. Campisi, 2015, Cognitive biometric cryptosystems: a case study on EEG, 2015 International Conference on Systems, pp. 125-128.
23. D. Nikolova, P. Mihaylova, A. Manolova, P. Georgieva, 2019, ECG-based human emotion recognition across multiple subjects, FABULOUS 2019, pp. 25-36.
24. H. Ferdinando, T. Seppanen, E. Alasaarela, 2016, Comparing features from ECG pattern and HRV analysis for emotion recognition system, CIBCB 2016, pp. 1-6.
25. A. Goshvarpour, A. Abbasi, A. Goshvarpour, 2017, An accurate emotion recognition system using ECG and GSR signals and matching pursuit method, Biomedical Journal, Vol. 40, pp. 355-368.
26. T. Dissanayake, Y. Rajapaksha, R. Ragel, I. Nawinne, 2019, An ensemble learning approach for electrocardiogram sensor based human emotion recognition, Sensors, Vol. 19, pp. 1-24.
27. P. Sarkar, A. Etemad, 2020, Self-supervised learning for ECG-based emotion recognition, 45th IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1-13.
28. X. Ya, L. Guang-Yuan, 2009, A method of emotion recognition based on ECG signal, 2009 International Conference on Computational Intelligence and Natural Computing, pp. 202-205.
29. Y. L. Hsu, J. S. Wang, W. C. Chiang, C. H. Hung, 2020, Automatic ECG-based emotion recognition in music listening, IEEE Transactions on Affective Computing, Vol. 11, pp. 85-99.
30. F. Agrafioti, D. Hatzinakos, A. K. Anderson, 2012, ECG pattern analysis for emotion detection, IEEE Transactions on Affective Computing, Vol. 3, pp. 102-115.
31. S. Tivatansakul, M. Ohkura, 2016, Emotion recognition using ECG signals with local pattern description methods, International Journal of Affective Engineering, Vol. 15, pp. 51-61.
32. S. Jerritta, M. Murugappan, K. Wan, S. Yaacob, 2013, Emotion detection from QRS complex of ECG signals using Hurst exponent for different age groups, 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, pp. 849-854.
33. Y. Fan, X. Lu, D. Li, Y. Liu, 2016, Video-based emotion recognition using CNN-RNN and C3D hybrid networks, ICMI '16, pp. 445-450.
34. H. W. Ng, V. D. Nguyen, V. Vonikakis, S. Winkler, 2015, Deep learning for emotion recognition on small datasets using transfer learning, ICMI '15, pp. 443-449.
35. H. M. Fayek, M. Lech, L. Cavedon, 2017, Evaluating deep learning architectures for speech emotion recognition, Neural Networks, Vol. 92, pp. 60-68.
36. D. Issa, M. F. Demirci, A. Yazici, 2020, Speech emotion recognition with deep convolutional neural networks, Biomedical Signal Processing and Control, Vol. 59, pp. 1-11.
37. S. Jirayucharoensak, S. Pan-Ngum, P. Israsena, 2014, EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation, The Scientific World Journal, pp. 1-10.
38. P. Pandey, K. R. Seeja, 2019, Subject independent emotion recognition from EEG using VMD and deep learning, Journal of King Saud University - Computer and Information Sciences, Vol. 14, pp. 1-9.
39. J. Wang, R. Li, R. Li, B. Fu, 2020, A knowledge-based deep learning method for ECG signal delineation, Future Generation Computer Systems, Vol. 109, pp. 56-66.
40. G. Giannakakis, E. Trivizakis, M. Tsiknakis, K. Marias, 2019, A novel multi-kernel 1D convolutional neural network for stress recognition from ECG, 2019 8th International Conference on ACIIW, pp. 273-276.
41. R. Subramanian, J. Wache, M. K. Abadi, R. L. Vieriu, S. Winkler, N. Sebe, 2018, ASCERTAIN: Emotion and personality recognition using commercial sensors, IEEE Transactions on Affective Computing, Vol. 9, pp. 147-160.
42. S. Koelstra, C. Muhl, M. Soleymani, J. S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, I. Patras, 2011, DEAP: A database for emotion analysis using physiological signals, IEEE Transactions on Affective Computing, Vol. 3, pp. 18-31.
43. M. K. Abadi, R. Subramanian, S. M. Kia, P. Avesani, I. Patras, N. Sebe, 2015, DECAF: MEG-based multimodal database for decoding affective physiological responses, IEEE Transactions on Affective Computing, Vol. 6, pp. 209-222.
44. S. Katsigiannis, N. Ramzan, 2018, DREAMER: A database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices, IEEE Journal of Biomedical and Health Informatics, Vol. 22, pp. 98-107.
45. A. Radford, L. Metz, S. Chintala, 2016, Unsupervised representation learning with deep convolutional generative adversarial networks, ICLR 2016, pp. 1-16.
46. H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, D. N. Metaxas, 2019, StackGAN++: Realistic image synthesis with stacked generative adversarial networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 41, pp. 1947-1962.
47. A. C. H. Rowe, P. C. Abbott, 1995, Daubechies wavelets and Mathematica, Computers in Physics, Vol. 635.
48. D. K. Ruch, P. J. Van Fleet, 2009, Wavelet theory: An elementary approach with applications, Wiley.
49. V. V. Vermehren, H. M. Oliveira, 2015, Close expression for Meyer wavelet and scale function.
50. A. Bernardino, J. Santos-Victor, 2005, A real-time Gabor primal sketch for visual attention, IBPRIA, pp. 335-342.
51. T. Mallick, P. Balaprakash, E. Rask, J. Macfarlane, 2020, Transfer learning with graph neural networks for short-term highway traffic forecasting, arXiv:2004.08038.
52. T. J. Jun, H. M. Nguyen, D. Kang, D. Kim, Y. H. Kim, 2018, ECG arrhythmia classification using a 2-D convolutional neural network, Computer Vision and Pattern Recognition, pp. 1-22.

About the Authors

Myung-Won Lee
../../Resources/kiee/KIEEP.2021.70.3.163/au1.png

He received his PhD from Chosun University in Gwangju in 2017.

He worked as a postdoctoral researcher at Chosun University IT Research Center from 2017 to 2021.

He is currently conducting research at the K-THEBOM Research Institute. His interests are granular computing, emotion recognition, language processing, and disease classification.

Yeong-Hyeon Byeon
../../Resources/kiee/KIEEP.2021.70.3.163/au2.png

He received his Bachelor's degree from Chosun University in Gwangju in 2013.

He received his master's degree from Chosun University, Gwangju in 2014.

He received his PhD from Chosun University, Gwangju in 2021.

He is currently working as a postdoctoral researcher at the IT Research Center of Chosun University.

His interests are pedestrian detection and deep learning.

Chan-Uk Yeom
../../Resources/kiee/KIEEP.2021.70.3.163/au3.png

He received his Bachelor's degree from Chosun University in Gwangju in 2016.

He received his master's degree from Chosun University in Gwangju in 2017.

He is currently pursuing his PhD at Chosun University in Gwangju.

His interests are granular computing and computational intelligence.

Keun-Chan Kwak
../../Resources/kiee/KIEEP.2021.70.3.163/au4.png

He received his Ph.D. degree from Chungbuk National University in 2012.

He worked as a Postdoctoral Fellow at the University of Alberta, Canada from 2003 to 2005.

From 2005 to 2007, he worked as a senior researcher at Intelligent Robot Research Division in ETRI.

He is now a professor at the Department of Electronics Engineering at Chosun University.

His interest areas are computational intelligence, human-robot interaction, and biometrics.