Myung-Won Lee (이명원)¹, Yeong-Hyeon Byeon (변영현)², Chan-Uk Yeom (염찬욱)³, Keun-Chang Kwak (곽근창)†
(K-THEBOM Research Institute, Ltd., Korea)
(Chosun University IT Research Center, Korea)
(Dept. of Control and Instrumentation Engineering, Chosun University, Korea)
Copyright © The Korean Institute of Electrical Engineers(KIEE)
Key words
Ensemble Deep Models, Emotion Recognition, ASCERTAIN Sentiment Database, Electrocardiogram, Two-Dimensional Time-Scale Representation
1. Introduction
Social robots such as Sophia and Pepper have recently become a hot topic. A social
robot refers to an emotion-oriented robot that has the ability to communicate with
humans and operates autonomously to perform social actions, unlike conventional robots
that have replaced human physical labor through mechanical movements. One of the main
functions of a social robot is emotional interaction, which allows the robot to identify
a person's emotional state through natural conversation and to convey the robot's own
emotions. With the development of sensing technology and machine learning, technologies
capable of recognizing human emotions and inner states through images, audio, and
bio-signals are being developed and applied (1-5).
Looking first at image-based emotion technology, Li (1) proposed an automatic facial emotion recognition method, a topic that has long been an active subject
in the computer vision community. Schurgin (2) studied eye movements during facial emotion recognition, showing that some areas of the face may
contain more useful information than others. Halder (3) addressed the fact that the facial expressions of people with similar feelings are
not always unique. Adeyanju (4) evaluated facial emotion recognition systems that identify emotions expressed on the
face without necessarily identifying the person involved, similar to facial recognition.
In addition, Anderson (5) found that women in a control group were more accurate in recognizing emotions than
men owing to a greater discriminative ability. Computer vision technology is essential
for allowing social robots to recognize human emotions.
With regard to voice-based emotion recognition, by contrast, techniques
for identifying emotional states by analyzing patterns such as tremors in the human
voice are being studied. Schuller (6) proposed a speech emotion recognition system using continuous hidden Markov models
(HMMs). Schuller (7) also proposed a new approach that combines acoustic features and linguistic information
to recognize the speaker's emotion automatically and robustly.
Han (8) applied deep neural networks and extreme learning machines to speech emotion recognition, noting that it is unclear
which features are effective for this task. Wang (9) recently studied Fourier-parameter features for speech emotion recognition. In
addition, Jain (10) proposed classifying speech into one of four emotions, namely, sadness,
anger, fear, or happiness.
With the development of wearable device technology, research on bio-signals is active.
Such bio-signals have also been studied in the field of user recognition. Pal (11,12) presented a human recognition system based on single-lead recordings and used the electrical
signals recorded in an electrocardiogram (ECG) and an electroencephalogram (EEG) as
unique individual characteristics for biometric recognition. Su
(13) fused finger-vein features with ECG signals in a multi-modal biometric method
based on discriminant correlation. Rezqui (14) used ECG signals for personal identification based on a support vector machine (SVM).
Barra (15) fused simple features extracted simultaneously from EEG and ECG signals and used
them for user recognition. Dong (16) used deterministic learning, a recently developed machine learning approach, to model the dynamics
underlying ECG signals. Aziz (17) extracted the region of interest of the ECG signal through empirical mode decomposition
(EMD) to obtain the maximum characteristic information related to subject recognition.
Pinto (18,19) argued that faces and fingerprints are currently the most thoroughly investigated
biometric characteristics and promise reliable recognition in various applications.
Maiorana (20-22) conducted a variety of experiments using differential features of EEG signals
over a long period of time; the longitudinal behavior of these EEG characteristics
was evaluated through statistical and performance-related analyses using different EEG features
and hidden Markov models as classifiers.
In addition, bio-signals are being actively researched for emotion recognition. Bio-signals
such as the EEG, electromyogram (EMG), and ECG are used as important information in
determining the psychological state of humans. Among them, we focus on the use of
ECG signals. Nikolova (23) noted that ECG-based affective computing is a new research field that aims to find
correlations between human emotions and recorded ECG signals, and studied the potential
of two machine learning methods (logistic regression and an artificial neural network)
to discriminate human emotional states across multiple subjects. Ferdinando (24) proposed new features for emotion recognition from short ECG signals; these features
showed better performance than features based on the statistical distribution of the
instantaneous frequency, calculated using the Hilbert transform of the intrinsic mode
functions obtained after applying standard and bivariate empirical mode decomposition
to the ECG. Goshvarpour (25) studied the use of ECG signals together with other biological signals, examining the
effectiveness of the matching pursuit algorithm in emotion recognition. Using heart
rate variability, empirical mode decomposition, within-beat analysis, and frequency
spectrum analysis, Dissanayake (26) performed ensemble learning after feature extraction. Sarkar (27) proposed an ECG-based emotion recognition system using self-supervised learning: the
unlabelled data are used to train the former network to detect specific
pre-determined signal transformations in the self-supervised learning step; the weights
of the convolutional layers of this network are then transferred to the emotion
recognition network, and two dense layers are trained to classify arousal
and valence scores. Xu (28) noted that emotion recognition from ECG signals has become an important research topic
in the field of affective computing. Moreover, Hsu (29) proposed an automatic ECG-based emotion recognition algorithm for human emotion
recognition. Agrafioti (30) noted that emotion modeling and recognition have attracted extensive attention
in areas such as psychology, cognitive science, and engineering. Tivatansakul (31) proposed an emotion-focused healthcare system to cope with negative emotions in everyday
life. In addition, Jerritta (32) argued that emotion recognition using physiological signals is one of the key
areas of human-computer interaction.
Emotion recognition studies using facial and voice information have recently
been conducted on large datasets. Deep learning is effective at handling large
datasets, and significant research has been ongoing in recent years. A video-based
emotion recognition system submitted to the Emotion Recognition in the Wild 2016
challenge was proposed by Fan (33). The system's core module is a hybrid network that combines a recurrent neural network
and three-dimensional convolutional networks through a late-fusion method. Ng
(34) addressed the sub-task of classifying emotions expressed by the main human subject
in static images extracted from movies. Fayek (35) noted that speech emotion recognition can be considered a static or dynamic
classification problem, making it a great testbed for investigating and comparing various
deep learning architectures.
2. Basic Characteristics of ECG and 2D Time-Scale Representations
Physiological signals are often nonstationary: their frequency components change over time.
The wavelet transform decomposes a signal into time-varying frequency or scale components. Signal
features are often localized in time and frequency, which makes them easier
to analyze and estimate when working with sparse representations.
2.1 ECG Signal Specificity
Electrocardiography amplifies and records the electrical rhythm of the heart. Fig 1 shows a voltage-time graph of the electrical activity of the heart measured using electrodes
placed on the skin. These electrodes detect the small electrical changes that result
from cardiac muscle depolarization followed by repolarization during each cardiac
cycle. Changes in the normal ECG pattern occur in numerous cardiac abnormalities,
including arrhythmia, ischemic heart disease, and cardiomegaly.
The P wave indicates the contraction of the atria, which contract in the order of the
right and then the left atrium. The moment the electrical impulse generated in the sinoatrial
node reaches the first cell of the right atrium, the P wave begins, and the time at which
the impulse reaches the last cell of the left atrium corresponds to the end of the P wave. This
interval is called the width of the P wave and lasts approximately 100 milliseconds. Owing to
its very small amplitude, atrial repolarization is usually not visible in the ECG signal.
Ventricular depolarization occurs from the endocardium toward the epicardium and corresponds
to the QRS complex on the electrocardiogram. Ventricular depolarization occurs quickly
and takes approximately 80 milliseconds. The T wave refers to the gentle small wave that follows the
QRS complex and is generated by repolarization of the ventricular muscle. In healthy
people, the P wave, the QRS complex, and the T wave are directed in the same direction.
There is no specific waveform between the QRS complex and the T wave; this part corresponds
to the ST segment, and its elevation or depression affects the diagnosis of ischemic
heart disease. Each part of the ECG signal can be analyzed to determine whether it
is normal.
Fig. 1. ECG signal with P, QRS, and T waves
2.2 2-D Time-Frequency Transformation
2.2.1 2-D Wavelet Scalogram
The scalogram is the absolute value of a signal's continuous wavelet transform (CWT)
as a function of time and frequency. A scalogram can be more useful than a spectrogram for
analyzing real-world signals characterized by features at different scales, such as slowly varying
signals punctuated by sudden transient events. We use a scalogram to obtain better
time localization for short-duration, high-frequency events and better frequency localization
for low-frequency, longer-duration events. The spectrogram, by contrast, is obtained by windowing the
input signal with a window of constant length (duration) that is shifted in time
and frequency. The window used in the spectrogram is even, real-valued, and does not
oscillate. Because the spectrogram uses a constant window, the time-frequency resolution
of the spectrogram is fixed.
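As a minimal illustration of this fixed-resolution, constant-window analysis, the following Python sketch computes a spectrogram with scipy.signal.spectrogram; the synthetic signal, sampling rate, and window length are placeholder choices for the example only, not the settings used in this study.

```python
import numpy as np
from scipy import signal

fs = 500                       # assumed sampling rate (Hz); placeholder value
t = np.arange(0, 10, 1 / fs)
# synthetic test signal: a slow oscillation plus a brief high-frequency burst
x = np.sin(2 * np.pi * 1.2 * t)
x[2500:2600] += np.sin(2 * np.pi * 60 * t[2500:2600])

# constant-length window => fixed time-frequency resolution
f, tt, Sxx = signal.spectrogram(x, fs=fs, nperseg=256, noverlap=128)
print(Sxx.shape)  # (frequency bins, time frames)
```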
The Meyer wavelet and scaling functions are defined within the frequency domain.
For the wavelet function,
$\hat\psi(\omega)=\dfrac{1}{\sqrt{2\pi}}\sin\left(\dfrac{\pi}{2}v\left(\dfrac{3}{2\pi}|\omega | - 1\right)\right)e^{j\omega /2}$ if $\dfrac{2\pi}{3}\le |\omega |\le\dfrac{4\pi}{3}$,
$\hat\psi(\omega)=\dfrac{1}{\sqrt{2\pi}}\cos\left(\dfrac{\pi}{2}v\left(\dfrac{3}{4\pi}|\omega | - 1\right)\right)e^{j\omega /2}$ if $\dfrac{4\pi}{3}\le |\omega |\le\dfrac{8\pi}{3}$,
and $\hat\psi(\omega)= 0$ if $|\omega |\notin\left[\dfrac{2\pi}{3},\:\dfrac{8\pi}{3}\right]$,
where the auxiliary function is $v(a)= a^{4}\left(35 - 84a + 70a^{2}- 20a^{3}\right),\: a\in[0,\: 1]$.
Changing the auxiliary function yields a different family of wavelets. This wavelet
ensures an orthogonal analysis.
The function $\psi$ does not have finite support, but $\psi$ decreases to 0 as $x\to\infty$
faster than any inverse polynomial: $\forall n\in N,\:\exists C_{n}$ such that $|\psi(x)|\le
C_{n}\left(1 + | x |^{2}\right)^{-n}$.
This property also holds for the derivatives: $\forall k\in N,\:\forall n\in N,\:\exists C_{k,\:n}$ such
that $\left |\psi^{(k)}(x)\right |\le C_{k,\:n}\left(1 + | x |^{2}\right)^{-n}$. The
wavelet is infinitely differentiable.
2.2.2 Complex Continuous Wavelet
By contrast, the CWT is obtained by windowing the signal with a wavelet that is scaled
and shifted in time. The wavelet oscillates and can be complex in value. The scaling
and shifting operations are applied to a prototype wavelet. The scaling used in the
CWT both shrinks and stretches the prototype wavelet. Shrinking the prototype wavelet
yields short-duration, high-frequency wavelets that are good at detecting transient
events. Stretching the prototype wavelet yields long-duration, low-frequency wavelets
that are good at isolating long-duration, low-frequency events. To compute the scalogram,
the following steps are applied:
1. If the signal has more than one million samples, the signal is divided into overlapping
segments.
2. Compute the CWT of each segment to obtain its scalogram.
3. Display the scalogram segment by segment.
As implemented, the CWT uses L1 normalization. Therefore, the amplitude of the oscillation
component in the signal corresponds to the amplitude of the corresponding wavelet
coefficient.
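The following Python sketch illustrates this pipeline for one wavelet family. It is a minimal example assuming PyWavelets is available; the Morlet wavelet ('morl'), the 500 Hz sampling rate, and the scale range are placeholder choices for illustration, not the exact settings of this study, which converts each ECG segment with four wavelet families.

```python
import numpy as np
import pywt
import matplotlib.pyplot as plt

fs = 500                         # assumed ECG sampling rate (Hz); placeholder
ecg = np.random.randn(10 * fs)   # stand-in for one lead-II ECG segment

# CWT with a Morlet wavelet; the scales control the covered frequency range
scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(ecg, scales, "morl", sampling_period=1 / fs)

# scalogram = |CWT coefficients|, saved as a 2-D image for the CNN input
scalogram = np.abs(coeffs)
plt.imshow(scalogram, aspect="auto", origin="lower")
plt.axis("off")
plt.savefig("scalogram_morl.png", bbox_inches="tight", pad_inches=0)
```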
The wavelet family contains several types of wavelets. Fig 2 shows images of the four wavelets used in this study. First, as shown in Fig 2(a), the Daubechies wavelet (47), developed by Daubechies, a renowned scholar in the field of wavelet research, provides
compactly supported orthogonal wavelets, making discrete wavelet analysis practicable. The Daubechies
family of wavelets is written as dbN, where db is the name of the wavelet and N is the order.
We use db4 here. These wavelets have no explicit expression except for the Haar
wavelet, db1. However, the square modulus of the transfer function h is explicit
and fairly simple. The support length of $\psi$ and $\phi$ is 2N-1, and the number of vanishing moments of
$\psi$ is N. Most dbN wavelets are not symmetrical. As shown in Fig 2(b), the Haar wavelet (48) is discontinuous and resembles a step function; it is identical to the Daubechies
wavelet db1. As shown in Fig 2(c), the Meyer wavelet (49) and its scaling function are defined within the frequency domain: starting from an
explicit form of the Fourier transform $\hat\phi$, the values of $\hat\phi$ are computed on
a regular grid, and the value of $\phi$ is then obtained using an inverse nonstandard
discrete FFT (instdfft). As shown in Fig 2(d), the Morlet wavelet (50) has an explicit expression but no scaling function. A constant C is used for normalization
with a view to reconstruction. The analysis is not orthogonal because the scaling function $\phi$
does not exist.
Fig. 2. (a) Feature extraction through a complex continuous Daubechies wavelet
Fig. 2. (b) Feature extraction through a complex continuous Haar wavelet
Fig. 2. (c) Feature extraction through a complex continuous Meyer wavelet
Fig. 2. (d) Feature extraction through a complex continuous Morlet wavelet
3. Transfer Learning
Transfer learning is a popular methodology in computer vision because it can achieve
a high level of accuracy within a relatively short period of time (49). By using transfer learning, even when solving a problem different from that previously
learned, instead of stacking the model from the bottom, it can be applied using the
already learned patterns. In computer vision, transfer learning mainly involves using
pre-trained models. A pre-trained model is one that is already trained with large-sized
data similar to the problem to be solved. Training a model with a large amount of
data requires a long time and significant computational power, and thus it is customary
to import and use models that have already been released. Some pre-training models
used for transfer learning to have a large CNN structure. CNNs have been shown to
perform well on a variety of computer vision problems. The enthusiasm regarding the
use of CNNs in recent years has been driven by two main factors, a good performance
and easy learning at the same time. A typical CNN consists of two parts. The first
part is where the composite product layer and pooling layer are stacked in multiple
layers. The goal of a convolutional base is to effectively extract features from the
image. It consists mainly of a fully connected layer. A fully connected layer is a
layer in which all neurons are connected to the output node of the previous layer.
Second, the final goal of the classifier is to classify images into appropriate categories
by learning the extracted features well. One of the important characteristics of deep
learning models is that they learn hierarchical features themselves. To learn hierarchical
features, the first layer of the model is trained to extract general features, whereas
closer to the last layer of the model, such features can only appear in a particular
dataset or problem. This means that advanced learning is conducted to extract specific
features. Therefore, the layers at the front end can be reused when learning images
from different datasets, but the layers at the back end need to be learned anew each
time they encounter a new problem. By conclusion, the convolutional base part of the
CNN model we looked at, particularly the lower level hierarchies, will extract general
features, and by contrast, the higher-level hierarchies and classifier parts of the
convolutional base will extract more specific and unique features.
3.1 ResNet
The deep layer of the neural network causes the gradient to disappear, and the gradient
to explode and performance to deteriorate. Lost gradient means that the propagated
gradient becomes too small, and explosion gradient indicates that the propagated gradient
is too large to learn. Degraded performance indicates that the deep neural network
is worse than the shallow neural network, despite the lack of overload sum. ResNet
attempts to resolve these issues by reusing the input capabilities of the previous
layer.
H(x) is considered the default mapping for several stack layers, and x represents
the input for the first layer of these layers. If you hypothesize that multiple nonlinear
layers can approximate complex functions point-by-point, this is the same hypothesis
that residual functions, H(x) - x, can be approximated point-by-point. Therefore,
instead of expecting stacked layers to be close to H(x), explicitly allow them to
be close to the residual function F(x): = H(x) − x. Therefore, the original function
is F(x) + x. Both formats should be able to approximate the desired functions gradually,
but the ease of learning might be different (51).
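As a minimal sketch of this idea, the following PyTorch block (an illustrative assumption, not the exact ResNet18 block configuration used here) learns the residual F(x) and adds the identity shortcut x back to the output.

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))   # first half of the residual mapping F(x)
        out = self.bn2(self.conv2(out))         # second half of F(x)
        return F.relu(out + x)                  # identity shortcut: F(x) + x
```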
3.2 Ensemble Model
An ensemble model is a learning model that improves prediction performance by generating
several classifiers and combining their predictions. Rather than using a single strong model,
weaker models are combined to produce more accurate predictions.
Generally, bagging and boosting methods are used. Bagging accumulates results by
training each model on repeatedly drawn samples, as in decision tree or random forest
learning. Boosting also trains models sequentially on resampled data, as in bagging,
but it focuses more on incorrect answers by assigning them higher weights and giving
lower weights to correct answers. The method used in this paper, by contrast, aims to
improve performance by simply combining models based on the same algorithm.
In an existing ensemble network, the same data or different types of data
are passed through different networks. By contrast, our proposed ensemble network uses
the same network, transforms the same data into four types of scalograms, and uses
these images as input. The key is to transform the same data in various ways; here, we
transform it into scalograms using wavelets. Various wavelets are available: some are
strong in the global region, whereas others are strong in the local region. If various
data are created through these different wavelets and applied as input to the same
network, a situation can be created that satisfies both the global and local parts.
Therefore, as shown in Fig 3, only lead II of the ECG data in the ASCERTAIN dataset is used to create the scalograms.
The ECG signal is converted into scalogram form using the Daubechies wavelet (db4),
the Haar wavelet (haar), the Meyer wavelet (meyr), and the Morlet wavelet (morl), and
each scalogram is placed as an input to the learning model. Each of the same ECG signals
is thus transformed into scalogram form through the four wavelet methods and input into
the same network. From each learning model, the softmax output following the final fully
connected layer is obtained, and these outputs are fused to obtain the final result.
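A minimal sketch of this fusion step is shown below, assuming the four softmax score vectors have already been obtained from the four scalogram branches; the array values and class count are placeholders for illustration.

```python
import numpy as np

# softmax outputs of the same network fed with the four scalograms
# (db4, haar, meyr, morl); shape: (num_wavelets, num_classes)
scores = np.array([
    [0.05, 0.10, 0.60, 0.05, 0.10, 0.10],   # db4 branch (placeholder values)
    [0.10, 0.05, 0.55, 0.10, 0.10, 0.10],   # haar branch
    [0.05, 0.05, 0.70, 0.05, 0.10, 0.05],   # meyr branch
    [0.10, 0.10, 0.50, 0.10, 0.10, 0.10],   # morl branch
])

fused_add = scores.sum(axis=0)     # addition-based fusion
fused_mul = scores.prod(axis=0)    # multiplication-based fusion

print("addition decision:", int(np.argmax(fused_add)))
print("multiplication decision:", int(np.argmax(fused_mul)))
```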
Fig. 3. Proposed ensemble deep models
3.3 ASCERTAIN Database
The database used in this study is ASCERTAIN, a multi-modal database for implicit
personality and affect recognition using commercial physiological sensors. ASCERTAIN includes
the big-five personality scales and emotional self-assessments of 58 users, along with electroencephalogram
(EEG), electrocardiogram (ECG), galvanic skin response (GSR), and facial activity
data, recorded using off-the-shelf sensors while the users watched emotional movie clips.
A total of 58 university students participated in the study. All subjects were fluent
in English and habitually watched Hollywood movies. One PC with two monitors was used
in the experiment. One monitor displayed the video clips at a resolution of 1,024 × 768
pixels and a 60 Hz refresh rate and was placed approximately 1 m in front of the user.
On the other monitor, the experimenter could check the recorded sensor data.
The physiological sensors were placed on the user's body with prior consent.
The GSR sensor was strapped to the left wrist, with two electrodes secured to the
phalangeal joints of the index and middle fingers. Two ECG measurement electrodes were placed
at the bend of each arm, and the reference electrode was placed at the left foot. A single
dry-electrode EEG device was worn on the head like a normal headset, with the EEG sensor
touching the forehead and the reference electrode clipped to the left ear. EEG data
samples were recorded using the Lucid Scribe software, and all sensor data were transmitted
via Bluetooth. A webcam was used to record facial activity. The database was built in this
experimental environment.
Fig. 4. Data structure diagram
4. Experimental Results
The computer used in the experiment had the following specifications: an Intel i7-8700K
central processing unit at 3.70 GHz, an Nvidia GeForce GTX 1080, 64 GB of random-access
memory, and a Windows 10 64-bit operating system.
The CNN model used in this study for emotion recognition is ResNet18, a pre-trained
CNN. When new data need to be classified efficiently, pre-trained CNNs can be used for
transfer learning. Most parameters used to extract the feature maps are fixed, and only
the final fully connected (FC) layer and softmax are trained.
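The following PyTorch/torchvision sketch illustrates this setup. It is a minimal example assuming torchvision ≥ 0.13; the six-class output and frozen backbone mirror the description above rather than the authors' exact training script.

```python
import torch.nn as nn
from torchvision import models

NUM_EMOTIONS = 6  # surprise, happiness, anger, disgust, fear, sadness

# load ResNet18 pre-trained on ImageNet (weights name assumes torchvision >= 0.13)
model = models.resnet18(weights="IMAGENET1K_V1")

# freeze the feature-extraction layers so only the new head is trained
for param in model.parameters():
    param.requires_grad = False

# replace the final fully connected layer with a 6-class head;
# softmax is applied by the cross-entropy loss during training
model.fc = nn.Linear(model.fc.in_features, NUM_EMOTIONS)
```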
For a performance comparison, a simple CNN was added to the experiment. Here, the
number of classes equals the number of emotions. The emotions defined were surprise, happiness,
anger, disgust, fear, and sadness, giving a total of six classes. The training,
validation, and testing datasets for the six classes consisted of 1260 data points
each. Fig 5 shows the confusion matrices obtained after the ensemble operation: Fig 5(a) corresponds to the addition operation, and Fig 5(b) to the multiplication operation. Fig 6 shows the confusion matrices for ResNet18 with scalogram preprocessing for each of the four wavelet transforms. As mentioned
earlier, existing ensemble networks apply the same data to different models,
whereas here the same data were transformed in different ways and fed into the same
learning model. Through the four wavelet functions, complementary representations of the
same ECG signal are obtained, and only the final FC-layer and softmax outputs of the same
learning model are fused.
Table 1 and Table 2 show the recognition rates obtained when the Daubechies4, Haar, Meyer, and Morlet wavelet transforms
of the 1D ECG signals are used with the VGG19 and ResNet18 models, together with the recognition rates of Jun's (52) method, which converts the ECG signal into a 2D image, and of the proposed method. As shown
in Table 1, instead of feeding a single wavelet transform or the 2D ECG image into a general
VGG19 model, the proposed method applies the Daubechies4, Haar, Meyer, and Morlet wavelet
transforms, uses the transformed 2D images as inputs to the ensemble model, and multiplies
the individual outputs to obtain the final output; this yields the best recognition rate.
The experiment confirmed that using the different inputs obtained through the wavelet
transforms as inputs to the ensemble model is more effective than using each wavelet
transform individually.
As listed in Table 1 and Table 2, the experimental results obtained by the proposed method showed good performance
in comparison with each single model and with Jun's method (52), which does not use a wavelet
transform. Jun (52) proposed a 2D CNN that converts the ECG signal directly into a 2D ECG image without any
transformation. Although Jun's method exploits the spatial locality of the ECG images, it is
advantageous to use the ensemble computing technique synergistically rather than exclusively
when confronting ECG-based emotion recognition problems. Nevertheless, a disadvantage
of this study is the increased model complexity and learning time due to the design
of the ensemble model.
Fig. 5. Confusion matrix for ResNet18 with scalogram: (a) addition and (b) multiplication
Fig. 6. Confusion matrix for ResNet18 with scalogram: (a) Daubechies, (b) Haar, (c)
Meyer, and (d) Morlet
Table 1. Performance comparison of VGG19 and the proposed method
Models | Methods | Recognition ratio (%)
VGG19 | Jun (52) | 91.11
VGG19 | Daubechies4 | 87.62
VGG19 | Haar | 95.24
VGG19 | Meyer | 78.57
VGG19 | Morlet | 92.06
Proposed Model | Addition | 97.78
Proposed Model | Multiplication | 98.10
Table 2. Performance comparison of ResNet18 and the proposed method
Models | Methods | Recognition ratio (%)
ResNet18 | Jun (52) | 94.13
ResNet18 | Daubechies4 | 88.21
ResNet18 | Haar | 95.92
ResNet18 | Meyer | 95.71
ResNet18 | Morlet | 95.40
Proposed Model | Addition | 97.14
Proposed Model | Multiplication | 98.41
5. Conclusion
We present an ensemble deep model designed using pre-trained CNNs and various types
of preprocessing for emotion recognition in various environments. The CNN used
in this study was ResNet18, which is frequently utilized in deep learning.
The experiment was conducted using the ASCERTAIN sentiment database,
which consists of data recorded from 58 people while viewing 36 emotion-stimulating
movie clips; of these, only the data corresponding to six emotions were used in the experiment.
The main contribution of this study is, first, transforming the data into four types to improve
the performance of emotion classification with a small dataset. Second, as shown
in Fig 4, all data obtained through the first step were trained on the same network and
ensembled. The experimental results show that converting the electrocardiogram
signal into various types of wavelet scalograms and applying them to the ensemble deep
model yields better performance than the basic CNN or ResNet18 method.
In this study, a scale-based transformation was attempted; in future studies,
a frequency-based transformation will also be attempted, and the size of the dataset
and a wider range of emotions will be considered experimentally.
Acknowledgements
This research was supported by the Basic Science Research Program through the National
Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2017R1A6A1A03015496).
This research was also supported by the Healthcare AI Convergence Research & Development
Program through the National IT Industry Promotion Agency of Korea (NIPA) funded by
the Ministry of Science and ICT (No. S1601-20-1041).
References
J. Li, M. Oussalah, 2010, Automatic face emotion recognition system, 2010 IEEE 9th
International Conference on Cybernetic Intelligent Systems, pp. 1-6
M. W. Schurgin, J. Nelson, S. Iida, H. Ohira, J. Y. Chiao, S.L. Franconeri, 2014,
Eye movements during emotion recognition in faces, Journal of Vision, Vol. 14, pp.
1-16
A. Halder, A. Konar, R. Mandal, A. Chakraborty, P. Bhowmik, N. R. Pal, A.K. Nagar,
2013, General and interval type-2 fuzzy face-space approach to emotion recognition.,
IEEE Transactions on Systems, Vol. 43, pp. 587-605
I. A. Adeyanju, E. O. Omidiora, O. F. Oyedokun, 2015, Performance evaluation of different
support vector machine kernels for face emotion recognition, SAI Intelligent Systems
Conference, pp. 804-806
I. M. Anderson, C. Shippen, G. Juhasz, D. Chase, E. Thomas, D. Downey, Z. G. Toth,
K. LIoyd-Williams, R. Elliott, J. F. W. Deakin, 2011, State-dependent alteration in
face emotion recognition in depression, The British Journal of Psychiatry, Vol. 198,
pp. 302-308
B. Schuller, G. Rigoll, M. Lang, 2003, Hidden markov model-based speech emotion recognition.,
2003 IEEE International Conference on Acoustics, pp. 1-4
B. Schuller, G. Rigoll, M. Lang, 2004, Speech emotion recognition combining acoustic
features and linguistic information in a hybrid support vector machine-belief network
architecture, ICASSP, pp. 557-580
K. Han, D. Yu, I. Tashev, 2014, Speech emotion recognition using deep neural network
and extreme learning machine, INTERSPEECH, pp. 223-227
K. Wang, N. An, B. N. Li, Y. Zhang, 2015, Speech emotion recognition using fourier
parameters., IEEE Transactions on Affective Computing, Vol. 6, pp. 69-75
M. Jain, S. Narayan, P. Balaji, B. K. P, A. Bhowmick, K. R, 2002, Speech emotion recognition
using support vector machine, Electrical Engineering and Systems Science
A. Pal, Y. N. Singh, 2018, ECG biometric recognition, ICMC2018 Mathematics and Computing,
pp. 61-73
A. Pal, A. K. Gautam, N. S. Yogendra, 2015, Evaluation of bioelectric signals for
human recognition, Procedia Computer Science, pp. 746-752
S. Kun, G. Yang, B. Wu, L. Yang, D. Li, P. Su, Y. Yin, 2019, Human identification
using finger vein and ECG signals., Neurocomputing, pp. 111-118
D. Rezqui, Z. Lachiri, 2016, ECG biometric recognition using SVM‐based approach, Transactions
on Electrical and Electronic Engineering, pp. 94-100
S. Barra, A. Casanova, M. Frashini, M. Nappi, 2015, EEG/ECG signal fusion aimed at
biometric recognition, ICIAP 2015: New Trends in Image Analysis and Processing, pp.
35-42
X. Dong, W. Si, W. Huang, 2018, ECG-based identity recognition via deterministic learning,
Biotechnology & Biotechnological Equipment, pp. 769-777
S. Aziz, M. U. Khan, Z. A. Choudhry, A. Aymin, A. Usman, 2019, ECG-based biometric
authentication using empirical mode decomposition and support vector machines, 2019
IEEE 10th Annual Information Technology, pp. 906-912
J. R. Pinto, J. S. Cardoso, A. Lourenco, 2018, Evolution, Current Challenges, and
future possibilities in ECG biometrics, IEEE ACCESS, pp. 34746-34776
J. R. Pinto, J. S. Cardoso, A. Lourenco, C. Carreiras, 2017, Towards a continuous
biometric system based on ECG signals acquired on the steering wheel, Sensors, pp.
1-14
E. Maiorana, P. Campisi, 2017, Longitudinal evaluation of EEG-based biometric recognition,
IEEE Transactions on Information Forensics and Security, pp. 1123-1138
E. Maiorana, J. Sole-Casals, P. Campisi, 2016, EEG signal preprocessing for biometric
recognition, Machine Vision and Applications, pp. 1351-1360
E. Maiorana, D. L. Rocca, P. Campisi, 2015, Cognitive biometric cryptosystems a case
study on EEG, 2015 International Conference on Systems, pp. 125-128
D. Nikolova, P. Mihaylova, A. Manolova, P. Georgieva, 2019, ECG-based human emotion
recognition across multiple subjects, FABULOUS 2019, pp. 25-36
H. Ferdinando, T. Seppanen, E. Alasaarela, 2016, Comparing features from ECG pattern
and HRV analysis for emotion recognition system, CIBCB 2016, pp. 1-6
A. Goshvarpour, A. Abbasi, A. Goshvarpour, 2017, An accurate emotion recognition system
using ECG and GSR signals and matching pursuit method, Biomedical Journal, Vol. 40,
pp. 355-368
T. Dissanayake, Y. Rajapaksha, R. Ragel, I. Nawinne, 2019, An ensemble learning approach
for electrocardiogram sensor based human emotion recognition, Sensors, Vol. 19, pp.
1-24
P. Sarkar, A. Etemad, 2020, Self-supervised learning for ECG-based emotion recognition,
45th IEEE International Conference on Acoustics Speech and Signal Processing, pp.
1-13
X. Ya, L. Guang-Yuan, 2009, A method of emotion recognition based on ECG signal, 2019
International Conference on Computational Intelligence and Natural Computing, pp.
202-205
Y. L. Hsu, J. S. Wang, W. C. Chiang, C. H. Hung, 2020, Automatic ECG-based emotion
recognition in music listening, IEEE Transactions on Affective Computing, Vol. 11,
pp. 85-99
F. Agrafioti, D. Hatzinakos, A. K. Anderson, 2012, ECG pattern analysis for emotion
detection, IEEE Transactions on Affective Computing, Vol. 3, pp. 102-115
S. Tivatansakul, M. Ohkura, 2016, Emotion recognition using ECG signals with local
pattern description methods, International Journal of Affective Engineering, Vol.
15, pp. 51-61
J. S, M. Murugappan, K. Wan, S. Yaacob, 2013, Emotion detection from QRS complex of
ECG signals using hurst exponent for different age groups, 2013 Humaine Association
Conference on Affective Computing and Intelligent Interaction, pp. 849-854
Y. Fan, X. Lu, D. Li, Y. Liu, 2016, Video-based emotion recognition using CNN-RNN
and C3D hybrid networks, ICMI’16, pp. 445-450
H. W. Ng, V. D. Nguyen, V. Vonikakis, S. Winkler, 2015, Deep learning for emotion
recognition on small datasets using transfer learning, ICMI’15, pp. 443-449
H. M. Fayek, M. Lech, L. Cavedon, 2017, Evaluating deep learning architectures for
speech emotion recognition, Neural Networks, Vol. 92, pp. 60-68
D. Issa, M. F. Demirci, A. Yazici, 2020, Speech emotion recognition with deep convolutional
neural networks, Biomedical Signal Processing and Control, Vol. 59, pp. 1-11
S. Jirayucharoensak, S. Pan-Ngum, P. Israsena, 2014, EEG- based emotion recognition
using deep learning network with principal component based covariate shift adaptation,
The Scientific World Journal, pp. 1-10
P. Pandey, K. R. Seeja, 2019, Subject independent emotion recognition from EEG using
VMD and deep learning, Journal of King Saud University-Computer and Information Sciences,
Vol. 14, pp. 1-9
J. Wang, R. Li, R. Li, B. Fu, 2020, A knowledge-based deep learning method for ECG
signal delineation, Future Generation Computer Systems, Vol. 109, pp. 56-66
G. Giannakakis, E. Trivizakis, M. Tsiknakis, K. Marias, 2019, A novel multi-kernel
1D convolutional neural network for stress recognition from ECG, 2019 8th International
Conference on ACIIW, pp. 273-276
R. Subramanian, J. Wache, M. K. Abadi, R. L. Vieriu, S. Winkler, N. Sebe, 2018, ASCERTAIN:
Emotion and personality recognition using commercial sensors, IEEE Transactions on
Affective Computing, Vol. 9, pp. 147-160
S. Koelstra, C. Muhl, M. Soleymani, J. S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt,
I. Patras, 2011, DEAP: A database for emotion analysis using physiological signals,
IEEE Transactions on Affective Computing, Vol. 3, pp. 18-31
M. K. Abadi, R. Subramanian, S. M. Kia, P. Avesani, I. Patras, N. Sebe, 2015, DECAF
MEG-based multimodal database for decoding affective physiological responses, IEEE
Transactions on Affective Computing, Vol. 6, pp. 209-222
S. Katsigiannis, N. Ramzan, 2018, DREAMER: A database for emotion recognition through
EEG and ECG signals from wireless low-cost off-the-shelf devices, IEEE Journal of
Biomedical and Health Informatics, Vol. 22, pp. 98-107
A. Radford, L. Metz, S. Chintala, 2016, Unsupervised representation learning with
deep convolutional generative adversarial networks, ICLR 2016, pp. 1-16
H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, D. N. Metaxas, 2019, StackGAN++:
Realistic image synthesis with stacked generative adversarial networks, IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vol. 41, pp. 1947-1962
A. C. H. Rowe, P. C. Abbott, 1995, Daubechies wavelets and mathematica, Computers
in Physics, Vol. 635
D. K. Ruch, P. J. Van Fleet, 2009, Wavelet theory: An elementary approach with applications,
Wiley
V. V. Vermehren, H. M. Oliveira, 2015, Close expression for meyer wavelet and scale
function
A. Bernardino, J. Santos-Victor, 2005, A real-time gabor primal sketch for visual
attention, IBPRIA, pp. 335-342
T. Mallick, P. Balaprakash, E. Rask, J. Macfarlane, 2020, Transfer learning with graph
neural networks for short-term highway traffic forecasting, arXiv:2004.08038
T. J. Jun, H. M. Nguyen, D. Kang, D. Kim, Y. H. Kim, 2018, ECG arrhythmia classification
using a 2-D convolutional neural network, Computer Vision and Pattern Recognition,
pp. 1-22
Biographies
He received his PhD from Chosun University in Gwangju in 2017.
He worked as a postdoctoral researcher at Chosun University IT Research Center from
2017 to 2021.
He is currently conducting research at the K-THEBOM research institute. His interests
are granular computing, emotion recognition, language processing, and disease classification.
He received his Bachelor's degree from Chosun University in Gwangju in 2013.
He received his master's degree from Chosun University, Gwangju in 2014.
He received his PhD from Chosun University, Gwangju in 2021.
He is currently working as a postdoctoral researcher at the IT Research Center of
Chosun University.
His interests are pedestrian detection and deep learning.
He received his Bachelor's degree from Chosun University in Gwangju in 2016.
He received his master's degree from Chosun University in Gwangju in 2017.
He is currently pursuing his PhD at Chosun University in Gwangju.
His interests are granular computing and computational intelligence.
He received his Ph.D. degree from Chungbuk National University in 2002.
He worked as a Postdoctoral Fellow at the University of Alberta, Canada from 2003
to 2005.
From 2005 to 2007, he worked as a senior researcher at Intelligent Robot Research
Division in ETRI.
He is now a professor at the Department of Electronics Engineering at Chosun University.
His interest areas are computational intelligence, human-robot interaction, and biometrics.