
  1. (Department of Networking and Communications, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu 603203, India {saisantd, suprajap}@srmist.edu.in)



Keywords: Emotion recognition, facial expressions, electroencephalogram, collaborative multimodal emotion recognition, multi-resolution binarized image feature extraction, dwarf mongoose optimization algorithm, improved federated learning generative adversarial network

1. Introduction

Emotion is a process that involves expressions, and it operates in both conscious and unconscious circumstances in humans [1,2]. People communicate with each other through expressions, which convey emotions such as sadness, joy, rage, and fear. In human–computer interaction, considerable research has been devoted to emotion recognition (ER) [3], because the human–computer interaction system is dynamic, complicated, and emotionally connected. Emotions can be categorized at the brain level using electroencephalogram (EEG) signals [4-7]; therefore, EEG signals are a vital part of this research [8-10]. Today's human–computer interaction and multimedia technologies are highly advanced, making automatic emotion recognition feasible [15]. Emotion recognition also allows emotions to be adjusted based on third-party suggestions. Emotional AI is used in many domains, such as health care, entertainment, and education [16], and artificial intelligence research in robotics has grown accordingly. Many multinational companies, such as Microsoft, Google, and Samsung, have invested heavily in emotion recognition systems [17,18]. Nevertheless, considerable time and domain knowledge are needed to implement this technique. Thus, ER identifies the user's emotions and responds with the help of multimedia content [19,20].

The problem addressed by collaborative multimodal emotion recognition is to develop a system that accurately recognizes emotions from multiple modalities, such as facial expressions and EEG signals. This is challenging because each modality has limitations [11-20]. For example, facial expressions can be subtle and difficult to interpret, and EEG signals can be noisy, making it difficult to extract features. The motivation for collaborative multimodal emotion recognition through an improved federated learning generative adversarial network (GAN) for facial expressions and EEG signals is to address the limitations of each modality by combining the strengths of several. GANs are a class of machine learning models that can generate realistic data; here, the GAN can generate synthetic facial expressions or EEG signals similar to real data, which improves the performance of the emotion recognition system by providing more training data and helping to regularize training. Federated learning is a paradigm that allows multiple devices to train a machine-learning model collaboratively without sharing their data. The combination of collaborative multimodal emotion recognition, GANs, and federated learning therefore has the potential to yield more accurate emotion recognition systems: the model improves ER accuracy by combining the strengths of multiple modalities.
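For intuition only, the sketch below illustrates the federated-averaging idea referenced above: each client trains on its private data, and only model parameters, never raw EEG or facial data, are shared for aggregation. It is a minimal Python illustration under assumed settings; the names (`local_update`, `federated_average`) and the least-squares objective are placeholders, not part of the proposed MER-IFLGAN method.

```python
# Minimal federated-averaging sketch (illustrative; not the paper's IFLGAN).
# Each client updates a local copy of the model; only parameters are shared.
import numpy as np

def local_update(weights, private_data, lr=0.01, epochs=1):
    """Hypothetical local step: gradient descent on a least-squares objective."""
    X, y = private_data
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)      # gradient of the mean squared error
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Aggregate client parameters weighted by local dataset sizes."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Toy run: three clients whose private (X, y) data never leave the device.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]
global_w = np.zeros(4)
for _ in range(5):                             # five communication rounds
    local_ws = [local_update(global_w, data) for data in clients]
    global_w = federated_average(local_ws, [len(d[1]) for d in clients])
```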

The remainder of this manuscript is organized as follows. Recent research is reviewed in Section 2. Section 3 outlines the proposed technique, and Section 4 reports the results and discussion. Section 5 concludes the manuscript.

2. Literature Survey

Several studies have used electroencephalogram-based ER. Some of them are reviewed here.

Zhang et al. [11] presented a cross-subject and cross-modal model (CSCM) for multimodal, EEG-based emotion recognition. The input data were taken from the SJTU Emotion EEG Dataset (SEED) and SEED-IV. The presented model achieved a better area-under-the-curve value but a lower F-measure.

Zhao et al. [12] presented an attention-based hybrid deep learning method for EEG emotion recognition. The model extracted critical feature details and provided an efficient categorization. Differential entropy characteristics of the EEG data are extracted and arranged by electrode location. An encoder encodes the EEG input and extracts spatial information before a band attention mechanism applies adaptive weights to distinct bands. An LSTM network extracts the temporal features, and a time attention mechanism captures critical temporal information. The input data were taken from two datasets, the Dataset for Emotion Analysis using Physiological Signals (DEAP) and SEED. The presented model achieved better accuracy and a lower average running time.

Wang et al. [13] presented multi-modal emotion recognition using EEG and speech signals. Their Multimodal Emotion Database (MED4) comprises four modalities, simultaneously collected EEG, photoplethysmography, speech, and facial images, recorded while subjects watched video stimuli meant to elicit joyful, sad, angry, or neutral moods. The presented model achieved a lower accuracy.

Wang et al. [14] presented a multi-modal domain adaptation variational autoencoder (MMDA-VAE) for EEG-based ER. The approach learns shared cross-domain latent representations of multimodal data, and a multi-modal variational autoencoder (MVAE) projects information from multiple modalities into a single space. The input data were taken from two datasets, SEED and SEED-IV. The presented model achieved a better F-measure but a lower area under the curve.

3. Proposed Methodology

Emotion identification depends on EEG signals because they can depict variations in human brain states. In addition to EEG signals, facial expression signals are used as an external physiological characterization signal for ER [21,22]. EEG and facial expression signals are combined, linking internal neural patterns with external sub-conscious actions, and an improved federated learning generative adversarial network performs multimodal emotion recognition on the combined representation. The emotional expression ability of facial expressions and EEG signals remains stable over time [23,24], and ER accuracy is enhanced after the fusion of EEG signals and facial expressions. Fig. 1 portrays a flowchart of the MER-IFLGAN method.

Fig. 1. Flowchart of the MER-IFLGAN method.
../../Resources/ieie/IEIESPC.2024.13.1.61/fig1.png
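As a reading aid, the following hypothetical driver sketches the flow of Fig. 1; the three stage functions are trivial stand-ins for Sections 3.2-3.4 (DMO selection, MBIFE extraction, IFLGAN classification), not the actual algorithms.

```python
# High-level sketch of the MER-IFLGAN flow in Fig. 1 (stand-in stage functions only).
import numpy as np

def dmo_select_features(eeg):                 # stand-in for Section 3.2 (DMO selection)
    return eeg[np.argsort(np.abs(eeg))[-8:]]  # keep the 8 largest-magnitude values

def mbife_extract(face):                      # stand-in for Section 3.3 (MBIFE histograms)
    hist, _ = np.histogram(face, bins=8, range=(0, 255), density=True)
    return hist

def iflgan_classify(fused):                   # stand-in for Section 3.4 (IFLGAN classifier)
    labels = ["happy", "sad", "fear", "neutral"]
    return labels[int(abs(fused.sum()) * 100) % 4]

def mer_iflgan_pipeline(eeg, face):
    fused = np.concatenate([dmo_select_features(eeg), mbife_extract(face)])
    return iflgan_classify(fused)

print(mer_iflgan_pipeline(np.random.randn(256), np.random.randint(0, 256, (48, 48))))
```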

3.1 Data Acquisition

In this study, a video library (VL) was first created for video emotion-evoked EEG experiments. The VL contains 90 video clips, compiled from various films and TV shows and converted to WMV format. These 90 clips span three emotional categories, pornographic, neutral, and violent, with 30 clips in each category; the pornographic and violent clips come from two kinds of films, action and drama. Six examiners (three men and three women), each assessing only one emotion type, screened the clips before they were included in the VL; an examiner selected a clip only if they felt it matched the intended emotional type. The video emotion-evoked EEG experiment involved 13 healthy participants, seven men and six women, aged 24–28 years with corrected vision of 1.0.

The 90 video clips served as stimuli to elicit EEG signals corresponding to the various video emotions. While watching the clips, which played continuously on a computer, the participants wore a 64-lead Quik-Cap electrode cap so that EEG signals representing the different video emotions could be recorded. The electrodes of the cap were arranged according to the 10–20 system. The EEG signals generated during the experiments were collected and prepared using the Neuroscan system, and the experiment was designed in E-Prime software from PST. Initially, the computer screen in front of each subject displayed the instructions. After carefully reading the experimental design and overall content, the subject pressed the space bar to begin. For each subject, 30 clips were selected randomly from the 90-clip library, 10 for each emotion type, and these 30 clips were played back in random order to prevent subjects from forming inertial memories. A cross-shaped prompt was shown on the screen before each clip to attract the subject's attention [25], and a short rest period followed each clip, during which the subject remained silent. The experiment ended once all 30 selected clips had been played. EEG signals were recorded at a 1000-Hz sampling rate throughout the experiment, and the procedure was repeated for every subject until EEG signals from all 13 subjects had been obtained. Four emotions (happy, sad, fear, and neutral) were considered to examine how differently EEG and facial expression signals identify various emotional states.
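To make the selection protocol concrete, the short sketch below shows one way a per-subject playlist could be drawn: 10 clips from each of the three categories in the 90-clip library, shuffled so the playback order is random. The file names and the seed are illustrative assumptions, not the actual stimulus set.

```python
# Illustrative per-subject stimulus selection: 10 clips per category, random order.
import random

# Hypothetical file names for the 90-clip library (30 per category).
library = {cat: [f"{cat}_{i:02d}.wmv" for i in range(1, 31)]
           for cat in ("pornographic", "neutral", "violent")}

def build_playlist(library, per_category=10, seed=None):
    rng = random.Random(seed)
    playlist = [clip for clips in library.values()
                for clip in rng.sample(clips, per_category)]
    rng.shuffle(playlist)            # random order to avoid inertial memories
    return playlist

print(build_playlist(library, seed=1)[:5])
```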

3.2 EEG Signal Feature Extraction Utilizing Dwarf Mongoose Optimization Algorithm

The EEG signal is a weak, high-dimensional physiological signal that is non-stationary and non-linear. The benefits of feature selection with the dwarf mongoose optimization (DMO) algorithm include its simplicity and the absence of prerequisite knowledge. As a result, DMO can pick features from complex, high-dimensional EEG signals and yields objectively more accurate classifications. First, dominant and non-dominant features are identified in the input EEG feature vector. DMO replicates the compensatory behavioral response of the dwarf mongoose. After the population is initialized in the alpha group, the efficacy of each solution is calculated, and the alpha female is selected using Eq. (1),

(1)
$ \delta =\frac{fitness_{i}}{{\sum }_{i=1}^{n}fitness_{i}} $

where $fitness_{i}$ represents the dominant features of the EEG signal. The solution-updating mechanism is based on Eq. (2),

(2)
$ \mathrm{M }_{i+1}=\mathrm{M }_{i}+P\ast P_{eep} $

where $P_{eep}$ represents the dominant female's vocalization, which keeps the family on track, and $P$ represents a random number. The sleeping mound is assessed by Eq. (3),

(3)
$ SM_{i}=\frac{fitness_{i+1}-fitness_{i}}{\max \left\{\left| fitness_{i+1},fitness_{i}\right| \right\}} $

The average value of the sleeping mound is determined using Eq. (4),

(4)
$ \varphi =\frac{{\sum }_{i=1}^{n}SM_{i}}{n} $

Once the babysitting exchange criteria are satisfied, the approach progresses to the scouting stage, where the next food source or sleeping mound is considered. If the family forages far away during the scouting phase, it will find a better sleeping mound [26]. The scout mongoose is expressed as Eq. (5),

(5)
$\begin{align} \mathrm{M }_{i+1}=\begin{cases} \mathrm{M }_{i}-Cf\ast P\ast rand\ast \left[\mathrm{M }_{i}-\vec{X}\right]; & if\varphi _{i+1}> \varphi _{i}\\ \mathrm{M }_{i}+Cf\ast P\ast rand\ast \left[\mathrm{M }_{i}-\vec{X}\right]; & else \end{cases} \end{align} $

where $rand$ represents a random number in the range $\left(0,1\right)$; $Cf$ represents the collective-volatile movement control parameter; and $\vec{X}$ represents the movement vector, which is determined by Eq. (6),

(6)
$ \vec{X}={\sum }_{i=1}^{n}\frac{M_{i}\ast SM_{i}}{M_{i}} $

This equation removes the non-dominant features from the feature vector of the input EEG signal. The dominant features are then recombined with the feature vectors of the input EEG signal to produce new feature vectors.
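A compact sketch of how the DMO operators in Eqs. (1)-(6) could drive EEG feature selection is shown below. The fitness function (variance of the currently selected features), population size, iteration count, and the simplified scout rule are illustrative assumptions, so this is a sketch of the update equations rather than the authors' exact implementation.

```python
# Sketch of DMO-based feature selection following Eqs. (1)-(6) (illustrative settings).
import numpy as np

rng = np.random.default_rng(42)
eeg_features = rng.normal(size=128)            # stand-in EEG feature vector
n_pop, n_iter, cf_max = 10, 30, 2.0

def fitness(position):
    mask = position > 0.5                      # entries > 0.5 mark dominant features
    return eeg_features[mask].var() if mask.any() else 0.0

M = rng.uniform(0, 1, size=(n_pop, eeg_features.size))        # mongoose positions
fit = np.array([fitness(m) for m in M])

for t in range(n_iter):
    delta = fit / fit.sum()                                    # Eq. (1): alpha selection probability
    peep = M[rng.choice(n_pop, p=delta)]                       # alpha female's vocalization P_eep
    P = rng.uniform(-1, 1, size=M.shape)
    M_new = np.clip(M + P * peep, 0, 1)                        # Eq. (2): candidate update
    fit_new = np.array([fitness(m) for m in M_new])
    SM = (fit_new - fit) / np.maximum(np.abs(np.maximum(fit_new, fit)), 1e-12)   # Eq. (3)
    phi = SM.mean()                                            # Eq. (4): average sleeping mound
    Cf = cf_max * (1 - t / n_iter)                             # collective-volatile control parameter
    X = (M * SM[:, None]).sum(axis=0) / np.maximum(M.sum(axis=0), 1e-12)         # Eq. (6)-style movement vector
    step = Cf * P * rng.uniform(0, 1) * (M - X)                # scout displacement, Eq. (5)
    M = np.clip(np.where(fit_new[:, None] > fit[:, None],      # keep improvements, else scout move
                         M_new, M - np.sign(phi) * step), 0, 1)
    fit = np.array([fitness(m) for m in M])

dominant = np.where(M[fit.argmax()] > 0.5)[0]                  # indices of selected (dominant) features
```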

3.3 Facial Expression Feature Extraction using Multi-resolution Binarized Image-feature Extraction

Facial expressions are an essential part of expressing emotion: human faces can communicate a wide range of emotional states, and MBIFE is used to identify them. In this stage, the selected features are delivered to the MBIFE module for feature extraction; central and linearly symmetric arrangements are also used. The image histogram is expressed as Eq. (7),

(7)
$ \mathrm{H }_{{f_{1}}}\left(S,R\right)=\left[{h}_{S,R}^{0},{h}_{S,R}^{1},{h}_{S,R}^{2},.....,{h}_{S,R}^{p-1}\right]^{\mathrm{T }} $

where $S$ is the window size, $R$ is the pixel code-word resolution, and $P$ is the pixel intensity. The stacked normalized elements are expressed as Eq. (8),

(8)
$ {h}_{S,R}^{p}=\frac{1}{p}{\sum }_{j=1}^{p}\delta _{p}\left(j\right) $

$\delta _{p}(j)$ is expressed as Eq. (9),

(9)
$\begin{align} \delta _{p}\left(j\right)=\begin{cases} 1\;\;\;\;\;\;\;\;\;\;\;\;if\,V_{p}=p\\ 0\;\;\;\;\;\;\;\;\;\;\;\;otherwise \end{cases} \end{align} $

The final image representation is then built by concatenating the histograms obtained from the application of every filter of the multi-resolution bank, as expressed in Eq. (10),

(10)
$ \mathrm{H }_{m}=\left[\mathrm{H }_{{f_{1}}},\mathrm{H }_{{f_{2}}},........,\mathrm{H }_{{f_{n}}}\right]^{\mathrm{T }} $

where $\mathrm{H }_{{f_{1}}}$ represents the calculated and concatenated histograms of the responses acquired with the applied filter bank. The resulting histograms are collected column-wise in a single representative matrix, and all examined images of all classes are handled in the same manner, as expressed in Eq. (11),

(11)
$\begin{align} \mathrm{H }=\begin{bmatrix} \mathrm{H }_{{f_{1}}1} & \cdots & \mathrm{H }_{{f_{1}}\mathrm{M }}\\ \vdots & \ddots & \vdots\\ \mathrm{H }_{{f_{n}}1} & \cdots & \mathrm{H }_{{f_{n}}\mathrm{M }} \end{bmatrix} \end{align} $

where $\mathrm{M }$ is the number of processed images [27]. The remaining data-reduction and categorization steps take Eq. (11) as their starting point. The covariance matrix of $\mathrm{H}$ is calculated as expressed in Eq. (12),

(12)
$ C_{m}=\varphi .\varphi ^{\mathrm{T }} $

Using the class difference principle, the between-class scatter matrix is expressed as Eq. (13)

(13)
$ B_{S}={\sum }_{i=1}^{C}\left(\varphi _{{c_{i}}}-\varphi \right)\left(\varphi _{{c_{i}}}-\varphi \right)^{\mathrm{T }} $

Eq. (14) expresses the within-class scatter matrix,

(14)
$ W_{S}={\sum }_{i=1}^{C}{\sum }_{\mathrm{K }\in c_{i}}^{Q_{i}}\left(y_{\mathrm{K }}-\varphi _{{c_{i}}}\right)\left(y_{\mathrm{K }}-\varphi _{{c_{i}}}\right)^{\mathrm{T }} $

The ratio between the projections of $B_{S}$ and $W_{S}$ is calculated using the Fisher criterion, as expressed in Eq. (15),

(15)
$ W_{pm}=\frac{W^{\mathrm{T }}{W}_{pca}^{\mathrm{T }}B_{S}W_{pca}W}{W^{\mathrm{T }}{W}_{pca}^{\mathrm{T }}W_{S}W_{pca}W} $

Certain features are extracted from the input signals without transformation using Eq. (15); this incurs less computational cost and is simple to perform. In this way, many effective features are extracted using multi-resolution binarized image feature extraction (MBIFE), and the extracted features are then fed to the emotion recognition stage.
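A small sketch of the multi-resolution binarized histogram idea in Eqs. (7)-(10) is given below. The binarization rule (comparing a few pixels of each window against the window mean) and the window sizes are assumptions made for illustration; the paper's exact filter bank is not reproduced here.

```python
# Sketch of multi-resolution binarized histogram features (cf. Eqs. (7)-(10)).
import numpy as np

def binarized_histogram(img, S, R):
    """Histogram H_f(S, R): code-words from S x S windows at R-bit resolution (Eq. (7))."""
    h = np.zeros(2 ** R)
    rows, cols = img.shape
    for r in range(0, rows - S + 1, S):
        for c in range(0, cols - S + 1, S):
            win = img[r:r + S, c:c + S]
            bits = (win.ravel()[:R] > win.mean()).astype(int)   # binarize R pixels vs. window mean
            code = int("".join(map(str, bits)), 2)              # pixel code-word V_p
            h[code] += 1                                        # accumulate delta_p(j), Eqs. (8)-(9)
    return h / max(h.sum(), 1)                                  # normalized histogram

def mbife(img, scales=((4, 4), (8, 4), (16, 4))):
    """Concatenate histograms over the multi-resolution bank (Eq. (10))."""
    return np.concatenate([binarized_histogram(img, S, R) for S, R in scales])

face = np.random.default_rng(0).integers(0, 256, size=(64, 64)).astype(float)
H_m = mbife(face)        # final image representation H_m
```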

3.4 Multimodal Emotion Recognition based Upon Improved Federated Learning Generative Adversarial Network

The classification procedure is essential for recognizing emotions from facial expressions and EEG signals, and the classification algorithm should be chosen to yield a clear and precise model that forecasts emotions in real time; this determines the efficacy and precision of multimodal emotion recognition. An improved federated learning generative adversarial network with flexible activation functions (IFLGAN) is therefore proposed. The IFLGAN classifier identifies emotions such as sadness, fear, happiness, and neutrality. IFLGAN can adapt to the individual characteristics of emotions and handle the complications of facial expressions by building multiple tree levels over the large training dataset using two scans, giving roughly three times better performance than existing methods. IFLGAN uses few run-time resources, and no storage space is needed to save temporary data. At the categorization unit, emotions are divided into four categories: fear, sadness, happiness, and neutral. The generator $G$ and discriminator $D$ are expressed in Eq. (16),

(16)
$ \begin{array}{l} \min _{G}\max _{D}\,\,\,\,V\left(G,D\right)=\Phi _{1}\left(\mathrm{E }_{X\sim {\Pr _{1}}\left(X\right)}\log D_{1}\left(X\right)+\mathrm{E }_{Z\sim {P_{Z}}\left(Z\right)}\left[\log \left(1-D_{1}\left(G_{1}(Z)\right)\right)\right]\right)\\ +\Phi _{2}\left(\mathrm{E }_{X\sim {\Pr _{2}}\left(X\right)}\log D_{2}\left(X\right)+\mathrm{E }_{Z\sim {P_{Z}}\left(Z\right)}\left[\log \left(1-D_{2}\left(G_{2}(Z)\right)\right)\right]\right)+\cdots \\ +\Phi _{\mathrm{K }}\left(\mathrm{E }_{X\sim {\Pr _{\mathrm{K }}}\left(X\right)}\log D_{\mathrm{K }}\left(X\right)+\mathrm{E }_{Z\sim {P_{Z}}\left(Z\right)}\left[\log \left(1-D_{\mathrm{K }}\left(G_{\mathrm{K }}(Z)\right)\right)\right]\right) \end{array} $

where $X$ represents the training data with mini-batch size, and $G_{1}(Z)$ represents the generated data with mini-batch size [28]. The $soft\max$ operation normalizes the weights so that they satisfy Eq. (17),

(17)
$ {\sum }_{i=1}^{\mathrm{K }}\Phi _{i}=1 $

The maximum mean discrepancy (MMD) score is computed from the predictions and is expressed in Eq. (18):

(18)
$ \mathrm{MMD}_{\mathrm{i}}=\underset{\left|\left|f\right|\right|\leq 1}{\sup }\left|\left|\mathrm{E }\left(f\left(x\right)\right)-\mathrm{E }\left(f(G(Z))\right)\right|\right| $

where $G_{i}$ represents the $i^{th}$ generator and $D_{i}$ represents the $i^{th}$ discriminator. The $soft\max$ operation is then applied as expressed in Eq. (19),

(19)
$ soft\max .\Phi _{i}=\frac{e^{{\mathrm{MMD}_{\mathrm{i}}}}}{{\sum }_{j=1}^{\mathrm{K }}e^{{\mathrm{MMD}_{\mathrm{j}}}}} $

where $\mathrm{MMD}_{\mathrm{i}}$ represents the MMD Score. For each generator, the optimal discriminator is expressed as Eq. (20),

(20)
$ {D}_{i}^{\ast }\left(x\right)=\frac{\Pr _{i}}{\Pr _{i}+PG_{i}} $

If $\Pr _{i}=PG_{i}$, then ${D}_{i}^{\ast }\left(x\right)=\frac{1}{2}$. The global minimum of the virtual training criterion is expressed as Eq. (21),

(21)
$ V\left(G_{i}\right)=-2\log 2+\mathrm{KL}\left(\Pr _{i}\,\middle\|\, \frac{\Pr _{i}+PG_{i}}{2}\right)+\mathrm{KL}\left(PG_{i}\,\middle\|\, \frac{\Pr _{i}+PG_{i}}{2}\right) $

The formula of the global generator is expressed as Eq. (22),

(22)
$ G_{g}\left(X;\vartheta _{{G_{g}}}\right)={\sum }_{i=1}^{n}\Phi _{i}G_{i}\left(X;\vartheta _{{G_{i}}}\right) $

where $\vartheta _{{G_{i}}}$ is the $i^{th}$ generator's parameters and $\vartheta _{{G_{g}}}$ denotes the parameters of the global generator. The parameters of each generator are replaced in Eq. (22) using Eq. (23),

(23)
$ G_{i}\left(X;\vartheta _{{G_{i}}}\right)=G_{g}\left(X;\vartheta _{{G_{g}}}\right) $

where $G_{{g_{a}}}$ and $G_{{g_{MMD}}}$ are expressed in Eq. (24),

(24)
$\begin{align} \begin{cases} G_{{g_{a}}}=\frac{1}{2}G_{1}\left(Z;\vartheta _{{G_{1}}}\right)+\frac{1}{2}G_{2}\left(Z;\vartheta _{{G_{2}}}\right)\\ G_{{g_{MMD}}}=\Phi _{1}\times G_{1}\left(Z;\vartheta _{{G_{1}}}\right)+\Phi _{2}\times G_{2}\left(Z;\vartheta _{{G_{2}}}\right) \end{cases} \end{align} $

Hence, the IFLGAN categorizes the emotions as sad, fear, happy, and neutral.
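The MMD-weighted aggregation step (Eqs. (18), (19), and (22)) can be illustrated with the sketch below, in which each client generator is reduced to a linear map and MMD is estimated with an RBF kernel; both are simplifying assumptions made only to show how the weights $\Phi_i$ combine the local generators into a global one.

```python
# Sketch of MMD-weighted generator aggregation (cf. Eqs. (18), (19), (22)).
import numpy as np

def rbf_mmd(x, y, sigma=1.0):
    """Empirical (biased) MMD estimate between sample sets x and y (Eq. (18))."""
    def k(a, b):
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
real = rng.normal(1.0, 0.5, size=(200, 2))                  # stand-in real feature samples
thetas = [rng.normal(size=(2, 2)) for _ in range(3)]        # K = 3 local generator parameters
Z = rng.normal(size=(200, 2))                               # shared noise batch

mmd = np.array([rbf_mmd(real, Z @ th) for th in thetas])    # MMD_i per generator
phi = np.exp(mmd) / np.exp(mmd).sum()                       # Eq. (19): softmax weights Phi_i
theta_global = sum(p * th for p, th in zip(phi, thetas))    # Eq. (22): weighted global generator
```

Aggregating at the parameter level, as done here, is one simple reading of Eq. (22); because the weights come from a softmax, they sum to one and the constraint in Eq. (17) holds by construction.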

4. Result and Discussion

This section presents the experimental outcomes of the MER-IFLGAN technique. The proposed approach was simulated in MATLAB® version 9.7.0.1190202 (R2019b) (MathWorks Inc.), and the performance metrics were examined. The results of MER-IFLGAN were compared with the existing MER-CSCM [11] and MER-LSTM [12] models.

4.1 Performance Metrics

The performance of the proposed method was evaluated.

4.1.1 F Measure

This is computed by Eq. (25),

(25)
$ F\,measure=\frac{h}{\left(h+\frac{1}{2}\left[i+j\right]\right)} $

where $h$ denotes true positives, $i$ denotes false negatives, and $j$ denotes false positives.

4.1.2 Accuracy

This was determined using Eq. (26),

(26)
$ A=\frac{h+k}{h+i+j+k} $

where $k$ denotes true negatives.
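With this notation ($h$ true positives, $i$ false negatives, $j$ false positives, $k$ true negatives), Eqs. (25) and (26) reduce to a few lines; the counts below are arbitrary example values.

```python
# F-measure (Eq. (25)) and accuracy (Eq. (26)) from confusion-matrix counts.
def f_measure(h, i, j):
    """h = true positives, i = false negatives, j = false positives."""
    return h / (h + 0.5 * (i + j))

def accuracy(h, i, j, k):
    """k = true negatives."""
    return (h + k) / (h + i + j + k)

print(f_measure(90, 5, 5), accuracy(90, 5, 5, 100))   # 0.947..., 0.95
```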

4.2 Performance Analysis

Tables 1-3 list the efficiency of the MER-IFLGAN technique. In these tables, the performance metrics were evaluated. The efficiency was compared to the existing MER-CSCM and MER-LSTM approaches.

Table 1 presents accuracy evaluation. The MER-IFLGAN achieves 31.21% and 34.06% greater accuracy for happy; 26.01% and 27.79% greater accuracy for sad; 45.34% and 22.78% greater accuracy for fear; 46.28% and 34.11% greater accuracy for Neutral compared to the MER-CSCM and MER-LSTM models, respectively.

Table 2 lists the results of the F-measure analysis. The MER-IFLGAN achieved the following compared to the existing MER-CSCM and MER-LSTM models, respectively: 35.67% and 33.54% better F-measure for happy; 36.73% and 34.71% better F-measure for sad; 43.14% and 46.27% higher F-measure for fear; 45.26% and 45.87% higher F-measure for Neutral.

Table 3 presents the Average Running Time (ART) analysis. MER-IFLGAN achieved the following compared to the existing MER-CSCM and MER-LSTM models, respectively: 35.45% and 33.32% shorter ART for happy; 38.77% and 25.89% shorter ART for sad; 35.67% and 45.14% shorter ART for fear; 42.15% and 43.26% shorter ART for Neutral.

Fig. 2 presents the ROC analysis. The MER-IFLGAN achieved a 2.92% and 4.15% higher AUC value than the existing MER-CSCM and MER-LSTM methods, respectively.

Fig. 2. Analysis of ROC.
../../Resources/ieie/IEIESPC.2024.13.1.61/fig2.png
Table 1. Accuracy evaluation.

Methods                    Accuracy (%)
                           Happy     Sad       Fear      Neutral
MER-CSCM                   76.48     86.61     91.78     84.34
MER-LSTM                   83.51     75.31     81.73     81.79
MER-IFLGAN (proposed)      98.55     98.11     98.67     97.77

Table 2. F1 score estimation.

Methods                    F-measure (%)
                           Happy     Sad       Fear      Neutral
MER-CSCM                   75.76     91.67     84.64     85.33
MER-LSTM                   83.45     79.35     78.78     81.71
MER-IFLGAN (proposed)      98.89     98.91     99.78     98.90

Table 3. Average Running Time Analysis.

Methods                    ART (s)
                           Happy     Sad       Fear      Neutral
MER-CSCM                   5.55      6.66      7.56      7.45
MER-LSTM                   8.90      6.66      7.45      9.70
MER-IFLGAN (proposed)      2.36      3.96      4.17      5.61

5. Conclusion

An improved federated learning generative adversarial network-based multimodal emotion recognition method using EEG and facial expressions was implemented and simulated in MATLAB. The MER-IFLGAN method achieved 11.14% and 8.36% higher F-measure than the existing MER-CSCM and MER-LSTM models, respectively. Most studies use two distinct emotion signals as their target objects, but the recognition rate tends to drop when people produce ambiguous emotion signals. Hence, future work will concentrate on structuring a more efficient emotion database and consolidating more emotion details to enhance the ER scheme. In addition, there is a lack of a global public database containing videos and the associated evoked EEG; a future public video-EEG database could be developed by examining ways to optimize the video type, count, and length and by amassing EEG signals from numerous subjects.

REFERENCES

1 
E.S. Salama, R.A. El-Khoribi, M.E. Shoman, M.A.W. Shalaby, ``A 3D-convolutional neural network framework with ensemble learning techniques for multi-modal emotion recognition.'' Egyptian Informatics Journal, vol. 22, no. 2, pp. 167-176. 2021.DOI
2 
Y. Wu, J. Li, ``Multi-modal emotion identification fusing facial expression and EEG.'' Multimedia Tools and Applications, vol. 82, no. 7, pp. 10901-10919. 2023.DOI
3 
L. Fang, S.P. Xing, Z. Ma, Z. Zhang, Y. Long, K.P. Lee, S.J. Wang, ``Emo-MG Framework: LSTM-based Multi-modal Emotion Detection through Electroencephalography Signals and Micro Gestures.'' International Journal of Human-Computer Interaction, vol. 1, no. 1, pp. 1-17. 2023.DOI
4 
F. H. Shajin, B. Aruna Devi, N. B. Prakash, G. R. Sreekanth, P. Rajesh, ``Sailfish optimizer with Levy flight, chaotic and opposition-based multi-level thresholding for medical image segmentation.'' Soft Computing, pp. 1-26. Apr. 2023.DOI
5 
F. H. Shajin, P. Rajesh, M. R. Raja, ``An efficient VLSI architecture for fast motion estimation exploiting zero motion prejudgment technique and a new quadrant-based search algorithm in HEVC.'' Circuits, Systems, and Signal Processing, pp. 1-24. Mar. 2022.DOI
6 
P. Rajesh, F. Shajin, ``A multi-objective hybrid algorithm for planning electrical distribution system.'' European Journal of Electrical Engineering, vol. 22, no. 4-5, pp. 224-509. Jun. 2020.DOI
7 
P. Rajesh, R. Kannan, J. Vishnupriyan, B. Rajani, ``Optimally detecting and classifying the transmission line fault in power system using hybrid technique.'' ISA transactions, vol. 130, pp. 253-264. Nov. 2022.DOI
8 
F.M. Alamgir, M.S. Alam, ``Hybrid multi-modal emotion recognition framework based on Inception V3 DenseNet.'' Multimedia Tools and Applications, vol. 1, no. 1, pp. 1-28. 2023.DOI
9 
S. Dutta, B.K.. Mishra, A. Mitra, A. Chakraborty, ``A Multi-modal Approach for Emotion Recognition Through the Quadrants of Valence-Arousal Plane.'' SN Computer Science, vol. 4, no. 5, pp. 460. 2023.DOI
10 
S. Liu, P. Gao, Y. Li, W. Fu, W. Ding, ``Multi-modal fusion network with complementarity and importance for emotion recognition.'' Information Sciences, vol. 619, no. 1, pp. 679-694. 2023.DOI
11 
J.M. Zhang, X. Yan, Z.Y. Li, L.M. Zhao, Y.Z. Liu, H.L. Li, B.L. Lu. ``A Cross-subject and Cross-modal Model for Multimodal Emotion Recognition''. InNeural Information Processing: 28th International Conference, ICONIP 2021, Sanur, Bali, Indonesia, December 8-12, Proceedings, Part VI 28 2021 (pp. 203-211). Springer International Publishing. 2021.DOI
12 
Z. Zhao, Z. Gong, M. Niu, J. Ma, H. Wang, Z. Zhang, Y. Li. ``Automatic respiratory sound classification via multi-branch temporal convolutional network'' InICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 9102-9106). IEEE. 2022.DOI
13 
Q. Wang, M. Wang, Y. Yang, X. Zhang. ``Multi-modal emotion recognition using EEG and speech signals.'' Computers in Biology and Medicine, vol. 149, p. 105907. 2022.DOI
14 
Y. Wang, S. Qiu, D. Li, C. Du, B.L. Lu, H. He. ``Multi-modal domain adaptation variational autoencoder for EEG-based emotion recognition.'' IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 9, pp. 1612-1626. 2022.DOI
15 
M. Maithri, U. Raghavendra, A. Gudigar, J. Samanth, P.D. Barua, M. Murugappan, Y. Chakole, U.R. Acharya, ``Automated emotion recognition: Current trends and future perspectives.'' Computer methods and programs in biomedicine, vol. 215, no. 1, p. 106646. 2022.DOI
16 
C. Guanghui, Z. Xiaoping, ``Multi-modal emotion recognition by fusing correlation features of speech-visual.'' IEEE Signal Processing Letters, vol. 28, pp. 533-537. 2021.DOI
17 
Y. Hu, F. Wang, ``Multi-Modal Emotion Recognition Combining Face Image and EEG Signal.'' Journal of Circuits, Systems and Computers, vol. 32, no. 07, p. 2350125. 2023.DOI
18 
M. Wang, Z. Huang, Y. Li, L. Dong, H. Pan, ``Maximum weight multi-modal information fusion algorithm of electroencephalographs and face images for emotion recognition.'' Computers & Electrical Engineering, vol. 94, no. 1, p. 107319. 2021.DOI
19 
Y. Zhang, C. Cheng, Y. Zhang, ``Multimodal emotion recognition using a hierarchical fusion convolutional neural network.'' IEEE access, vol. 9, no. 1, pp. 7943-7951. 2021.DOI
20 
D. Liu, L. Chen, Z. Wang, G. Diao, ``Speech expression multimodal emotion recognition based on deep belief network.'' Journal of Grid Computing, vol. 19, no. 2, p. 22. 2021.DOI
21 
H. Zhang, ``Expression-EEG based collaborative multimodal emotion recognition using deep autoencoder.'' IEEE Access, vol. 8, no. 1 pp. 164130-164143, 2020.DOI
22 
F. Aldosari, L. Abualigah, and K.H. Almotairi, ``A normal distributed dwarf mongoose optimization algorithm for global optimization and data clustering applications.'' Symmetry, vol. 14, no. 5, pp. 1021. 2022.DOI
23 
D. Saisanthiya, P. Supraja. "Heterogeneous Convolutional Neural Networks for Emotion Recognition Combined with Multimodal Factorised Bilinear Pooling and Mobile Application Recommendation", International Journal of Interactive Mobile Technologies (iJIM), 2023.DOI
24 
M. Park and S. Chai, ``BTIMFL: A Blockchain-Based Trust Incentive Mechanism in Federated Learning.'' In International Conference on Computational Science and Its Applications (pp. 175-185). Cham: Springer Nature Switzerland, June 2023.DOI
25 
H. Zhang, ``Expression-EEG based collaborative multimodal emotion recognition using deep autoencoder.'' IEEE Access, vol. 8, no. 1, pp. 164130-164143. 2020.DOI
26 
J.O. Agushaka, A.E. Ezugwu, L. Abualigah, ``Dwarf mongoose optimization algorithm.'' Computer methods in applied mechanics and engineering, vol. 391, no. 1, p. 114570. 2022.DOI
27 
L.K. Pavithra, T. Sree Sharmila, P. Subbulakshmi, ``Texture image classification and retrieval using multi-resolution radial gradient binary pattern.'' Applied Artificial Intelligence, vol. 35, no. 15, pp. 2298-2326. 2021.DOI
28 
W. Li, J. Chen, Z. Wang, Z. Shen, C. Ma, X. Cui, ``IFL-GAN: Improved federated learning generative adversarial network with maximum mean discrepancy model aggregation.'' IEEE Transactions on Neural Networks and Learning Systems, vol. 1, no. 1, pp. 1-12. 2022.DOI
D. Saisanthiya
../../Resources/ieie/IEIESPC.2024.13.1.61/au1.png

D. Saisanthiya received the B.Tech degree in CSE from Arulmigu Meenakshi Amman College of Engineering, Thiruvannamalai, affiliated to Anna University, Tamil Nadu, in 2009, and the M.Tech degree in CSE from Sastha Institute of Science and Technology, Chembarambakkam, affiliated to Anna University, Tamil Nadu, in 2011. She is currently working towards the Ph.D. degree at the School of Computing, Faculty of Engineering and Technology, SRM Institute of Science and Technology, India. Her research interests include deep learning and machine learning algorithms.

P. Supraja
../../Resources/ieie/IEIESPC.2024.13.1.61/au2.png

P. Supraja is currently working as an Associate Professor in the School of Computing, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Chennai, Tamil Nadu, India. She was a recipient of the AICTE Visvesvaraya Best Teacher Award 2020. Previously, she completed the Indo-US WISTEMM research fellowship at the University of Southern California, Los Angeles, USA, funded by IUSSTF and DST, Govt. of India, served as a Post-Doctoral Research Associate at Northumbria University, Newcastle, UK, and completed her PhD at Anna University in 2017. She has published more than 50 research papers in reputed national and international journals and conferences. She received her university-level Best Research Paper Award in 2022 and 2019 and has received funding from AICTE for conducting an STTP. Her research interests include cognitive computing, optimization algorithms, machine learning, deep learning, wireless communication, and IoT. She is a reviewer for IEEE, Inderscience, Elsevier, and Springer journals and a member of several national and international professional bodies, including IEEE, ACM, and ISTE. In addition, she has received the Young Women in Engineering Award and the Distinguished Young Researcher Award from various international organizations.