Expression Recognition Algorithm Based on Fusion Features for Students’ Emotional
Analysis on Art Education Platform
Cui Ziwei
(College of Art and Design, Henan Vocational University of Science and Technology,
Zhoukou, 466000, China
Ziwei_Cui2023@outlook.com)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Art education platform, Expression recognition algorithm, Improved LBP algorithm, Regional feature weighted fusion, HOG algorithm
1. Introduction
Human facial expressions can reflect, to a certain extent, a person's concentration, psychological activity, and other states, so expression recognition has gradually become a research hotspot in human-computer interaction and image recognition [1-2]. With the development of the Internet, online education technology has also advanced, and educational platforms for various subjects have become increasingly common [3]. An important research topic in online education is judging students' attention status and knowledge acceptance, and continuous progress in psychology and related disciplines makes this possible through facial expression recognition algorithms [4]. A computer can extract relevant facial feature information and use a human-like reasoning structure to understand and analyze emotions such as sadness, surprise, anger, and happiness displayed through facial expressions, and can classify facial expressions accurately [5]. The paper utilizes the improved Local Binary Pattern (LBP) algorithm and the Histogram of Oriented Gradient (HOG) algorithm to extract feature information from different facial regions. It then performs feature fusion and exploits the fused information of multiple features to improve the recognition of human facial expressions. At the same time, integrating multiple features improves the accuracy and robustness of facial expression recognition, allowing the emotional states of students on art education platforms to be better understood and addressed. Teachers can obtain targeted emotional analysis results for students, helping students adjust their emotional states and improve learning outcomes. The aim of this paper is to enhance the quality of the teaching and learning experience by utilizing improved algorithms to meet the precise needs of art education platforms for analyzing student emotions.
2. Related Work
The field of online education is experiencing continuous development, with an increasing
number of new technologies being applied to address related issues. Andrejevic and
other scholars argued that while some may question the educational limitations of
face-driven learning, it is important to consider how to apply face recognition technology
to specific educational environments. Face recognition and detection technologies could also be used to address supervision and safety issues in school education and campus environments [6]. Lee et al. conducted a perceptual survey on teaching and learning management, educational content, and the performance of artificial intelligence education platforms based on five criteria. The results showed that an artificial intelligence education platform could provide more convenient accessibility and high-quality educational content, and made teaching and learning management easier [7]. Ran et al. proposed an intelligent model for continuous user authentication to prevent
security issues in distance education platforms, such as private data leakage or tampering.
The model was designed with high security measures for both the platform infrastructure
and the user authentication process. This provided new insights into the security
of distance education [8]. Based on the customer behavior theory of the B2B2C platform, Hou and other researchers
proposed an online education model for art education combining two models. This study
expanded the scope of information technology as an art education platform from an
academic point of view, and provided effective theoretical support for an online art
education platform [9]. Lee et al. used deep learning algorithms in artificial intelligence to obtain quantitative
evaluation results in learning evaluation. The specific implementation method was
to use the convolutional neural network model to train facial expressions, and divide
the expression data into three categories: easy, medium, and difficult. The research
results provided relevant support for the intelligent education evaluation system
[10].
With the continuous advancement of human-computer interaction technology, facial expression
recognition algorithms are widely used in areas such as safe driving, remote driving,
game feedback, and psychological analysis. Georgescu et al. combined automatic feature learning with handcrafted feature computation, and used a Support Vector Machine (SVM) classifier with a one-versus-one classification strategy on the training samples, obtaining a high accuracy of 87.76% on the experimental dataset [11]. Ye and other scholars proposed a region-based convolutional fusion network to ensure
the effectiveness and robustness of learning in different samples from three aspects:
reducing interference, integrating semantic levels and introducing constraint penalty
loss. By establishing a muscle motion model, key frontal facial regions were segmented and extracted to reduce interference caused by differences in the size and position of individual facial organs [12]. Sun and other scholars proposed a multi-channel deep neural network that learns
and fuses spatio-temporal features for facial expression recognition in static images.
This method extracted the temporal information of the expression from the changing
optical flow between the peak and neutral expression images, and used the grayscale image of the peak expression as the spatial information. The average recognition accuracy of
this method on three data sets was about 99.05% [13]. Scholars such as Zhang proposed a video sequence face recognition method based on
a hybrid deep learning model, which used two convolutional neural networks to learn
high-level spatial features of segmented static facial images and temporal features
of optical flow images [14].
Based on current research by domestic and foreign scholars on online education platforms
and expression recognition algorithms, it is evident that online education places
significant emphasis on students' emotional feedback during online learning. Expression
recognition algorithms, a research hotspot in affective computing, are likewise applied in online art education, where they can support the analysis of students' emotions.
3. Empowering Art Education Platforms for Enhanced Learning: A Fusion Approach with
Improved LBP and HOG Algorithms for Expression Recognition
3.1 Optimizing Facial Expression Recognition in Art Education: Enhanced LBP and HOG
Algorithms
At present, related face recognition algorithms are relatively mature and provide considerable help and reference for expression recognition [15]. Many facial expression feature extraction methods are available, but the features extracted by different algorithms often differ, which also affects recognition accuracy [16]. The paper improves the original LBP algorithm and uses the HOG algorithm to optimize expression feature extraction. The LBP algorithm can effectively extract texture features describing the distribution relationship between a single pixel and its neighboring pixels in the image. When the traditional LBP operator extracts image features, it first defines a $3\times3$ texture unit; the gray value of the pixel at the center of the cell is used as a threshold and compared with the gray values of the 8 neighboring pixels, and each neighbor is encoded as 1 or 0, as shown in (1).
In (1), $g_{c}$ represents the gray value of the center pixel, and $g_{e}$ represents the gray value of each of the remaining 8 pixels. The codes of the 8 neighboring pixels are concatenated into a binary number in a fixed order, and the corresponding decimal number is the LBP feature of the texture unit. Due to the limitation of its coverage area, the traditional LBP operator cannot satisfy texture feature extraction at different scales. To address this issue, some scholars have modified the fixed $3\times3$ window by making its size adjustable and expanding the neighborhood from a square to a circle with radius $R$. The circular LBP operators at different scales are shown in Fig. 1.
Fig. 1. LBP operators at different scales.
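As a minimal illustrative sketch of the thresholded coding in (1) (not the paper's implementation; it assumes a grayscale NumPy array and a clockwise bit order), the basic 3x3 operator could be written as follows.

import numpy as np

def lbp_3x3(image):
    """Basic 3x3 LBP sketch: threshold the 8 neighbors against the center
    pixel and read the resulting bits as a decimal code."""
    img = image.astype(np.int32)
    h, w = img.shape
    codes = np.zeros((h, w), dtype=np.uint8)
    # offsets of the 8 neighbors, traversed in a fixed clockwise order
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y, x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy, x + dx] >= center:   # s(g_e - g_c) = 1 if g_e >= g_c
                    code |= 1 << bit
            codes[y, x] = code
    return codes

For the circular $LBP_{R}^{e}$ variant and the uniform-pattern mapping discussed below, scikit-image's local_binary_pattern function offers an off-the-shelf implementation.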
A circular area with radius $R$ containing $e$ pixels, denoted $LBP_{R}^{e}$, generates $2^{e}$ possible codes. For example, $LBP_{1.5}^{12}$ has 4096 kinds of LBP codes. The more pixels there are, the more complex the feature extraction becomes. To solve this problem, a 0/1 change between two adjacent bits of the code is counted as one ``jump'', and codes with at most two jumps are called the ``uniform pattern''. Statistical results show that the uniform pattern accounts for more than 90% of all coding modes; that is, the uniform pattern comprises $e\left(e-1\right)+3$ modes, so for $LBP_{1}^{8}$ the number of coding modes is reduced from 256 to 59. The original LBP algorithm only considers the relationship between the central pixel and its neighbors, ignoring the role of the central pixel itself and the magnitude of the gray-value differences between pixels, which easily causes the loss of local structural information. The paper therefore introduces a threshold $T$ into the LBP algorithm. The gray-level difference between the central pixel and each adjacent pixel is calculated, the absolute values of all the differences are averaged, and the resulting mean value is taken as the threshold $T$. The absolute differences between the gray values of the central pixel and the neighboring pixels are then compared with the threshold $T$ in a fixed order and encoded, and the obtained binary code is converted into a decimal number as shown in (2). The decimal number $LBP_{R}^{*e}$ in (2) represents the LBP feature of the center point. The original and improved LBP algorithm codes are shown in Fig. 2.
Fig. 2. Schematic diagram of original and improved LBP algorithm coding.
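A hedged sketch of the modified coding described above, assuming the threshold $T$ is the mean of the absolute center-neighbor differences and that a bit is set when the absolute difference is at least $T$ (the comparison direction is an assumption, not stated explicitly in the paper).

import numpy as np

def improved_lbp_3x3(image):
    """Improved LBP sketch: T is the mean absolute gray-level difference
    between the center pixel and its 8 neighbors; a neighbor is coded 1
    when its absolute difference reaches T."""
    img = image.astype(np.int32)
    h, w = img.shape
    codes = np.zeros((h, w), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y, x]
            diffs = [abs(img[y + dy, x + dx] - center) for dy, dx in offsets]
            T = sum(diffs) / len(diffs)          # adaptive threshold
            code = 0
            for bit, d in enumerate(diffs):
                if d >= T:                        # compare |g_e - g_c| with T
                    code |= 1 << bit
            codes[y, x] = code
    return codes

The block-wise composite histogram described below can then be built by splitting the code map into sub-images (e.g., the 6x5 blocks used in Section 4) and concatenating their 256-bin histograms.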
In Fig. 2(a), under the original LBP coding, two window units whose central pixels differ greatly obtain the same LBP code; that is, they are classified into the same mode during recognition. In Fig. 2(b), encoding with the improved LBP algorithm avoids the misjudgment caused by classifying bright points and dark points into one category. Considering that different facial parts of an expression image differ from one another, directly extracting LBP features from the entire image and generating a single histogram may lose the local differences of the face. The LBP histogram of the image is therefore extracted using a block method, and the LBP histograms of the individual blocks are concatenated to form a composite histogram. To describe local features and facial edge information more effectively, HOG is introduced to extract features from local regions of facial images. HOG divides the expression image into non-overlapping cell units and concatenates the histograms of the individual cell units to form the HOG features of the entire image. When HOG extracts features, the gamma correction method is first used to normalize the color space of the image to reduce the interference caused by lighting, as shown in (3).
In (3), $S\left(x,y\right)$ represents the pixel value compressed by Gamma, and $\left(x,y\right)$
represents the pixel point. Then the pixel gradient is calculated to extract the local
contour information, and the calculation expression is shown in (4).
In (4), $G_{x} \left(x,y\right)$ is the calculated pixel gradient value in the horizontal
direction. $G_{y} \left(x,y\right)$ represents the pixel gradient value in the vertical
direction. The gradient magnitude of the pixel is calculated as shown in (5).
In (5), $G\left(x,y\right)$ represents the gradient magnitude of the pixel. The gradient
direction value of the pixel point is $d\left(x,y\right)$, as shown in (6).
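These gradient quantities could be computed explicitly as in the following sketch; the simple $[-1, 0, 1]$ difference kernel is the usual HOG choice and an assumption here.

import numpy as np

def pixel_gradients(image):
    """Horizontal/vertical gradients, magnitude and direction per pixel,
    using the [-1, 0, 1] difference kernel."""
    img = image.astype(np.float64)
    Gx = np.zeros_like(img)
    Gy = np.zeros_like(img)
    Gx[:, 1:-1] = img[:, 2:] - img[:, :-2]       # horizontal difference G_x(x, y)
    Gy[1:-1, :] = img[2:, :] - img[:-2, :]       # vertical difference G_y(x, y)
    magnitude = np.sqrt(Gx ** 2 + Gy ** 2)       # gradient magnitude G(x, y)
    direction = np.arctan2(Gy, Gx)               # gradient direction d(x, y), radians
    return magnitude, direction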
Each cell unit contains 16 histogram channels; that is, the gradient directions are divided into 16 direction bins, so the histogram of a cell unit is a 16-dimensional histogram. Several adjacent cell units are grouped into a block, and the features of all cell units in the block are concatenated to form a block feature. All blocks are connected into a vector in a fixed order, and this vector is the HOG feature vector of the expression image. The division of a cell unit is shown in Fig. 3.
Fig. 3. Schematic diagram of the division of a cell unit.
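An end-to-end sketch of HOG extraction for one cropped facial region, using scikit-image and the cell/block sizes reported in Section 4 (4x4-pixel cells, 2x2 cells per block, 9 orientation bins); the gamma value and these parameter choices are illustrative assumptions rather than the paper's exact configuration.

import numpy as np
from skimage import exposure
from skimage.feature import hog

def hog_region_feature(region, orientations=9):
    """HOG sketch for a cropped facial region: gamma (power-law) correction
    followed by gradient-histogram extraction over 4x4 cells and 2x2 blocks."""
    region = region.astype(np.float64) / 255.0
    # gamma correction to reduce illumination interference
    corrected = exposure.adjust_gamma(region, gamma=0.5)
    # gradients, magnitudes, orientation binning and block normalization
    # are handled internally by skimage's hog()
    feature = hog(corrected,
                  orientations=orientations,      # direction bins per cell
                  pixels_per_cell=(4, 4),
                  cells_per_block=(2, 2),
                  block_norm='L2-Hys')
    return feature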
3.2 Transformative Art Education: Regional Feature Weighted Fusion of Improved LBP
and HOG Algorithms for Expression Recognition
Due to the abundance of high-dimensional data obtained through feature extraction,
it is necessary to reduce the dimensionality of the original features before classification
to decrease the computational load. The study uses the classic Principal Component
Analysis (PCA) algorithm for dimensionality reduction. PCA converts high-dimensional feature vectors into a set of orthogonal basis vectors through the Karhunen-Loeve (KL) transformation, and analyzes the statistical characteristics of the training samples on these orthogonal basis vectors to determine, in one pass, the basis vectors of the required low-dimensional target space [17-18]. The mathematical principle of the KL transformation is shown in (7).
In (7), $A_{n}$ is an $n$-dimensional random variable expressed as a weighted combination of basis vectors, $\alpha_{j}$ denotes the weighting coefficients, and $\phi_{j}$ denotes the basis vectors. The matrix form of $A_{n}$ is shown in (8).
In (8), $\phi =(\phi_{1}, \phi_{2}, \cdots, \phi_{n})$ is the matrix of basis vectors and $\alpha =(\alpha_{1}, \alpha_{2}, \cdots, \alpha_{n})^{T}$ is the coefficient vector, where the basis vectors are mutually orthogonal; the specific expression is shown in (9).
In (9), $\phi$ is an orthogonal matrix. According to the properties of the orthogonal matrix, multiplying both sides of the matrix expression by $\phi^{T}$ yields the expression of $\alpha$ shown in (10).
To ensure that the components of the vector $\alpha$ are uncorrelated with each other, the auto-correlation matrix of the random vector is recorded as $A_{m} =E[A_{n}^{T} A_{n}]$, and the expression shown in (11) can be obtained.
Finally, the expression shown in (12) can be obtained through calculation, where $\lambda_{j}$ is an eigenvalue of the auto-correlation matrix and $\phi_{j}$ is the corresponding eigenvector. According to the properties of real symmetric matrices, if the obtained eigenvalues are distinct, the corresponding eigenvectors are orthogonal.
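A compact NumPy sketch of this eigen-decomposition and of the sample-matrix PCA procedure described next (illustrative only; it assumes samples are stored as rows).

import numpy as np

def pca_reduce(A, Q):
    """PCA / KL-transform sketch: center the samples, eigendecompose the
    covariance matrix, keep the Q eigenvectors with the largest eigenvalues,
    and project the samples onto them."""
    A = np.asarray(A, dtype=np.float64)          # N samples x P features
    mean = A.mean(axis=0)                        # sample mean avg(i)
    centered = A - mean
    cov = np.cov(centered, rowvar=False)         # P x P covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric -> orthogonal eigenvectors
    order = np.argsort(eigvals)[::-1]            # sort eigenvalues descending
    B = eigvecs[:, order[:Q]]                    # P x Q feature matrix
    return centered @ B                          # N x Q reduced sample matrix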
For a sample matrix $A=[A_{1}, A_{2}, \cdots, A_{N}]^{T}$ containing $N$ samples, each sample has $P$ features. Using PCA to reduce the samples from $P$ dimensions to $Q$ dimensions ($Q<P$) first requires calculating the covariance matrix of the samples. The difference between each sample value and the sample mean is used as the deviation between the observed value and the mean in each dimension, and the $P\times P$ covariance matrix $C^{P\times P}$ is then obtained. The calculation of the sample mean is shown in (13), namely $avg\left(i\right)=\frac{1}{N}\sum _{j=1}^{N}A_{j,i}$. By calculating the eigenvalues and eigenvectors of the covariance matrix and sorting them in descending order of eigenvalue, the first $Q$ eigenvalues and their eigenvectors are selected, and the eigenvectors form the feature matrix $B$. The PCA dimensionality reduction matrix is obtained by multiplying the sample matrix by the feature matrix $B$, and its dimensionality is $N\times Q$. If only a single feature describing part of the feature information is used for classification and recognition, the recognition accuracy is easily affected by environmental changes [19-20]. In this study, the texture information extracted by the improved LBP algorithm is
fused with the contour edge shape information extracted by the HOG algorithm, so as
to express the expression information of the image through fusion features and carry
out classification and recognition. Since the expression changes are mainly reflected
in the eyebrow area and the mouth area, the texture features extracted by the improved
LBP algorithm are used as feature 1. The edge shape information of the eyebrow area
and mouth area extracted by HOG are respectively used as feature 2 and feature 3.
For these three features, the fused feature information is obtained after dimensionality reduction, normalization, and weighted fusion.
The flow chart of feature fusion in different regions of the face is shown in Fig. 4.
Fig. 4. Flow chart of feature fusion for different facial regions.
In Fig. 4, the study utilizes the eyebrow, eye and mouth regions for expression recognition,
which are the most effective parts when recognizing expressions. The study firstly
crops out the two regions of eyebrows-eyes and mouth as region 2 and region 3, respectively.
Secondly, the study uses the improved LBP algorithm to extract texture features from the preprocessed whole-face expression image, which serves as region 1; the LBP algorithm captures local texture changes in the image. After obtaining the initial feature information of the three regions, the study performs dimensionality reduction to simplify the data and reduce the computational load. Next, the feature information is normalized so that each feature is on a comparable scale in subsequent processing. Finally, weighted fusion is performed according to (14) to integrate the feature information of the three regions and obtain a more comprehensive and accurate basis for expression recognition.
In (14), $F$ represents the fused feature obtained after weighted fusion, $T_{1}$ represents the processed facial texture features, and $E_{2}$ and $E_{3}$ represent the edge shape information of the eyebrow-eye region and the mouth region, respectively. $\eta$, $\delta$, and $\gamma$ are the weighting coefficients of the three regional features, respectively.
Through in-depth analysis of the expression dataset, the study found that there is
significant variability in the mouth region. Even in the same kind of expressions,
the mouth images of different individuals show significantly different morphologies
and variations. Because of this variability, the feature information of the mouth region should not be the primary basis for expression recognition. Therefore, in the weighted fusion, the study sets the weight coefficient $\gamma$ of the mouth region smaller than the coefficients $\eta$ and $\delta$ of the facial texture and eyebrow-eye features. In this
way, the feature information of the mouth region can be used as auxiliary information,
combined with the features of the eyebrow and eye regions, to jointly provide a more
comprehensive and accurate basis for expression recognition and classification.
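Putting these steps together, the region-weighted fusion of (14) could be sketched as follows; the min-max normalization and the concatenation of the weighted vectors are assumptions, since the paper specifies only the weighting coefficients (the defaults here are the JAFFE-optimal values 0.4, 0.4, 0.2 reported in Section 4).

import numpy as np

def weighted_fusion(T1, E2, E3, eta=0.4, delta=0.4, gamma=0.2):
    """Region-weighted feature fusion sketch.
    T1: reduced LBP texture feature of the whole face,
    E2: reduced HOG feature of the eyebrow-eye region,
    E3: reduced HOG feature of the mouth region.
    Each feature is min-max normalized, scaled by its region weight,
    and the scaled vectors are concatenated (concatenation is an assumption)."""
    def normalize(v):
        v = np.asarray(v, dtype=np.float64)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)
    return np.concatenate([eta * normalize(T1),
                           delta * normalize(E2),
                           gamma * normalize(E3)])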
4. Expression Recognition Results of Fusion Features for Student Emotion Analysis
on Education Platform
Based on the JAFFE and CK+ expression datasets, the study set up the influence experiment
of the extraction area on different datasets, the weighting parameter selection experiment,
and the comparison experiment of feature extraction algorithms to analyze the accuracy
of expression recognition under different influencing factors. When the improved LBP
algorithm extracts facial texture features, the face image is divided into $6\times5$ blocks, yielding $6\times5\times256=7680$-dimensional features, which are then reduced in dimensionality. When the HOG algorithm extracts edge features, the image is divided into cell units of $4\times4$ pixels, and the gradient directions within a cell unit are divided into 9 bins, giving a 9-dimensional histogram. The original feature
of a block extracted is $2\times2\times9=36$ dimensions. The fusion features obtained
after correlation processing and weighted fusion are used as the identification features
of the classifier for classification and identification. The JAFFE expression database includes a total of 213 expression images recorded by 10 testers, covering anger (AN), sadness (SA), surprise (SU), happiness (HA), fear (FE), disgust (DI) and neutral (NE). The CK+ expression dataset consists of 593 expression video sequences collected from 123 testers. Of these, 327 image sequences are labeled with numerical expression labels from 1 to 7, which correspond, in ascending order, to AN, contempt (CO), DI, FE, HA, SA and SU. In the experiment on the
influence of different extraction regions on the recognition rate, the JAFFE dataset
is preprocessed first to obtain a $120\times120$ face image, a $104\times32$ eyebrow-eye region, and a $64\times32$ mouth region. The study uses the K-Nearest Neighbor (KNN) classifier and the SVM classifier to conduct expression classification and recognition experiments and determine the better classifier. The experiment adopts the cross-validation
method: one of the 10 groups of testers is selected as the test sample, and the rest
are used as the training samples. Finally, the average recognition rate of the 10
groups of expression recognition is taken as the result of the single expression recognition.
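The leave-one-tester-out protocol and the KNN/SVM comparison could be set up as in the following scikit-learn sketch (kernel and hyperparameter choices are assumptions, as the paper does not report them).

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def compare_classifiers(features, labels, subject_ids):
    """Leave-one-subject-out comparison of KNN and SVM on fused features.
    features: (n_samples, n_dims) fused feature matrix,
    labels: expression labels,
    subject_ids: tester identity per sample (10 groups for JAFFE)."""
    cv = LeaveOneGroupOut()                      # one tester held out per fold
    results = {}
    for name, clf in [('KNN', KNeighborsClassifier(n_neighbors=5)),
                      ('SVM', SVC(kernel='rbf', C=1.0))]:
        scores = cross_val_score(clf, features, labels,
                                 groups=subject_ids, cv=cv)
        results[name] = np.mean(scores)          # average recognition rate
    return results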
The recognition results of the two classifiers on the JAFFE dataset are shown in Table 1.
Table 1. Recognition results of KNN classifier and SVM classifier on JAFFE dataset.
Expression | KNN classifier (%) | SVM classifier (%)
AN | 93.5 | 96.7
HA | 93.3 | 95.9
SA | 93.7 | 96.1
SU | 94.6 | 97.6
DI | 93.9 | 92.5
FE | 92.4 | 94.2
NE | 97.2 | 98.4
Average | 94.09 | 95.91
In Table 1, the recognition rate of the SVM classifier for the seven expressions is generally higher than that of the KNN classifier, and its average recognition rate is also slightly higher. The recognition rate of the KNN classifier is below 95% for every expression except neutral, while the recognition rate of the SVM classifier is above 95% for all expressions except disgust and fear. Therefore, the paper chooses SVM as the classifier for the expression recognition algorithm. The experimental results of the influence of different weighted regions are shown in Fig. 5.
Fig. 5. Recognition rate results of different recognition areas.
Three groups of experiments are carried out with different combinations of three extracted
regions to analyze the influence of weighted regions on recognition rate. The first
experiment only extracts the features of the face area. The second experiment extracts
the features of the face area and the mouth area. The third experiment extracts the
features of the face area, eyebrows and eye areas. The three groups of experiments
adopt the cross-validation method to ensure validity. From Fig. 5, in the three groups of experiments, the expression recognition effect of the combination
of the face area and the eyebrow and eye area is the best, followed by the combination
of the face area and the mouth area, and the recognition effect of the face area alone
is the worst. The face area is fused with the eyebrow and eye area to recognize 7
kinds of expressions, and the recognition rates of surprised and neutral expressions
are relatively high, which are 94.7% and 94.2% respectively. The average recognition
rates of expressions in the three groups of experiments in descending order are 89.84%
(face+eyes), 87.79% (face+mouth) and 86.83% (face). The results of the recognition
rate show that the fusion of extracted features from different regions of the face
is effective and necessary for expression recognition. At the same time, it is verified
that the mouth region has large individual differences, and that too large a weighting parameter for this region increases the probability of misjudgment. Therefore, the weighting parameter of the mouth region should be smaller than those of the face and eyebrow-eye regions. Based on the JAFFE dataset, the region weighting parameter selection experiment
is carried out, and the adjustment of the weighting coefficients of each region is
shown in Fig. 6.
Fig. 6. Adjustment results of weighting coefficients of different extraction regions.
The region weighting parameter selection experiment also adopts the cross-validation
method to ensure the validity and accuracy. The experiment finds the optimal parameter
combination by changing the weighting coefficients of the three areas. From Figs.
5(a) and 5(b), when adjusting the weighting parameters of the mouth region and eyebrow
and eye region, the recognition rate increases first and then decreases with the increase
of the weighting coefficient. The maximum recognition rate (96.7%, 96.7%) is achieved
when the weight parameter of the mouth area is equal to 0.2 and the parameter of the
eyebrow area is equal to 0.4. Observing the weight parameter adjustment results of
the face area in Fig. 6(c), when the weight parameter is within $\left[0.2,0.7\right]$, the recognition rate is stable
at a high level. To determine the final combination of weight coefficients, the statistical
graphs of recognition rates corresponding to different parameter combinations are
obtained through crossover experiments on the two data sets, as shown in Fig. 7.
Fig. 7. Statistical chart of recognition rate corresponding to different parameter
combinations on two datasets.
According to the results of weight parameter adjustment, firstly, three sets of parameter
combination values are taken on the JAFFE dataset for experiments. The weight parameters
of the first group of three areas are mouth\,$:$\,eyes\,$:$\,face $= 0.2 : 0.4 : 0.4$.
The second group is eyes\,$:$\,mouth\,$:$\,face $= 0.4: 0.3: 0.3$. The third group
is face\,$:$\,mouth\,$:$\,eyes $=0.35: 0.3: 0.35$. From Fig. 7(a), the first group of parameter combinations performs best in expression recognition, with an average recognition rate of 96.11%. Therefore, the weighting parameters $\eta$, $\delta$, and $\gamma$ of the three regions are set to 0.4, 0.4, and 0.2, respectively. To verify
the universality of the parameter combination, three sets of parameter combinations
are taken for comparative experiments on the CK+ dataset. In the first group of experiments,
the parameters of the mouth area, eye area, and face area are set to 0.4, 0.2, 0.4.
The second group is set to 0.2, 0.4, 0.4. The third group is set to 0.35, 0.3, 0.35.
From Fig. 7(b), unlike the experimental results on the JAFFE dataset, the parameter combination
1 (mouth\,$:$\,eyes\,$:$\,face $= 0.4: 0.2: 0.4$) on the CK+ dataset has achieved
better recognition results. This indicates that, on the CK+ dataset, the mouth region varies little across consecutive frames of the same expression sequence, so the features extracted from the mouth area have stronger discriminative ability. The comparison experiment of weighting parameters shows that the appropriate regional weighting parameters depend on the dataset selected for the experiment and need to be analyzed case by case. A confusion matrix obtained from the
recognition experiment of the algorithm on the JAFFE dataset is shown in Table 2.
Table 2. Confusion matrix obtained from a recognition experiment of expression recognition
algorithm on JAFFE dataset.
- | AN | SA | SU | HA | FE | DI | NE
AN | 0.91 | 0.02 | 0.00 | 0.01 | 0.00 | 0.00 | 0.02
SA | 0.00 | 0.96 | 0.00 | 0.00 | 0.01 | 0.00 | 0.10
SU | 0.01 | 0.00 | 0.98 | 0.02 | 0.00 | 0.04 | 0.00
HA | 0.05 | 0.00 | 0.00 | 1.00 | 0.06 | 0.01 | 0.00
FE | 0.01 | 0.03 | 0.00 | 0.04 | 0.96 | 0.03 | 0.00
DI | 0.00 | 0.02 | 0.01 | 0.00 | 0.00 | 0.93 | 0.06
NE | 0.02 | 0.00 | 0.10 | 0.00 | 0.05 | 0.02 | 0.96
To verify the superiority of the proposed algorithm over other algorithms, the recognition
rates of the LBP, the improved LBP, the HOG, and the fusion feature algorithm are
compared on the JAFFE dataset, and 237 expression images are selected on the CK+ dataset.
The expression recognition rate comparison experiment of the LBP algorithm, improved
LBP algorithm and fusion feature algorithm is carried out. Ten groups of experiments are conducted on the two datasets, and in each group the expression images are divided into a training set and a test set. The results in Table 2 show that the algorithm
has a good recognition effect on the 7 kinds of expressions in the JAFFE dataset,
the recognition rate is higher than 90%, and the average recognition rate is 95.7%.
The experimental results of the impact of different feature extraction algorithms
on the recognition rate are shown in Fig. 8.
Fig. 8. Results of different extraction algorithms on two datasets.
From Fig. 8(a), the average recognition rate of the improved LBP is about 3% higher than that before
the improvement, while the recognition rate of expression recognition using the HOG
algorithm alone is the lowest. In all ten groups of experiments, the recognition rate of the algorithm used in the study is higher than 95%, with the highest reaching 98.3%, which is far superior to the other single algorithms. According to the experimental results
on the CK+ dataset in Fig. 8(b), since the CK+ dataset is composed of continuous facial expression sequences, it
is more easily affected by external factors than the JAFFE dataset. The recognition
rates of several algorithms are lower than the results on the JAFFE dataset. However,
the weighted fusion algorithm used in the study still has better recognition performance
than other algorithms.
5. Conclusion
Facial expression is an important and intuitive way to convey human emotion and reflect psychological state. With the development of technology, expression
recognition has become a hotspot in many fields such as distance education platforms
and entertainment activities. Based on the texture features of the face and the edge features of the eyebrow-eye and mouth regions, a feature fusion expression recognition algorithm was proposed to improve recognition accuracy. This algorithm improved the traditional LBP algorithm so that it could extract texture features more effectively, and combined the HOG algorithm to extract edge feature information for weighted feature fusion, improving the expression recognition rate. The optimal combination of weighting
parameters for the face region, eyebrow-eye region, and mouth region on the JAFFE
dataset was determined to be 0.4, 0.4, and 0.2, respectively. On another dataset,
the optimal combination was found to be 0.4, 0.2, and 0.4. The comparative experimental
results of different feature extraction algorithms showed that in the 10 experiments
conducted on the JAFFE dataset, the facial expression recognition rate of the fusion
feature algorithm used in this study was higher than 95%, with the highest reaching
98.3%. Compared to the single traditional LBP, improved LBP, and HOG algorithms, its average recognition rate was higher by 12.7%, 9.50%, and 21.33%, respectively. However, adding features using this method may result
in a high feature dimension, leading to loss of information during dimensionality
reduction. In future research, it is important to investigate how to balance the contradiction
between increasing features and reducing dimensions.
REFERENCES
I. M. Revina and W. R. S. Emmanuel, ``A survey on human face expression recognition
techniques,'' Journal of King Saud University-Computer and Information Sciences, vol.
33, no. 6, pp. 619-628, 2021.

K. Wang, X. Peng, J. Yang, D. Meng, and Y. Qiao, ``Region attention networks for pose
and occlusion robust facial expression recognition,'' IEEE Transactions on Image Processing,
vol. 29, pp. 4057-4069, 2020.

R. E. Mayer, ``Thirty years of research on online learning,'' Applied Cognitive Psychology,
vol. 33, no. 2, pp. 152-159, 2019.

A. V. Savchenko, L. V. Savchenko, and I. Makarov, ``Classifying emotions and engagement
in online learning based on a single facial expression recognition neural network,''
IEEE Transactions on Affective Computing, vol. 13, no. 4, pp. 2132-2143, 2022.

J. H. Kim, B. G. Kim, P. P. Roy, and D. M. Jeong, ``Efficient facial expression recognition
algorithm based on hierarchical deep neural network structure,'' IEEE Access, vol.
7, pp. 41273-41285, 2019.

M. Andrejevic and N. Selwyn, ``Facial recognition technology in schools: Critical
questions and concerns,'' Learning, Media and Technology, vol. 45, no. 2, pp. 115-128,
2020.

J. Lee and S. Lee, ``A study on experts’ perception survey on elementary AI education
platform,'' Journal of the Korean Association of Information Education, vol. 24, no.
5, pp. 483-494, 2020.

J. Ran, K. Hou, and K. Li, ``A high security distance education platform infrastructure
based on private cloud,'' International Journal of Emerging Technologies in Learning
(iJET), vol. 13, no. 10, pp. 42-54, 2018.

S. Hou and J. Ahn, ``Design and empirical study of an online education platform based
on B2B2C, focusing on the perspective of art education,'' KSII Transactions on Internet
and Information Systems (TIIS), vol. 16, no. 2, pp. 726-741, 2022.

H. J. Lee and D. Lee, ``Method of an assistance for evaluation of learning using expression
recognition based on deep learning,'' Journal of Engineering Education Research, vol.
23, no. 2, pp. 24-30, 2020.

M. I. Georgescu, R. T. Ionescu, and M. Popescu, ``Local learning with deep and handcrafted
features for facial expression recognition,'' IEEE Access, vol. 7, pp. 64827-64836,
2019.

Y. Ye, X. Zhang, Y. Lin, and H. Wang, ``Facial expression recognition via region-based
convolutional fusion network,'' Journal of Visual Communication and Image Representation,
vol. 62, pp. 1-11, 2019.

N. Sun, Q. Li, R. Huan, J. Liu, and G. Han, ``Deep spatial-temporal feature fusion
for facial expression recognition in static images,'' Pattern Recognition Letters,
vol. 119, pp. 49-61, 2019.

S. Zhang, X. Pan, Y. Cui, X. Zhao, and L. Liu, ``Learning affective video features
for facial expression recognition via hybrid deep learning,'' IEEE Access, vol. 7,
pp. 32297-32304, 2019.

H. Wang, S. Wei, and B. Fang, ``Facial expression recognition using iterative fusion
of MO-HOG and deep features,'' The Journal of Supercomputing, vol. 76, no. 5, pp.
3211-3221, 2020.

A. M. Ali, H. Zhuang, and A. K. Ibrahim, ``Multi-pose facial expression recognition
using rectangular HOG feature extractor and label-consistent KSVD classifier,'' Int.
J. Biom., vol. 12, no. 2, pp. 147-162, 2020.

S. Z. Jumani, F. Ali, S. Guriro, I. A. Kandhro, A. Khan, and A. Zaidi, ``Facial expression
recognition with histogram of oriented gradients using CNN,'' Indian Journal of Science
and Technology, vol. 12, no. 24, pp. 1-8, 2019.

D. G. R. Kola and S. K. Samayamantula, ``A novel approach for facial expression recognition
using local binary pattern with adaptive window,'' Multimedia Tools and Applications,
vol. 80, no. 2, pp. 2243-2262, 2021.

L. Mao, N. Wang, L. Wang, and Y. Chen, ``Classroom micro-expression recognition algorithms
based on multi-feature fusion,'' IEEE Access, vol. 7, pp. 64978-64983, 2019.

M. Rahul, N. Kohli, and R. Agarwal, ``Facial expression recognition using local binary
pattern and modified hidden Markov model,'' International Journal of Advanced Intelligence
Paradigms, vol. 17, no. 3-4, pp. 367-378, 2020.

Ziwei Cui received her bachelor’s degree in Chinese painting from Hubei Institute
of Fine Arts, and a master’s degree in Chinese painting from Yunnan Normal University.
She is currently teaching at Henan Vocational University of Science and Technology, mainly engaged in teaching basic painting and art history at the college level. She has published a number of academic papers, undertaken research projects, and has considerable expertise in art education.