

Ziwei Cui (College of Art and Design, Henan Vocational University of Science and Technology, Zhoukou 466000, China; Ziwei_Cui2023@outlook.com)



Keywords: Art education platform, Expression recognition algorithm, Improved LBP algorithm, Regional feature weighted fusion, HOG algorithm

1. Introduction

Human facial expressions can, to a certain extent, reflect a person's concentration, psychological activity, and other information, so expression recognition has gradually become a research hotspot in human-computer interaction and image recognition [1-2]. With the development of the Internet, online education technology is also advancing, and educational platforms for various subjects are becoming increasingly common [3]. An important research topic in online education is judging students' attention and knowledge acceptance, and continuous progress in psychology and related disciplines makes this possible through facial expression recognition algorithms [4]. A computer can extract relevant facial feature information and use a human-like reasoning structure to understand and analyze emotions such as sadness, surprise, anger, and happiness displayed through facial expressions, and can classify them accurately [5]. The paper uses an improved Local Binary Pattern (LBP) algorithm and the Histogram of Oriented Gradients (HOG) algorithm to extract feature information from different facial regions, performs feature fusion, and explores the fused information of multiple features to improve the reliability of facial expression recognition. By integrating multiple features, the accuracy and robustness of facial expression recognition can be improved, thereby better understanding and meeting the emotional needs of students on art education platforms. Teachers can obtain targeted emotional analysis results for students, helping them adjust their emotional states and improve learning outcomes. The aim of this paper is to enhance the quality of the teaching and learning experience by using the improved algorithms to meet the precise needs of art education platforms for analyzing student emotions.

2. Related Work

The field of online education is continuously developing, and an increasing number of new technologies are being applied to address related issues. Andrejevic and other scholars argued that, while some may question the educational limitations of face-driven learning, it is important to consider how to apply face recognition technology to specific educational environments; related face recognition and detection technologies could be used to address supervision and safety issues in school education and campus environments [6]. Lee et al. conducted a perception survey on teaching and learning management, educational content, and the performance of artificial intelligence education platforms based on five criteria. The results showed that the artificial intelligence education platform could provide more convenient accessibility and high-quality educational content, and made teaching and learning management easier [7]. Ran et al. proposed an intelligent model for continuous user authentication to prevent security issues in distance education platforms, such as private data leakage or tampering. The model was designed with strong security measures for both the platform infrastructure and the user authentication process, providing new insights into the security of distance education [8]. Based on the customer behavior theory of the B2B2C platform, Hou and other researchers proposed an online education model for art education combining the two models. This study expanded the scope of information technology as an art education platform from an academic point of view and provided effective theoretical support for online art education platforms [9]. Lee et al. used deep learning algorithms in artificial intelligence to obtain quantitative evaluation results in learning evaluation; specifically, a convolutional neural network model was trained on facial expressions, and the expression data were divided into three categories: easy, medium, and difficult. The research results provided relevant support for intelligent education evaluation systems [10].

With the continuous advancement of human-computer interaction technology, facial expression recognition algorithms are widely used in areas such as safe driving, remote driving, game feedback, and psychological analysis. Georgescu et al. combined automatic feature learning with handcrafted feature computation and used a Support Vector Machine (SVM) classifier with a one-versus-one classification strategy on the training samples, obtaining a high accuracy of 87.76% on the experimental dataset [11]. Ye and other scholars proposed a region-based convolutional fusion network to ensure the effectiveness and robustness of learning across different samples from three aspects: reducing interference, integrating semantic levels, and introducing a constraint penalty loss. By establishing a muscle motion model, key frontal facial regions were segmented and extracted to reduce the interference caused by differences in the size and position of individual facial organs [12]. Sun and other scholars proposed a multi-channel deep neural network that learns and fuses spatio-temporal features for facial expression recognition in static images. This method extracted the temporal information of the expression from the optical flow between the peak and neutral expression images, and used the grayscale image of the peak expression as the spatial information; its average recognition accuracy on three datasets was about 99.05% [13]. Scholars such as Zhang proposed a video-sequence facial expression recognition method based on a hybrid deep learning model, which uses two convolutional neural networks to learn high-level spatial features from segmented static facial images and temporal features from optical flow images [14].

Based on current research by domestic and foreign scholars on online education platforms and expression recognition algorithms, it is evident that online education places significant emphasis on students' emotional feedback during online learning. Expression recognition algorithms, a research hotspot in affective computing, are also used in online art education and can further be applied to student sentiment analysis.

3. Empowering Art Education Platforms for Enhanced Learning: A Fusion Approach with Improved LBP and HOG Algorithms for Expression Recognition

3.1 Optimizing Facial Expression Recognition in Art Education: Enhanced LBP and HOG Algorithms

At present, face recognition algorithms are relatively mature and provide considerable help and reference for expression recognition [15]. Many facial expression feature extraction methods are available, but the features extracted by different algorithms often differ, which in turn affects recognition accuracy [16]. The paper improves the original LBP algorithm and uses the HOG algorithm to optimize expression feature extraction. The LBP algorithm can effectively extract texture features describing the distribution relationship between a single pixel and its neighborhood pixels in the image. When the traditional LBP operator extracts image features, it first defines a 3×3 texture unit; the gray value of the pixel at the center of the unit is used as the threshold against which the gray values of the other 8 neighboring pixels are compared, and each neighbor is encoded as 1 or 0, as shown in (1).

(1)
$ CODE\left(g_{e} -g_{c} \right)=\left\{\begin{aligned} 1,\;g_{e} \ge g_{c},\\ 0,\;g_{e} <g_{c}. \end{aligned}\right. $

In (1), $g_{c} $ represents the gray value of the center pixel, and $g_{e} $ represents the gray value of one of the remaining 8 pixels. The pixels of the 8 neighborhood positions are combined into a binary number in a specific order, and the corresponding decimal number is the LBP feature of the texture unit. Due to the limitation of its coverage, the traditional LBP operator cannot satisfy texture feature extraction at different scales. To address this issue, some scholars have modified the fixed 3×3 window by making its size adjustable, expanding the neighborhood from a square to a circle with radius R. The circular LBP operators at different scales are shown in Fig. 1.

Fig. 1. LBP operators at different scales.

../../Resources/ieie/IEIESPC.2025.14.1.22/image1.png
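The circular operator with $e$ sampling points on a circle of radius $R$, as illustrated in Fig. 1, is available off the shelf. The following is a minimal sketch using scikit-image; the library choice, the `nri_uniform` mapping (which matches the uniform-mode coding discussed below), and the particular (P, R) scales are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def multiscale_lbp_histograms(gray, scales=((8, 1.0), (12, 1.5), (16, 2.0))):
    """Compute one normalized LBP histogram per (P, R) scale.

    The 'nri_uniform' mapping keeps the P*(P-1)+2 uniform codes as separate
    bins and collapses all non-uniform codes into one extra bin, giving
    P*(P-1)+3 bins (59 bins for P = 8).
    """
    histograms = []
    for points, radius in scales:
        codes = local_binary_pattern(gray, P=points, R=radius, method="nri_uniform")
        n_bins = points * (points - 1) + 3
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
        histograms.append(hist)
    return histograms
```

For instance, the scales (8, 1.0) and (12, 1.5) correspond to the $LBP_{1}^{8} $ and $LBP_{1.5}^{12} $ operators discussed next.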

A circular area with radius R containing e pixels, $LBP_{R}^{e} $, generates $2^{e} $ possible codes; for example, $LBP_{1.5}^{12} $ yields 4096 kinds of LBP codes. The more pixels there are, the more complex the feature extraction becomes. To address this, a change between the codes of two adjacent bits is counted as ``one jump'', and codes with no more than two jumps are called the ``uniform mode''. Statistical results show that the uniform mode accounts for more than 90% of all coding modes; for $LBP_{1}^{8} $ there are $e\left(e-1\right)+2$ uniform modes, and the number of coding modes is reduced from 256 to 59. The original LBP algorithm only considers the relationship between the central pixel and its neighbors, ignoring the role of the central pixel itself and the magnitude of the gray-value differences between pixels, which easily causes the loss of local structural feature information. The paper therefore introduces a threshold T into the LBP algorithm. The gray-level difference between the central pixel and each neighboring pixel is calculated, the absolute values of all the differences are averaged, and the resulting mean is taken as the threshold T. The absolute differences between the gray values of the central pixel and the neighboring pixels are then compared with the threshold T in a fixed order and encoded, and the obtained binary code is converted into a decimal number as shown in (2).

(2)
$ LBP^{*} {}_{R}^{e} =\sum _{i=0}^{e-1}CODE\left(\left|g_{i} -g_{c} \right|-T\right) 2^{i} . $

The decimal number $LBP^{*} {}_{R}^{e} $ in (2) represents the LBP feature of the center point, where $g_{i} $ is the gray value of the i-th neighboring pixel. The original and improved LBP algorithm codings are shown in Fig. 2.

Fig. 2. Schematic diagram of original and improved LBP algorithm coding.

../../Resources/ieie/IEIESPC.2025.14.1.22/image2.png
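As a concrete illustration of the threshold-based coding in (2), the following minimal NumPy sketch encodes a single 3×3 window. The clockwise reading order of the neighbors, the "code 1 when the absolute difference is at least T" convention, and the helper name are assumptions for illustration and are not fixed by the paper.

```python
import numpy as np

def improved_lbp_code(window):
    """Encode the centre pixel of a 3x3 window with the threshold-T rule.

    T is the mean absolute grey-level difference between the centre pixel
    and its 8 neighbours; each neighbour is coded 1 if |g_i - g_c| >= T,
    otherwise 0, and the bits are packed into one decimal code.
    """
    window = np.asarray(window, dtype=np.float64)
    g_c = window[1, 1]
    # 8 neighbours read clockwise starting from the top-left corner (assumed order)
    neighbours = window[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
    diffs = np.abs(neighbours - g_c)
    T = diffs.mean()                      # adaptive threshold of this window
    bits = (diffs >= T).astype(np.uint8)  # CODE(|g_i - g_c| - T)
    weights = 2 ** np.arange(8)           # 2^i for i = 0..7
    return int(np.dot(bits, weights))

# Example (hypothetical grey values):
# improved_lbp_code([[82, 75, 60], [70, 68, 41], [66, 50, 43]])
```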

In Fig. 2(a), according to the original LBP, two window units with very different central pixels obtain the same LBP code, that is, they are classified into the same mode during recognition. In Fig. 2(b), the improved LBP algorithm can be used for encoding to avoid the misclassification caused by grouping bright and dark points into one category. Considering that different facial parts differ within an expression image, directly extracting LBP features from the entire image and generating a single histogram may lose local facial differences, so the LBP histogram of the image is extracted block by block, and the LBP histograms of the individual blocks are then concatenated to form a composite histogram. To describe local features and facial edge information more effectively, HOG is introduced to extract features from local regions of the facial image. HOG divides the expression image into non-overlapping cell units and concatenates the histograms of the individual cell units to form the HOG features of the entire image. When HOG extracts features, the gamma correction method is first used to normalize the color space of the image to reduce the interference caused by light, as shown in (3).

(3)
$ S\left(x,y\right)=S\left(x,y\right)^{Gamma} . $

In (3), $S\left(x,y\right)$ represents the pixel value compressed by Gamma, and $\left(x,y\right)$ represents the pixel point. Then the pixel gradient is calculated to extract the local contour information, and the calculation expression is shown in (4).

(4)
$ \left\{\begin{aligned} G_{x} \left(x,y\right)=S\left(x+1,y\right)-S\left(x-1,y\right),\\ G_{y} \left(x,y\right)=S\left(x,y+1\right)-S\left(x,y-1\right). \end{aligned}\right. $

In (4), $G_{x} \left(x,y\right)$ is the calculated pixel gradient value in the horizontal direction. $G_{y} \left(x,y\right)$ represents the pixel gradient value in the vertical direction. The gradient magnitude of the pixel is calculated as shown in (5).

(5)
$ G\left(x,y\right)=\sqrt{G_{x} \left(x,y\right)^{2} +G_{y} \left(x,y\right)^{2} } $

In (5), $G\left(x,y\right)$ represents the gradient magnitude of the pixel. The gradient direction value of the pixel point is $d\left(x,y\right)$, as shown in (6).

(6)
$ d\left(x,y\right)=\tan ^{-1} \left[\frac{G_{y} \left(x,y\right)}{G_{x} \left(x,y\right)} \right] $

Each cell unit contains 16 histogram channels; that is, the gradient direction is divided into 16 direction bins, so the histogram of a cell unit is a 16-dimensional histogram. Several adjacent cell units are grouped into a block, and the features of all cell units in the block are concatenated to form a block feature. All blocks are connected into a vector in a fixed order, and this vector is the HOG feature vector of the expression image. The division of a cell unit is shown in Fig. 3.

Fig. 3. Schematic diagram of the division of a cell unit.

../../Resources/ieie/IEIESPC.2025.14.1.22/image3.png
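The HOG pipeline of (3)-(6) can be sketched in a few lines of NumPy. The cell size, bin count, and block size below follow the experimental settings reported in Section 4 rather than the 16-bin description above, and the unsigned $[0^{\circ},180^{\circ})$ orientation range and L2 block normalization are common conventions assumed here, not details stated in the paper.

```python
import numpy as np

def hog_features(image, gamma=0.5, cell=4, bins=9, block=2):
    """Sketch of HOG extraction: gamma correction, per-pixel gradients,
    per-cell orientation histograms, and block-wise concatenation."""
    img = np.power(image.astype(np.float64), gamma)        # Eq. (3)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]                  # Eq. (4), horizontal
    gy[1:-1, :] = img[2:, :] - img[:-2, :]                  # Eq. (4), vertical
    magnitude = np.hypot(gx, gy)                            # Eq. (5)
    direction = np.rad2deg(np.arctan2(gy, gx)) % 180        # Eq. (6), unsigned range
    h, w = img.shape
    ch, cw = h // cell, w // cell
    bin_width = 180 / bins
    cells = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = magnitude[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            d = direction[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            idx = np.minimum((d // bin_width).astype(int), bins - 1)
            cells[i, j] = np.bincount(idx.ravel(), weights=m.ravel(), minlength=bins)
    # Concatenate L2-normalized block histograms into one feature vector.
    feats = []
    for i in range(ch - block + 1):
        for j in range(cw - block + 1):
            v = cells[i:i+block, j:j+block].ravel()
            feats.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(feats)
```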

3.2 Transformative Art Education: Regional Feature Weighted Fusion of Improved LBP and HOG Algorithms for Expression Recognition

Due to the high dimensionality of the data obtained through feature extraction, it is necessary to reduce the dimensionality of the original features before classification to decrease the computational load. The study uses the classic Principal Component Analysis (PCA) algorithm for dimensionality reduction. PCA converts high-dimensional feature vectors into a set of orthogonal basis vectors through the KL transformation and analyzes the statistical characteristics of the training samples on these basis vectors to determine the basis vectors of the target low-dimensional space [17-18]. The mathematical principle of the KL transformation is shown in (7).

(7)
$ A_{n} =\sum _{j=1}^{n}\alpha _{j} \phi _{j} . $

In (7), $A_{n} $ is an $n$-dimensional random variable expressed as a weighted sum of basis vectors, $\alpha _{j} $ denotes the weighting coefficients, and $\phi _{j} $ denotes the basis vectors. The matrix form of $A_{n} $ is shown in (8).

(8)
$ A_{n} =\left(\phi _{1} ,~\phi _{2} ,~\cdots,~\phi _{n} \right)\left(\alpha _{1} ,~\alpha _{2} ,~\cdots,~ \alpha _{n} \right)^{T} . $

In (8), the basis vectors $(\phi _{1} $, $\phi _{2} $, $\cdots$, $\phi _{n})$ are mutually orthogonal, and $(\alpha _{1} $, $\alpha _{2} $, $\cdots$, $\alpha _{n} )^{T} $ is the coefficient vector; the orthogonality condition is expressed in (9).

(9)
$ \phi _{j}^{T} \phi _{k} =\left\{\begin{aligned} 1,\;j=k,\\ 0,\;j\ne k. \end{aligned}\right. $

In (9), $\phi $ is an orthogonal matrix. According to the properties of an orthogonal matrix, multiplying both sides of (8) by $\phi _{j}^{T} $ yields the expression for the coefficients shown in (10).

(10)
$ \alpha _{j} =\phi _{j}^{T} A_{n} . $

To ensure that the components of the vector $\alpha $ are not correlated with each other, the auto-correlation matrix of the random vector is written as $A_{m} =E[A_{n}$~$A_{n}^{T}]$. The expression shown in (11) can then be obtained.

(11)
$ A_{m} =E\left[A_{n} A_{n}^{T} \right]=E\left[\phi \alpha \alpha {}^{T} \phi ^{T} \right]=\phi E\left[\alpha \alpha ^{T} \right]\phi ^{T} . $

Finally, the expression shown in (12) can be obtained through calculation, where $\lambda _{k} $ is an eigenvalue of the auto-correlation matrix and $\phi _{k} $ is the corresponding eigenvector. According to the properties of real symmetric matrices, eigenvectors corresponding to distinct eigenvalues are orthogonal.

(12)
$ A_{m} \phi _{k} =\lambda _{k} \phi _{k} ,~(k=1,~2,~\ldots ,~n). $

For a sample matrix $A=[A_{1} $, $A_{2} $, $\ldots$, $A_{N} ]^{T} $ containing $N$ samples, each sample has $P$ features. To use PCA to reduce the samples from $P$ dimensions to $Q$ dimensions ($Q<P$), the covariance matrix of the samples must first be calculated. The difference between a sample value and the sample mean is used as the deviation between the observed value and the mean in each dimension, which yields the $P\times P$ covariance matrix $C^{P\times P} $. The calculation of the sample mean is shown in (13).

(13)
$ avg\left(i\right)=\frac{1}{N} \sum _{j=1}^{N}A_{j,i}. $

By calculating the eigenvalues and eigenvectors of the covariance matrix and sorting them in descending order of the eigenvalues, the first $Q$ eigenvalues and their eigenvectors are selected to form the feature matrix $B$. The PCA dimensionality-reduction matrix is obtained by multiplying the sample matrix by the feature matrix $B$; its dimensionality is $N\times Q$. If only a single feature describing part of the feature information is used for classification and recognition, the recognition accuracy is easily affected by environmental changes [19-20]. In this study, the texture information extracted by the improved LBP algorithm is fused with the contour and edge shape information extracted by the HOG algorithm, so that the expression information of the image is represented by fused features for classification and recognition. Since expression changes are mainly reflected in the eyebrow-eye and mouth regions, the texture features extracted by the improved LBP algorithm are used as feature 1, and the edge shape information of the eyebrow-eye region and the mouth region extracted by HOG is used as feature 2 and feature 3, respectively. The three kinds of feature information are subjected to dimensionality reduction, normalization, and weighted fusion to obtain the fused feature information. The flow chart of feature fusion for different facial regions is shown in Fig. 4.
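Before turning to the fusion flow in Fig. 4, the PCA reduction described above can be sketched as follows with NumPy. The function name and the use of `numpy.linalg.eigh` are illustrative assumptions; the steps (mean centring per (13), covariance matrix, eigen-decomposition, projection onto the top-$Q$ eigenvectors $B$) follow the text.

```python
import numpy as np

def pca_reduce(A, Q):
    """Reduce an N x P sample matrix A to N x Q with PCA.

    Centre each feature by its mean (Eq. (13)), build the P x P covariance
    matrix, keep the eigenvectors of the Q largest eigenvalues as the
    feature matrix B, and project the centred samples onto B.
    """
    avg = A.mean(axis=0)                     # Eq. (13): per-feature mean
    centred = A - avg
    C = np.cov(centred, rowvar=False)        # P x P covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh: C is real symmetric
    order = np.argsort(eigvals)[::-1][:Q]    # indices of the Q largest eigenvalues
    B = eigvecs[:, order]                    # P x Q feature matrix
    return centred @ B                       # N x Q reduced sample matrix
```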

Fig. 4. Flow chart of feature fusion for different facial regions.

../../Resources/ieie/IEIESPC.2025.14.1.22/image4.png

In Fig. 4, the study utilizes the eyebrow, eye, and mouth regions for expression recognition, which are the most informative parts when recognizing expressions. The study first crops out the eyebrow-eye and mouth regions as region 2 and region 3, respectively. Second, the whole preprocessed expression image is taken as region 1, and the improved LBP algorithm is used to extract its texture features; the LBP algorithm is able to capture local texture changes in the image. After obtaining the initial feature information of the three regions, dimensionality reduction is carried out to simplify the data and reduce the amount of computation. Next, the feature information is normalized to ensure that each feature carries the same weight in subsequent processing. Finally, weighted fusion is performed according to (14) to integrate the feature information of the three regions and obtain more comprehensive and accurate expression recognition results.

(14)
$ \left\{\begin{aligned} F=\eta \cdot T_{1} +\delta E_{2} +\gamma E_{3},\\ \eta +\delta +\gamma =1. \end{aligned}\right. $

In (14), $F$ represents the fused feature obtained after weighted fusion, $T_{1} $ represents the processed facial texture features, and $E_{2} $ and $E_{3} $ represent the edge shape information of the eyebrow-eye region and the mouth region, respectively. $\eta $, $\delta $, and $\gamma $ are the weighting coefficients of the three regional features. Through in-depth analysis of the expression datasets, the study found that there is significant variability in the mouth region: even for the same kind of expression, the mouth images of different individuals show markedly different shapes and variations. This variability prevents the feature information of the mouth region from serving as the focus of expression recognition. Therefore, in the weighted fusion, the weighting coefficient $\gamma $ of the mouth region is set smaller than the coefficients $\eta $ and $\delta $ of the face and eyebrow-eye regions. In this way, the feature information of the mouth region acts as auxiliary information that, combined with the features of the face and eyebrow-eye regions, jointly provides a more comprehensive and accurate basis for expression recognition and classification.
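Equation (14) is written as a weighted sum; since the three reduced feature vectors generally have different lengths, the sketch below reads the fusion as weighting each normalized vector and concatenating the results, which is one common interpretation and an assumption here rather than the paper's stated implementation. The min-max normalization and the default weights (matching the JAFFE setting reported later) are likewise illustrative.

```python
import numpy as np

def fuse_features(t1, e2, e3, eta=0.4, delta=0.4, gamma=0.2):
    """Weighted fusion of the three regional feature vectors per (14).

    t1: improved-LBP texture features of the whole face (feature 1)
    e2: HOG features of the eyebrow-eye region (feature 2)
    e3: HOG features of the mouth region (feature 3)
    The weights must satisfy eta + delta + gamma = 1.
    """
    assert abs(eta + delta + gamma - 1.0) < 1e-9

    def minmax(v):
        v = np.asarray(v, dtype=np.float64)
        rng = v.max() - v.min()
        return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)

    # Concatenating the weighted, normalized vectors is an assumed reading of (14).
    return np.concatenate([eta * minmax(t1), delta * minmax(e2), gamma * minmax(e3)])
```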

4. Expression Recognition Results of Fusion Features for Student Emotion Analysis on Education Platform

Based on the JAFFE and CK+ expression datasets, the study set up an experiment on the influence of the extraction regions on different datasets, a weighting-parameter selection experiment, and a comparison experiment of feature extraction algorithms to analyze the accuracy of expression recognition under different influencing factors. When the improved LBP algorithm extracts facial texture features, the face image is divided into $6\times5$ blocks, giving $6\times5\times256=7680$ dimensional features, which are then reduced in dimensionality. When the HOG algorithm extracts edge features, cell units of $4\times4$ pixels are used to divide the image, and the gradient direction of a cell unit is divided into 9 bins, i.e., a 9-dimensional histogram, so the original feature of one block is $2\times2\times9=36$ dimensions. The fused features obtained after the corresponding processing and weighted fusion are used as the identification features of the classifier for classification and recognition. The JAFFE expression database includes a total of 213 expression images recorded by 10 testers, covering anger (AN), sadness (SA), surprise (SU), happiness (HA), fear (FE), disgust (DI), and neutral (NE). The CK+ expression dataset consists of 593 expression video sequences collected from 123 testers; 327 image sequences in the library are labeled with numerical expression labels from 1 to 7, where the numbers from small to large represent AN, contempt (CO), DI, FE, HA, SA, and SU. In the experiment on the influence of different extraction regions on the recognition rate, the JAFFE dataset is preprocessed first to obtain $120\times120$ face images, $104\times32$ eyebrow-eye regions, and $64\times32$ mouth regions. The study uses the K-Nearest Neighbor (KNN) classifier and the SVM classifier to conduct expression classification and recognition experiments to determine the better classifier. The experiments adopt the cross-validation method: one of the 10 groups of testers is selected as the test sample, and the rest are used as training samples; finally, the average recognition rate over the 10 groups is taken as the result of single-expression recognition. The recognition results of the two classifiers on the JAFFE dataset are shown in Table 1.
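Before turning to Table 1, the cross-validation protocol just described can be sketched with scikit-learn as follows. Grouping the folds by tester is taken from the text; the classifier hyper-parameters and function names are illustrative defaults, not values reported in the paper.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def subject_cross_validation(X, y, subject_ids):
    """Leave-one-subject-out evaluation of the KNN and SVM classifiers.

    X: fused feature matrix (NumPy array), y: expression labels,
    subject_ids: tester index per sample (10 groups on JAFFE).
    """
    logo = LeaveOneGroupOut()
    results = {}
    for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=5)),
                      ("SVM", SVC(kernel="rbf", C=1.0))]:
        accs = []
        for train_idx, test_idx in logo.split(X, y, groups=subject_ids):
            clf.fit(X[train_idx], y[train_idx])
            accs.append(clf.score(X[test_idx], y[test_idx]))
        results[name] = float(np.mean(accs))  # average over the 10 subject folds
    return results
```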

Table 1. Recognition results of KNN classifier and SVM classifier on JAFFE dataset.

Expression    Recognition rate (%)
              KNN classifier    SVM classifier
AN            93.5              96.7
HA            93.3              95.9
SA            93.7              96.1
SU            94.6              97.6
DI            93.9              92.5
FE            92.4              94.2
NE            97.2              98.4
Average       94.09             95.91

In Table 1, the recognition rate of the SVM classifier for the seven expressions is generally higher than that of the KNN classifier, and its average recognition rate is also slightly higher. The recognition rate of the KNN classifier is lower than 95% for every expression except the neutral expression, while the recognition rate of the SVM classifier is higher than 95% for every expression except disgust. Therefore, the paper chooses SVM as the classifier for the expression recognition algorithm. The experimental results on the influence of different weighted regions are shown in Fig. 5.

Fig. 5. Recognition rate results of different recognition areas.

../../Resources/ieie/IEIESPC.2025.14.1.22/image5.png

Three groups of experiments are carried out with different combinations of the three extracted regions to analyze the influence of the weighted regions on the recognition rate. The first experiment extracts only the features of the face region; the second extracts the features of the face region and the mouth region; the third extracts the features of the face region and the eyebrow-eye region. All three groups adopt the cross-validation method to ensure validity. From Fig. 5, among the three groups of experiments, the combination of the face region and the eyebrow-eye region gives the best expression recognition, followed by the combination of the face region and the mouth region, while the face region alone gives the worst results. When the face region is fused with the eyebrow-eye region to recognize the 7 kinds of expressions, the recognition rates of the surprised and neutral expressions are relatively high, at 94.7% and 94.2%, respectively. The average recognition rates of the three groups of experiments, in descending order, are 89.84% (face + eyes), 87.79% (face + mouth), and 86.83% (face). These results show that fusing features extracted from different regions of the face is effective and necessary for expression recognition. At the same time, it is verified that the mouth region has large individual differences, and an excessively large weighting parameter for this region increases the probability of misjudgment; therefore, the weighting parameter of the mouth region should be smaller than those of the face and eyebrow-eye regions. Based on the JAFFE dataset, the region weighting parameter selection experiment is carried out, and the adjustment of the weighting coefficients of each region is shown in Fig. 6.

Fig. 6. Adjustment results of weighting coefficients of different extraction regions.

../../Resources/ieie/IEIESPC.2025.14.1.22/image6.png
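One way to organize the weighting-coefficient search summarized in Fig. 6 is a simple grid search over $(\eta, \delta, \gamma)$ under the constraint $\eta+\delta+\gamma=1$, keeping the combination with the highest cross-validated recognition rate. The sketch below is a hedged illustration of that idea: the 0.05 step size and the `evaluate` callback are assumptions, not details given in the paper.

```python
import numpy as np

def search_weights(evaluate, step=0.05):
    """Grid search over fusion weights eta, delta, gamma that sum to 1.

    `evaluate(eta, delta, gamma)` is assumed to run the cross-validated
    recognition experiment with the given weights and return the average
    recognition rate. Returns the best (eta, delta, gamma) and its rate.
    """
    best_weights, best_rate = None, -1.0
    grid = np.arange(step, 1.0, step)
    for eta in grid:
        for delta in grid:
            gamma = 1.0 - eta - delta
            if gamma <= 0:
                continue  # skip combinations that violate the sum-to-one constraint
            rate = evaluate(eta, delta, gamma)
            if rate > best_rate:
                best_weights = (round(eta, 2), round(delta, 2), round(gamma, 2))
                best_rate = rate
    return best_weights, best_rate
```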

The region weighting parameter selection experiment also adopts the cross-validation method to ensure validity and accuracy. The experiment finds the optimal parameter combination by changing the weighting coefficients of the three regions. From Figs. 6(a) and 6(b), when the weighting parameters of the mouth region and the eyebrow-eye region are adjusted, the recognition rate first increases and then decreases as the weighting coefficient increases. The maximum recognition rates (96.7% and 96.7%) are achieved when the weighting parameter of the mouth region equals 0.2 and that of the eyebrow-eye region equals 0.4. Observing the weighting-parameter adjustment results of the face region in Fig. 6(c), when the weighting parameter lies in $\left[0.2,0.7\right]$, the recognition rate remains stable at a high level. To determine the final combination of weighting coefficients, statistical graphs of the recognition rates corresponding to different parameter combinations are obtained through cross experiments on the two datasets, as shown in Fig. 7.

Fig. 7. Statistical chart of recognition rate corresponding to different parameter combinations on two datasets.

../../Resources/ieie/IEIESPC.2025.14.1.22/image7.png

According to the results of the weighting-parameter adjustment, three sets of parameter combinations are first tested on the JAFFE dataset. The weighting parameters of the first group for the three regions are mouth\,$:$\,eyes\,$:$\,face $= 0.2 : 0.4 : 0.4$; the second group is eyes\,$:$\,mouth\,$:$\,face $= 0.4 : 0.3 : 0.3$; the third group is face\,$:$\,mouth\,$:$\,eyes $= 0.35 : 0.3 : 0.35$. From Fig. 7(a), the first group of parameter combinations performs best in expression recognition, with an average recognition rate of 96.11%. Therefore, the weighting parameters $\eta $, $\delta $, and $\gamma $ of the three regions are set to 0.4, 0.4, and 0.2. To verify the universality of this parameter combination, three sets of parameter combinations are also compared on the CK+ dataset. In the first group, the parameters of the mouth, eye, and face regions are set to 0.4, 0.2, and 0.4; in the second group to 0.2, 0.4, and 0.4; and in the third group to 0.35, 0.3, and 0.35. From Fig. 7(b), unlike the results on the JAFFE dataset, parameter combination 1 (mouth\,$:$\,eyes\,$:$\,face $= 0.4 : 0.2 : 0.4$) achieves the better recognition results on the CK+ dataset. This indicates that on the CK+ dataset, the mouth region of the same expression differs little across the consecutive frames of an image sequence, so the features extracted from the mouth region have stronger discriminative ability. The comparison experiment on the weighting parameters shows that the weighting parameters of different regions depend on the dataset selected for the experiment and need to be analyzed case by case. A confusion matrix obtained from a recognition experiment of the algorithm on the JAFFE dataset is shown in Table 2.

Table 2. Confusion matrix obtained from a recognition experiment of expression recognition algorithm on JAFFE dataset.

-     AN     SA     SU     HA     FE     DI     NE
AN    0.91   0.02   0.00   0.01   0.00   0.00   0.02
SA    0.00   0.96   0.00   0.00   0.01   0.00   0.1
SU    0.01   0.00   0.98   0.02   0.00   0.04   0.00
HA    0.05   0.00   0.00   1.00   0.06   0.01   0.00
FE    0.01   0.03   0.00   0.04   0.96   0.03   0.00
DI    0.00   0.02   0.01   0.00   0.00   0.93   0.06
NE    0.02   0.00   0.1    0.00   0.05   0.02   0.96

To verify the superiority of the proposed algorithm over other algorithms, the recognition rates of the LBP algorithm, the improved LBP algorithm, the HOG algorithm, and the fusion feature algorithm are compared on the JAFFE dataset, and 237 expression images are selected from the CK+ dataset for a comparison of the LBP algorithm, the improved LBP algorithm, and the fusion feature algorithm. Ten groups of experiments are conducted on the two datasets, and each group of expression images serves in turn as the training set and the test set. The results in Table 2 show that the algorithm recognizes the 7 kinds of expressions in the JAFFE dataset well: the recognition rate of each expression is higher than 90%, and the average recognition rate is 95.7%. The experimental results on the impact of different feature extraction algorithms on the recognition rate are shown in Fig. 8.

Fig. 8. Results of different extraction algorithms on two datasets.

../../Resources/ieie/IEIESPC.2025.14.1.22/image8.png

From Fig. 8(a), the average recognition rate of the improved LBP is about 3% higher than that of the original LBP, while the recognition rate obtained with the HOG algorithm alone is the lowest. The recognition rate of the proposed algorithm is higher than 95% in all ten groups of experiments, reaching a maximum of 98.3%, which is far superior to the other single algorithms. According to the experimental results on the CK+ dataset in Fig. 8(b), since the CK+ dataset is composed of continuous facial expression sequences, it is more easily affected by external factors than the JAFFE dataset, and the recognition rates of all the algorithms are lower than on the JAFFE dataset. However, the weighted fusion algorithm used in the study still shows better recognition performance than the other algorithms.

5. Conclusion

Facial expressions are an important and intuitive way to convey human emotion and reflect the human psychological state. With the development of technology, expression recognition has become a hotspot in many fields such as distance education platforms and entertainment. Based on the texture features of the face and the edge features of the eyebrow-eye region and the mouth region, a feature fusion expression recognition algorithm was proposed to improve recognition accuracy. The algorithm improved the traditional LBP algorithm so that it could extract texture features more effectively, and combined it with the HOG algorithm to extract edge feature information for weighted feature fusion, improving the expression recognition rate. The optimal combination of weighting parameters for the face region, eyebrow-eye region, and mouth region on the JAFFE dataset was determined to be 0.4, 0.4, and 0.2, respectively; on the CK+ dataset, the optimal combination was 0.4, 0.2, and 0.4. The comparative experiments on different feature extraction algorithms showed that, in the 10 experiments conducted on the JAFFE dataset, the facial expression recognition rate of the fusion feature algorithm used in this study was higher than 95%, with the highest reaching 98.3%. Compared with the single traditional LBP, improved LBP, and HOG algorithms, its average recognition rate was higher by 12.7%, 9.50%, and 21.33%, respectively. However, adding features with this method may result in a high feature dimension, leading to a loss of information during dimensionality reduction. Future research should investigate how to balance the trade-off between adding features and reducing dimensionality.

REFERENCES

1 
I. M. Revina and W. R. S. Emmanuel, ``A survey on human face expression recognition techniques,'' Journal of King Saud University-Computer and Information Sciences, vol. 33, no. 6, pp. 619-628, 2021.DOI
2 
K. Wang, X. Peng, J. Yang, D. Meng, and Y. Qiao, ``Region attention networks for pose and occlusion robust facial expression recognition,'' IEEE Transactions on Image Processing, vol. 29, pp. 4057-4069, 2020.DOI
3 
R. E. Mayer, ``Thirty years of research on online learning,'' Applied Cognitive Psychology, vol. 33, no. 2, pp. 152-159, 2019.DOI
4 
A. V. Savchenko, L. V. Savchenko, and I. Makarov, ``Classifying emotions and engagement in online learning based on a single facial expression recognition neural network,'' IEEE Transactions on Affective Computing, vol. 13, no. 4, pp. 2132-2143, 2022.DOI
5 
J. H. Kim, B. G. Kim, P. P. Roy, and D. M. Jeong, ``Efficient facial expression recognition algorithm based on hierarchical deep neural network structure,'' IEEE Access, vol. 7, pp. 41273-41285, 2019.DOI
6 
M. Andrejevic and N. Selwyn, ``Facial recognition technology in schools: Critical questions and concerns,'' Learning, Media and Technology, vol. 45, no. 2, pp. 115-128, 2020.DOI
7 
J. Lee and S. Lee, ``A study on experts’ perception survey on elementary AI education platform,'' Journal of the Korean Association of Information Education, vol. 24, no. 5, pp. 483-494, 2020.DOI
8 
J. Ran, K. Hou, and K. Li, ``A high security distance education platform infrastructure based on private cloud,'' International Journal of Emerging Technologies in Learning (iJET), vol. 13, no. 10, pp. 42-54, 2018.DOI
9 
S. Hou and J. Ahn, ``Design and empirical study of an online education platform based on B2B2C, focusing on the perspective of art education,'' KSII Transactions on Internet and Information Systems (TIIS), vol. 16, no. 2, pp. 726-741, 2022.DOI
10 
H. J. Lee and D. Lee, ``Method of an assistance for evaluation of learning using expression recognition based on deep learning,'' Journal of Engineering Education Research, vol. 23, no. 2, pp. 24-30, 2020.DOI
11 
M. I. Georgescu, R. T. Ionescu, and M. Popescu, ``Local learning with deep and handcrafted features for facial expression recognition,'' IEEE Access, vol. 7, pp. 64827-64836, 2019.DOI
12 
Y. Ye, X. Zhang, Y. Lin, and H. Wang, ``Facial expression recognition via region-based convolutional fusion network,'' Journal of Visual Communication and Image Representation, vol. 62, pp. 1-11, 2019.DOI
13 
N. Sun, Q. Li, R. Huan, J. Liu, and G. Han, ``Deep spatial-temporal feature fusion for facial expression recognition in static images,'' Pattern Recognition Letters, vol. 119, pp. 49-61, 2019.DOI
14 
S. Zhang, X. Pan, Y. Cui, X. Zhao, and L. Liu, ``Learning affective video features for facial expression recognition via hybrid deep learning,'' IEEE Access, vol. 7, pp. 32297-32304, 2019.DOI
15 
H. Wang, S. Wei, and B. Fang, ``Facial expression recognition using iterative fusion of MO-HOG and deep features,'' The Journal of Supercomputing, vol. 76, no. 5, pp. 3211-3221, 2020.DOI
16 
A. M. Ali, H. Zhuang, and A. K. Ibrahim, ``Multi-pose facial expression recognition using rectangular HOG feature extractor and label-consistent KSVD classifier,'' Int. J. Biom., vol. 12, no. 2, pp. 147-162, 2020.DOI
17 
S. Z. Jumani, F. Ali, S. Guriro, I. A. Kandhro, A. Khan, and A. Zaidi, ``Facial expression recognition with histogram of oriented gradients using CNN,'' Indian Journal of Science and Technology, vol. 12, no. 24, pp. 1-8, 2019.DOI
18 
D. G. R. Kola and S. K. Samayamantula, ``A novel approach for facial expression recognition using local binary pattern with adaptive window,'' Multimedia Tools and Applications, vol. 80, no. 2, pp. 2243-2262, 2021.DOI
19 
L. Mao, N. Wang, L. Wang, and Y. Chen, ``Classroom micro-expression recognition algorithms based on multi-feature fusion,'' IEEE Access, vol. 7, pp. 64978-64983, 2019.DOI
20 
M. Rahul, N. Kohli, and R. Agarwal, ``Facial expression recognition using local binary pattern and modified hidden Markov model,'' International Journal of Advanced Intelligence Paradigms, vol. 17, no. 3-4, pp. 367-378, 2020.DOI
Ziwei Cui
../../Resources/ieie/IEIESPC.2025.14.1.22/author1.png

Ziwei Cui received her bachelor's degree in Chinese painting from the Hubei Institute of Fine Arts and her master's degree in Chinese painting from Yunnan Normal University. She is currently teaching at Henan Vocational University of Science and Technology, mainly engaged in teaching basic college painting and art history. She has published a number of academic papers and research projects and has considerable expertise in art education.