3.1 Action Recognition Classification Algorithm Combining Weight Offset and Dimension
Reduction Retrieval
Aiming at the problem that a standard CNN cannot understand dynamic information in the action recognition classification task, a weight-offset module is designed. This module assigns a higher weight parameter to one convolutional layer and uses this particular convolutional layer as the dominant layer for the other convolutional layers. The weight offset is carried out on the basis of information difference, because the dynamic information learned by the neural network differs from that of the original data. The module strengthens the network's ability to understand time-domain dynamic information and improves the classification accuracy of the model [16]. Fig. 1 shows the framework of the weight-offset module.
In Fig. 1, the convolutional layers first sort the learning potential of each group of dynamic input data, then calculate the similarity and assign more learning potential to the dynamic data. The gradient of the weight parameters of each convolutional layer is thereby adjusted, and the direction of optimization is changed. The update of the weight parameters is modified by adding a time-domain constraint to the loss function. This procedure is repeated: the dynamic data with the lowest weight are fed back into the convolutional layer for further optimization and updating, while the dynamic data with the highest weight are used as the output. Fig. 2 presents the evolution of the data's dynamic information throughout the training procedure.
In Fig. 2, the squares represent the dynamic data of the input layer and of several intermediate layers of the convolutional neural network. The closer the colors of two squares, the higher the similarity of the dynamic data between the corresponding layers. The convolutional layer whose dynamic data are closest to those of the input layer is the dominant layer. The module optimizes training by reducing the overall difference in time-domain dynamic information between the convolutional layers and the input layer, thereby improving the network model's ability to understand time-domain information. The dynamic information set of the feature data of the convolutional layer is $T^{c}$, the dynamic information set of the original data of the input layer is $T$, and the numerical difference between them is $d$. The calculation method is expressed as formula (1).
$\delta $ is the normalization of the dynamic information set. The dynamic information set of each data layer is calculated from the differences between frames to obtain more comprehensive dynamic information and to prevent valid information from being excluded, as shown in Eqs. (2) and (3).
Extracting dynamic information with Eqs. (2) and (3) avoids the network overfitting that the high local attention of the weight-offset module could otherwise cause during optimization. The dynamic information of the feature data of the convolutional layer can be calculated according to Eqs. (4) and (5).
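As an illustration of this frame-difference idea, a minimal sketch is given below; the array layout and the simple absolute-difference form are assumptions made here and are not the paper's exact Eqs. (2)-(5).

```python
# Minimal sketch: dynamic information of feature data as inter-frame differences.
# The (frames, channels, height, width) layout and the absolute-difference form
# are assumptions for illustration, not the paper's exact Eqs. (2)-(5).
import numpy as np

def dynamic_information(features: np.ndarray) -> np.ndarray:
    """features: feature data of shape (n_frames, channels, height, width).

    Returns one dynamic-information map per pair of consecutive frames,
    i.e. an array of shape (n_frames - 1, channels, height, width).
    """
    # The change between frame t and frame t - 1 captures time-domain dynamics.
    return np.abs(features[1:] - features[:-1])
```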
$u^{c}$ is the feature data. $n$ is the total number of frames in the $i^{\mathrm{th}}$ data group. $t$ is any frame in the data group. The calculation of the action information set of the original input-layer data follows the same method and is therefore omitted. Because the convolutional layers have different numbers of data channels, this study randomly samples the feature data with a large number of channels in proportion, so that the distribution of each data group remains consistent. When updating the weight parameters, the weight-offset module needs to quantify the distance between $T^{c}$ and $T$ and take the quantified result as the time-domain constraint of the neural network loss function. The time-domain constraint and the loss function then jointly affect the gradient of the network. The distance between $T^{c}$ and $T$ is minimized while the accuracy of the optimized model is maintained, so the time-domain difference between the data of the weight-offset module and the original data of the input layer is reduced. Eqs. (6) and (7) express the loss function and the update operation of the weight parameters after adding the time-domain constraint, respectively.
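A generic form consistent with the symbol definitions that follow (the exact Eqs. (6) and (7) are not reproduced here) is $L=L_{p}\left(l,\hat{l}\right)+\lambda D\left(T,T^{c}\right)$ for the constrained loss and $W\leftarrow W-\eta \frac{\partial L}{\partial W}$ for the weight update, where $\eta $ denotes the learning rate (a symbol introduced here for illustration only).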
$L_{p}\left(l,\hat{l}\right)$ is the loss function composed of the real value of the input sample and the output value of the model. $\lambda $ is the constraint weight. $D\left(T,T^{c}\right)$ is the time-domain constraint. $W$ is the weight parameter of the convolutional layer. Because the original data and the feature data are not in the same data space, a common difference calculation cannot reproduce this cross-spatiality, which leads to inaccurate results. Transfer learning is therefore adopted for multi-domain adaptation, and the Maximum Mean Discrepancy (MMD) algorithm is used to quantify the distance of the cross-spatial dynamic information accurately. The MMD algorithm maps the two datasets with a continuous function set $F$ in a certain sample space and judges whether the two datasets belong to the same distribution according to the difference in the mapping results. The selection of $F$ is the key to the accuracy of the calculation result. The definition of the MMD algorithm is shown in Eqs. (8) and (9).
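Although Eqs. (8) and (9) are not reproduced here, the standard definition of MMD, which the symbol definitions below follow, is $\mathrm{MMD}\left[F,p,q\right]=\sup_{f\in F}\left(\mathrm{E}_{x\sim p}\left[f(x)\right]-\mathrm{E}_{y\sim q}\left[f(y)\right]\right)$, with the empirical estimate $\mathrm{MMD}\left[F,X,Y\right]=\sup_{f\in F}\left(\frac{1}{m}\sum_{i=1}^{m}f\left(x_{i}\right)-\frac{1}{n}\sum_{j=1}^{n}f\left(y_{j}\right)\right)$.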
$p$ and $q$ are Borel probability distributions. $X$ and $Y$ are the datasets obtained by independent and identically distributed sampling from $p$ and $q$, with sample sizes $m$ and $n$, respectively. $f$ is a function in the function set $F$. The weight-offset module uses the MMD algorithm in a reproducing kernel Hilbert space: the two action information sets lying in different spaces are mapped into the Hilbert space with a kernel function, and the distance between the two data groups then quantifies the difference in action information between the two groups across spatial-temporal domains. The completeness and regularity of the Hilbert space ensure the stability of the calculation results, so the deviation does not become too large even with a large amount of sample data. The mapping process of the reproducing kernel Hilbert space is expressed as Eqs. (10) and (11).
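In the standard kernelized form (given here as background; the paper's exact Eqs. (10) and (11) may differ in notation), a Gaussian kernel $k\left(x,x'\right)=\exp \left(-\frac{\left\| x-x'\right\| ^{2}}{2\sigma ^{2}}\right)=\left\langle \varphi \left(x\right),\varphi \left(x'\right)\right\rangle _{H}$ implicitly maps the data into $H$, and the squared distance becomes $\mathrm{MMD}^{2}\left(X,Y\right)=\frac{1}{m^{2}}\sum_{i,j}k\left(x_{i},x_{j}\right)+\frac{1}{n^{2}}\sum_{i,j}k\left(y_{i},y_{j}\right)-\frac{2}{mn}\sum_{i,j}k\left(x_{i},y_{j}\right)$.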
$\varphi $ is the data mapped with a Gaussian kernel. $H$ is the Hilbert space. $x'$ is the transpose of the input data. $\sigma $ is the sample standard deviation. The dynamic data of the original input-layer data and the dynamic data of the weight-offset layer are used as the cross-spatial input, and the maximum mean discrepancy between the two sets of time-domain dynamic information is used as the constraint, as shown in Eq. (12).
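A minimal sketch of such a constraint is given below, assuming each dynamic information set is flattened into a matrix with one row per sample and using a Gaussian kernel with a fixed bandwidth; both choices are assumptions made here for illustration rather than the paper's exact Eq. (12).

```python
# Illustrative sketch of a Gaussian-kernel MMD used as the time-domain
# constraint D(T, T^c); the fixed bandwidth sigma and the biased estimator
# are assumptions, not the paper's exact Eq. (12).
import numpy as np

def gaussian_kernel(a: np.ndarray, b: np.ndarray, sigma: float) -> np.ndarray:
    """Pairwise Gaussian kernel values between the rows of a and the rows of b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd_constraint(T: np.ndarray, T_c: np.ndarray, sigma: float = 1.0) -> float:
    """Biased squared-MMD estimate between two dynamic information sets.

    T, T_c: arrays of shape (num_samples, feature_dim), one row per
    flattened dynamic-information vector.
    """
    k_xx = gaussian_kernel(T, T, sigma).mean()
    k_yy = gaussian_kernel(T_c, T_c, sigma).mean()
    k_xy = gaussian_kernel(T, T_c, sigma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)
```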
$T$ is the dynamic information set of the original data of the input layer. $T^{c}$ is the dynamic information set of the feature data of the weight-offset layer. The key frames $\left\{I_{i}\right\}_{i=1}^{M}$ of the image set of size $M$ are set as the input of the model. The number of image blocks is expressed as Eq. (13).
$n$ is the size of the image. $k$ is the size of the convolution kernel. Each image is arranged as a column vector. $X_{i}$ is the feature vector of any image in the image set. The feature matrix is extracted and processed by dimensionality reduction to form a new low-dimensional feature library. The feature vector $F_{i}$ is extracted, and the similarity is calculated. Each row of the feature matrix is converted into a vector, and the similarity is judged by comparing the cosine angle between the vectors, which is expressed as formula (14).
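Formula (14) presumably takes the standard cosine-similarity form, $S=\frac{\sum_{i=1}^{n}x_{i}y_{i}}{\sqrt{\sum_{i=1}^{n}x_{i}^{2}}\sqrt{\sum_{i=1}^{n}y_{i}^{2}}}$, whose symbols are defined below.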
$x_{i}$ and $y_{i}$ are the elements of the two vectors. $n$ is the number of vector elements. The closer the calculation result is to 1, the closer the two vectors are. Finally, the value range of the calculation result is normalized, and the normalized interval is expressed as Eq. (15).
$S$ is sorted by size, and the search results are output. According to the similarity metric between the feature vector of the image to be retrieved and the feature library, the feature index is queried, the corresponding images are found, and the top-ranked images are displayed in decreasing order of similarity.
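A minimal sketch of this retrieval step is given below, assuming cosine similarity against a row-wise feature library, a simple $(S+1)/2$ normalization to $[0,1]$, and a hypothetical top_k parameter; none of these details are taken from the paper.

```python
# Illustrative retrieval sketch: cosine similarity against a feature library,
# normalized to [0, 1] and ranked in decreasing order. The (s + 1) / 2
# normalization and the top_k parameter are assumptions made here.
import numpy as np

def retrieve(query: np.ndarray, library: np.ndarray, top_k: int = 5) -> list:
    """query: feature vector of shape (d,); library: feature matrix of shape (N, d).

    Returns the indices of the top_k most similar library entries,
    most similar first.
    """
    # Cosine similarity between the query and every row of the library.
    sims = library @ query / (
        np.linalg.norm(library, axis=1) * np.linalg.norm(query) + 1e-12
    )
    # Map the cosine values from [-1, 1] to [0, 1] before ranking.
    sims = (sims + 1.0) / 2.0
    # Sort in decreasing order of similarity and keep the top_k indices.
    return np.argsort(-sims)[:top_k].tolist()
```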
Fig. 1. Weight offset module framework.
Fig. 2. Effect of the weight offset layer on data dynamic information.
3.2 An Online-offline Blended Teaching Model Integrating Deep Learning for Chinese
Painting Education
Blended teaching refers to integrating multiple teaching modes and teaching methods,
which are usually assisted by digital information technology to improve teachers’
teaching effects and students’ learning efficiency [17]. To a certain extent, Chinese painting education is more about cultivating students’
emotional identification with China’s traditional culture than about painting skills
and aesthetic cognition [18]. Fig. 3 shows the online-offline blended teaching mode of Chinese painting in colleges and
universities.
The blended teaching model is divided into three parts (Fig. 3). The first part analyzes teaching needs, the second part is online learning, and the third part is the offline classroom. According to the Chinese painting learning environment and resources in colleges and universities, students’ psychological characteristics, learning needs, and learning content must be compared and analyzed. Students’ emotional needs in learning Chinese painting are the key element. Through online learning methods, including micro-video learning and collaborative online discussion, students can express and accumulate emotions. Teachers can leave questions and role-playing activities after online courses so that students can summarize and reflect on their learning and maintain emotional continuity. Finally, the exchange of results, debate, and discussion are conducted in offline classes. The primary purpose is to facilitate the generation and expression of new emotions and to evaluate the learning content; this evaluation runs through the whole learning process instead of being placed only in the third part. Emotion-oriented Chinese painting courses require students to master the basic knowledge of Chinese painting, cultivate their understanding of the traditional culture contained in Chinese painting, and enhance their emotional identification with Chinese traditional culture [19]. This model mainly uses qualitative and quantitative research methods to conduct a fine-grained analysis of students’ reflective experiences by evaluating their after-class reflections. Fig. 4 shows the flow of the blended learning mode integrating deep learning.
The teaching process is not a one-way process but a cyclic one, as shown in Fig. 4. Through reflection and summary, teachers can use the CNN model to analyze the emotions reflected in students’ movements and expressions during class and adjust teaching methods and content in time. In this process, the weight-offset module in the CNN works with the pre-set teaching objectives: the CNN focuses on monitoring the actions and expressions related to these objectives and assigns higher weights to the related images and videos, which makes the analysis more accurate. Finally, whether students meet the established learning goals in the classroom is determined.
Fig. 3. The mixed learning mode of Chinese painting in colleges and universities.
Fig. 4. Blended teaching model integrating deep learning.