3.1 Overall Design of Student Motion System and Data Preprocessing Optimization Algorithm
Association rule mining is an essential part of data mining; an association rule
expresses a relationship between one or more groups of items in a database. PSO
is a widely used optimization algorithm inspired by the foraging behavior of bird
flocks. Combining the two, the convergence ability of GSA-PSO was used to optimize
the Apriori algorithm, forming a PSO-based association rule mining algorithm from
which the optimal association rules were obtained [16,17]. The main purpose of the aerobics exercise data analysis system is to achieve rapid
convergence and data mining analysis of student exercise data. Fig. 1 shows the overall architecture.
The system in Fig. 1 comprises four parts: data collection, data conversion, hybrid gravity particle swarm
algorithm operation module, and the visual interface of the university teaching system,
which together constitute a sports database system. In the overall process, GSA is
used to optimize PSO, and the two are combined into a hybrid GSA-PSO algorithm for
rule extraction. First, the algorithm assumes that there are a total of $N$ particles
in a gravitational system, and the position of particle $i$ is expressed as formula (1).
where ${x}_{i}^{n}$ represents the position of particle $i$ in the $n$-th dimension
of the search region. Owing to the influence of gravitation, the force exerted on
particle $i$ by particle $j$ in dimension $d$ at time $t$ can be expressed as formula (2).
Fig. 1. Overall architecture of the system.
where ${F}_{ij}^{d}\left(t\right)$ indicates the force between particles $i$ and $j$;
$M_{i},M_{j}$ represent the gravitational masses of the two particles; $\varepsilon $
is a small constant; $G\left(t\right)$ is the gravitational constant at time $t$, which
decreases as the age of the universe (the iteration count) increases; $R_{ij}\left(t\right)$
is the Euclidean distance between particles $i$ and $j$ at time $t$. The specific
expression of $G\left(t\right)$ is shown in formula (3).
where $G_{0}$ is the initial gravitational constant of the universe, which is generally
assigned a value of 1 or 100; $\alpha $ is a constant that is generally assigned a
value of 20 or 23; $T$ is the maximum number of iterations. In practice, to integrate
randomness into GSA, a random number $rand$ is generally introduced in each dimension
$d$. The acceleration of a particle at any time and the basic definition of the
gravitational mass can then be obtained from the law of motion, as given in formula (4).
where $a_{i}\left(t\right)$ indicates the acceleration; $F_{i}\left(t\right)$ is the
force acting on particle $i$; $m_{i}\left(t\right)$ represents the gravitational mass,
which is taken to be equal to the inertial mass; $f_{i}\left(t\right)$ is the fitness
function value of particle $i$ at time $t$; $f_{worst}\left(t\right)$ is the worst
fitness function value in the group at time $t$. The mass is expressed as formula (5).
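The quantities introduced so far can be illustrated with a minimal Python sketch. It assumes the commonly used GSA forms (an exponentially decaying $G(t)$, masses normalized against the best and worst fitness, and a randomly weighted sum of pairwise forces); the function names, the `minimize` flag, and the default values are illustrative assumptions rather than the exact formulas (3) to (5).

```python
import math
import random

def gravitational_constant(t, T, G0=100.0, alpha=20.0):
    """Assumed form G(t) = G0 * exp(-alpha * t / T); it shrinks as t grows."""
    return G0 * math.exp(-alpha * t / T)

def masses(fitness, minimize=True):
    """Normalized masses: the best particle is heaviest, the worst has mass 0."""
    best = min(fitness) if minimize else max(fitness)
    worst = max(fitness) if minimize else min(fitness)
    if best == worst:                      # all particles are equally fit
        return [1.0 / len(fitness)] * len(fitness)
    raw = [(f - worst) / (best - worst) for f in fitness]
    total = sum(raw)
    return [m / total for m in raw]

def accelerations(positions, mass, G, eps=1e-9, rng=random):
    """a_i^d = sum over j != i of rand * G * M_j * (x_j^d - x_i^d) / (R_ij + eps).

    Dividing the force on particle i by its own mass cancels M_i, so only
    the mass of the attracting particle j appears here.
    """
    n, dim = len(positions), len(positions[0])
    acc = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_ij = math.dist(positions[i], positions[j])   # Euclidean distance
            weight = rng.random() * G * mass[j] / (r_ij + eps)
            for d in range(dim):
                acc[i][d] += weight * (positions[j][d] - positions[i][d])
    return acc
```

With two particles, each is accelerated toward the other, and the particle with the worst fitness (mass 0) exerts no pull.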
The positions and velocities of the particles can then be calculated by combining
the above formulae, as expressed in formula (6).
where $v_{id}\left(t+1\right)$ indicates the velocity of the particle at the next
time step; $v_{id}\left(t\right)$ refers to the current velocity of the particle, which
is scaled by a random number $rand$; $a_{id}\left(t\right)$ denotes the velocity change,
or acceleration; $x_{id}\left(t+1\right)$ expresses the position at the next time step,
indexed in the same way as the velocity. GSA and PSO are combined to form a hybrid
GSA-PSO algorithm that has global search ability and particle information sharing
ability simultaneously. Formula (7) is the specific expression.
where $c_{1},c_{2}$ represent the learning factors; $r_{1},r_{2},r_{3}$ are random
numbers in the interval $\left[0,1\right]$; $a_{id}\left(t\right)$ denotes the acceleration
computed by GSA. By adjusting the values of the learning factors, the gravitational
force on the particle and the ability to exchange global information can be balanced
during optimization. From the above formula, the acceleration of the particles is
determined by the gravitational algorithm, and the update of the particle velocity is
related to the local optimal and global optimal positions. All particles within the
range attract one another and can perceive and approach the global optimal position
[18]. The GSA-PSO algorithm is used to optimize and solve the discrete data problem. Fig. 2 presents the method diagram.
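As an illustration of the hybrid update described above, the following Python sketch applies one step to a single particle. The update form $v' = r_{1}v + c_{1}r_{2}a + c_{2}r_{3}(gbest - x)$ is an assumption based on a commonly used way of combining GSA and PSO and may differ from formula (7) in detail; the function name `gsa_pso_step`, the default learning factors, and the optional `v_max` clamp are likewise hypothetical.

```python
import random

def gsa_pso_step(x, v, a, gbest, c1=0.5, c2=1.5, v_max=None, rng=random):
    """One assumed hybrid update for a single particle.

    x, v, a, gbest are lists of equal length (position, velocity,
    GSA acceleration, global best position).
    Assumed form: v' = r1*v + c1*r2*a + c2*r3*(gbest - x),  x' = x + v'.
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2, r3 = rng.random(), rng.random(), rng.random()
        vd = r1 * v[d] + c1 * r2 * a[d] + c2 * r3 * (gbest[d] - x[d])
        if v_max is not None:               # optional velocity clamp
            vd = max(-v_max, min(v_max, vd))
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v
```

Raising `c1` relative to `c2` gives more weight to the gravitational pull, while raising `c2` pulls particles more strongly toward the shared global best, which is the balance the learning factors are meant to control.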
Fig. 2. GSA-PSO algorithm optimization diagram for discrete data.
Fig. 2 shows four original data records: T1, T2, T3, and T4. These records are converted
to B1, B2, B3, and B4 under the optimization of the GSA-PSO algorithm. For example,
assuming that there are four products, transaction B1 indicates that only the first
and third products are purchased.
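One plausible reading of this conversion is a one-bit-per-product encoding, sketched below in Python. The product names and the contents of T1 to T4 are invented for illustration only, and `encode_transactions` is a hypothetical helper, not part of the described system.

```python
def encode_transactions(transactions, items):
    """Map each transaction (a set of item names) to a 0/1 list over `items`."""
    return [[1 if item in t else 0 for item in items] for t in transactions]

items = ["p1", "p2", "p3", "p4"]                 # four hypothetical products
raw = [{"p1", "p3"}, {"p2"}, {"p1", "p2", "p4"}, {"p3", "p4"}]   # T1..T4 (made up)
encoded = encode_transactions(raw, items)        # B1..B4
```

Under this assumption, B1 has bits set only in the first and third positions, matching the purchase of the first and third products.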
3.2 Optimization Method of Student Movement Data Mining based on the GSA-PSO Algorithm
Because the volume of data on college students’ aerobics exercises is large and the
Apriori algorithm is relatively inefficient for data mining in large-scale databases,
the calculation occupies a large amount of computer memory. The research therefore
combines the improved GSA-PSO algorithm with the Apriori algorithm, using the fast
convergence of the improved GSA-PSO to make up for the deficiency of the Apriori
algorithm and to extract the optimal association rules [19,20]. The new algorithm can be used to extract rules while reducing the computing time and
the generation of redundant rules. Fig. 3 shows the data mining model combined with the algorithm.
Fig. 3. Association rule data mining model under the GSA-PSO algorithm.
The fitness of each individual was evaluated using the PSO objective function as the
guidance function for the particle search in the region. In addition, the fitness
function served as the evaluation function for the quantitative evaluation of each
individual; it affects both the execution efficiency of the algorithm and the quality
of the data mining results. The Apriori algorithm was the first association rule mining
algorithm and remains the most classic one, although it has relatively high time and
space complexity. Its data mining process is divided into two stages: finding all
high-frequency itemsets in the data collection, and then deriving the association
rules from these high-frequency itemsets. Confidence and support are the two most
basic and important criteria when mining association rules [21]. The support degree is expressed as formula (8).
where $S\left(A\rightarrow B\right)$ indicates the support degree; $A,B$ are sets of
transaction items; $P\left(A\cup B\right)$ is the proportion of transactions in the
entire database that contain both $A$ and $B$. The degree of confidence is shown in
formula (9).
where $C\left(A\rightarrow B\right)$ represents the confidence, that is, the proportion
of the transactions containing $A$ that also contain $B$. The fitness function is
constructed using confidence and support: each is multiplied by its corresponding
impact factor, the results are summed, and the sum is used as the fitness value.
Assuming there is a particle $x$, the fitness function is expressed as formula (10).
where $C\left(x\right)$ stands for the degree of confidence; $S\left(x\right)$ is
the degree of support; $a,b$ are the weighting parameters of the support and confidence
in the fitness function, both of which lie in the interval $\left[0,1\right]$ and
sum to 1. When the support parameter is 0, the fitness depends only on confidence,
and the algorithm may become trapped in rules with high confidence but low support.
Similarly, when the confidence parameter is 0, the fitness depends only on support,
and the algorithm may miss rules with low support and high confidence, which often
have a higher value in practice. In the PSO algorithm, different particles have
different velocities, positions, and fitness values at different times [22,23].
Assuming there are $M$ particles, the speed and position of the particles are updated
as shown in formula (11).
where the velocity of a particle at a specific time $t$ is expressed as $v_{id}$;
the position is marked as $x_{id}$; the individual optimal position is labeled as $pbest_{id}$;
the group optimal position is denoted as $gbest_{id}$. The PSO operation is then performed,
as detailed in formula (12).
Building on formula (12), the binary particle swarm update is performed as shown in formula (13).
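Before turning to the binary update, it may help to see how the fitness that determines $pbest_{id}$ and $gbest_{id}$ could be computed for a candidate rule. The Python sketch below uses the standard definitions of support and confidence and a weighted sum whose two weights add up to 1; the function names and the single `w_support` parameter (with the confidence weight taken as `1 - w_support`) are assumptions made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(A and B together) / support(A); 0.0 if A never occurs."""
    s_a = support(transactions, antecedent)
    if s_a == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / s_a

def fitness(transactions, antecedent, consequent, w_support=0.5):
    """Weighted sum of support and confidence for the rule A -> B."""
    if not 0.0 <= w_support <= 1.0:
        raise ValueError("w_support must lie in [0, 1]")
    s = support(transactions, set(antecedent) | set(consequent))
    c = confidence(transactions, antecedent, consequent)
    return w_support * s + (1.0 - w_support) * c
```

Setting `w_support` to 0 or 1 reproduces the two degenerate cases discussed above, in which the search is guided by confidence alone or by support alone.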
where $i$ is the index of a particle in the population, with value range $\left[1,N\right]$;
$sig\left(x\right)$ is the sigmoid function, with value range $\left(0,1\right)$; $d$
expresses the particle space dimension $1,2,\cdots ,D$; $r$ is a random number in
$\left(0,1\right)$, chosen because the value range of the $sig\left(x\right)$ function
is $\left(0,1\right)$. The independent variable $x$, which is the particle speed, is
restricted to the range $\left[-10,10\right]$; that is, the maximum speed of the
particle never exceeds 10. The particle speed in iterative movement increases
as the solution approaches the local optimal solution. Fig. 4 shows the flow chart of association rule data mining under the GSA-PSO algorithm.
Fig. 4. Flow chart of the association rule data mining under the GSA-PSO algorithm.
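The binary update step in this flow can be sketched in Python as follows. The sketch assumes the usual binary PSO rule, in which each bit is set to 1 when a random number falls below the sigmoid of the clamped velocity; the inertia weight `w` and the default learning factors are hypothetical parameters, while the bound of 10 follows the velocity range stated in the text.

```python
import math
import random

V_MAX = 10.0   # velocity bound stated in the text: v in [-10, 10]

def sig(x):
    """Sigmoid: maps any real velocity into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binary_pso_step(x, v, pbest, gbest, w=1.0, c1=2.0, c2=2.0, rng=random):
    """One assumed binary PSO update for a single particle of 0/1 bits."""
    new_x, new_v = [], []
    for d in range(len(x)):
        vd = (w * v[d]
              + c1 * rng.random() * (pbest[d] - x[d])
              + c2 * rng.random() * (gbest[d] - x[d]))
        vd = max(-V_MAX, min(V_MAX, vd))     # keep the velocity in [-10, 10]
        r = rng.random()                     # random threshold for this bit
        new_v.append(vd)
        new_x.append(1 if r < sig(vd) else 0)
    return new_x, new_v
```

Clamping the velocity keeps the sigmoid away from saturation, so every bit retains a small probability of flipping even late in the search.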