2.1 Machine Learning in Educational Environments
EDM integrates machine learning, pedagogy, and psychology [9] and has a wide range of practical uses in educational environments. Machine
learning is mainly applied in the following areas.
(1) Student modeling [10]: it is used to represent students' status information. Student behavior, performance,
and other attributes are mined to understand individual student differences.
(2) Learning recommendations: learning resources, elective courses, etc. are recommended
to students according to their interests and characteristics [11].
(3) Analysis and visualization: visualization techniques are applied to educational
data to make the meaning embedded in the data easier to interpret [12].
(4) Achievement prediction: based on students' scores, behaviors, and other data,
students' test scores, course completion, dropout, graduation, and so on are predicted.
The prediction of students' exam results is useful for: (1) students: it can help
students understand the differences between themselves and students with better scores
so that they can improve themselves in time; (2) teachers: it can help teachers anticipate
whether students will pass exams and adjust their teaching plans accordingly;
(3) administrators: it can help school administrators identify, at an early stage,
students who are at risk of failing or of delayed graduation and take effective measures
to intervene, thus improving school management.
Score prediction is a crucial element in EDM, and many machine-learning methods have
been applied in this area. In order to further understand the efficacy of different
machine learning methods in score prediction, this article took the prediction of
English exam scores as an example, analyzed the behavioral characteristics of students,
and used different machine learning methods to make predictions.
2.2 Student Behavior Characteristics
English learning is an essential element of university study regardless of major,
so this paper focuses on predicting students' English exam scores. The study subjects
were 721 students who entered the School of Foreign Language and Literature of Xi'an
Fanyi University in 2021. The final exam scores that these students achieved in the
course "College English" in the first semester of the 2021 academic year were studied.
The data for the study were obtained from the following systems:
(1) teaching management system, which contains students' basic information, course
selection, class scheduling, exam results, etc.;
(2) campus card data management system, which contains data on daily consumption,
library access control, and book borrowing by students using campus cards.
Table 1 displays the student behavioral characteristics selected for experimental analysis.
Table 1. Characteristics of student behavior.
Feature | Description
Number of absences | The number of absences of students in the course "College English"
The average score on usual tests | The average score on the usual accompanying tests
Number of library entries | The number of times students swipe their cards to enter the library according to the campus card data
Average monthly book lending volume | Average monthly number of books checked out by students in the library based on the campus card data
Student monthly consumption amount | Average monthly student spending according to the campus card data
The prediction of the final "College English" exam score is considered a classification
problem. The exam scores were classified according to Table 2.
Table 2. Classification of final examination results.
Score | Category
80-100 | 0
60-80 | 1
0-60 | 2
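The mapping in Table 2 can be sketched as a small labeling function. This is an illustrative sketch, not the authors' code: the function name `score_to_category` is hypothetical, and because the ranges in Table 2 share the boundary values 60 and 80, the sketch assumes that a score of exactly 80 falls in category 0 and a score of exactly 60 falls in category 1.

```python
def score_to_category(score: float) -> int:
    """Map a final exam score (0-100) to the category label of Table 2."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    if score >= 80:  # assumption: exactly 80 belongs to the 80-100 band
        return 0
    if score >= 60:  # assumption: exactly 60 belongs to the 60-80 band
        return 1
    return 2


labels = [score_to_category(s) for s in (92, 80, 71.5, 60, 43)]
print(labels)
```

If the original study assigned the boundary scores differently, only the two comparison operators need to change.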
For prediction purposes, the data on students' behavioral characteristics were normalized.
After normalization, the data are mapped to $[0, 1]$.
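A mapping to $[0, 1]$ is commonly obtained with min-max scaling, $x' = (x - x_{\min})/(x_{\max} - x_{\min})$. The text does not state which normalization was used, so the sketch below assumes min-max scaling; the function name and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


absences = [0, 2, 5, 10]  # hypothetical absence counts
print(min_max_normalize(absences))
```

Each feature would be scaled separately, so that features with large raw ranges (such as monthly spending) do not dominate features with small ranges (such as absences).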
The characteristics in Table 1 were analyzed. First, the relationship between the number of student absences and
their English final exam scores is shown in Fig. 1.
Fig. 1. Relationship between the number of student absences and their English final
exam scores.
According to Fig. 1, among students with English final exam scores of 80-100, the share
with fewer than three absences was the highest, exceeding 80%, and no student had
more than seven absences. The share of students with more than seven absences was significantly
higher among students with scores of 0-60 than in the other two categories. In university
courses, students' attendance is linked to their regular scores, so the higher the
number of absences, the more likely a student is to have a lower final
exam score.
The relationship between the average score of usual tests and final scores is shown
in Fig. 2.
Fig. 2. Relationship between the average score of usual tests and the final English
exam score.
In Fig. 2, more than 80% of the students who scored 80-100 points on the English final exam
also had an average score on usual tests between 80 and 100 points, and none
had an average score on usual tests between 0 and 60 points;
the majority of students who scored 0-60 points on the final exam also averaged 0-60
points on usual tests. The usual tests reflect students' English learning
ability: the better their usual mastery, the better their performance in the final
exam.
The relationship of the number of library entries and the book lending volume with
final scores is shown in Figs. 3 and 4.
Fig. 3. Relationship between the number of library entries and English final exam
scores.
According to Fig. 3, students who scored above 60 points in the English final exam entered the library.
Among students who scored 80-100 points, the share who entered the library fewer than
three times was the lowest, and the share who entered the library more than 25 times
was the highest; by contrast, more than 70% of the students who scored below 60 points
entered the library fewer than three times in a semester, and none of them entered
the library more than 25 times.
Fig. 4. Relationship between average monthly book lending volume and English final
exam scores.
According to Fig. 4, among students who scored 80-100 points in the English final exam, those who
borrowed more than 30 books formed the largest group, accounting for more than
40%; among students who scored 60-80 points, the numbers of students at different
lending volumes did not vary much; among students who scored below 60 points,
more than 70% borrowed fewer than five books. Figs. 3 and 4 together suggest that the library is important for students to find materials and improve
their abilities; the more frequently they enter the library and the more they borrow,
the more likely they are to perform better in the final exam.
Finally, the relationship between students' monthly consumption amounts and their
final exam scores is shown in Fig. 5.
Fig. 5. Relationship between students' monthly consumption amounts and their English
final exam scores.
In Fig. 5, the monthly consumption amount on the campus card differed little across
the categories of students. The percentage of students with a monthly consumption
amount above 1000 yuan was relatively small, and the number of students with a monthly
consumption amount in the range of 500-1000 yuan was relatively large. In order to
further determine the relationship between different characteristics and final exam
scores, the correlation coefficient [13] between each characteristic and the final exam score was calculated.
The value of $r$ lies in $[-1, 1]$; the larger the absolute value, the stronger the
correlation. The calculated results are presented in Table 3.
Table 3. Feature correlation analysis.
Feature | Value of $r$
Number of absences | -0.345
The average score of usual tests | 0.372
Number of library entries | 0.287
Average monthly book lending volume | 0.291
Monthly consumption amount | 0.002
Table 3 shows that the average score of usual tests had the highest value of $r$,
0.372. The final exam score was negatively correlated with the number of absences,
with an $r$ value of -0.345. The $r$ values of the number of library entries and the
average monthly book lending volume were 0.287 and 0.291, respectively. The $r$
value of students' monthly consumption amount was the smallest, only 0.002, which
indicated that the correlation between this feature and students' English
final exam scores was negligible, so it was excluded.
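The coefficient $r$ reported in Table 3 is bounded by $[-1, 1]$, which is consistent with the Pearson correlation coefficient, $r=\frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2}\sqrt{\sum_i (y_i-\bar{y})^2}}$. The text does not name the coefficient, so the following is a minimal sketch under that assumption; the function name and the sample data are hypothetical and do not reproduce the values in Table 3.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical data: more absences go with lower scores, so r is negative.
absences = [0, 1, 3, 6, 9]
scores = [91, 85, 78, 64, 52]
print(round(pearson_r(absences, scores), 3))
```

A feature whose $r$ is close to zero, such as the monthly consumption amount in Table 3, contributes little linear information about the target and is a natural candidate for exclusion.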
2.3 Performance Prediction Method
Several different machine learning methods were analyzed for predicting students'
English final exam scores.
(1) Clustering algorithm
K-medoids clustering is an improvement of the K-means algorithm [14], with low time complexity. From the student behavior characteristics data, $k$ samples
are randomly chosen as the initial cluster centers. Then, the distance from each
sample to every center is computed, and each sample is assigned to the cluster of its
nearest center. Within each cluster, the sample with the smallest total absolute error,
i.e., the smallest sum of distances to the other samples in that cluster, is taken as
the new center. The centers are updated in this way until they no longer change, at
which point the algorithm ends and the clustering result is output.
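The procedure described above can be sketched in a few lines. This is an illustrative sketch rather than the implementation used in the study: the Manhattan distance is an assumption (the text only speaks of "distance" and "absolute error"), the `init` argument is a hypothetical convenience for reproducible runs, and the sample points are invented.

```python
import random


def manhattan(a, b):
    """Sum of absolute coordinate differences (assumed distance measure)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def k_medoids(points, k, init=None, seed=0, max_iter=100):
    """Return (medoid indices, cluster label per point)."""
    rng = random.Random(seed)
    medoids = list(init) if init is not None else rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        # Assignment step: attach every sample to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: manhattan(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the member with the smallest total distance becomes the medoid.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # can only happen with duplicate points
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(manhattan(points[c], points[j]) for j in members),
            ))
        if set(new_medoids) == set(medoids):
            break  # centers no longer change
        medoids = new_medoids
    labels = [min(range(k), key=lambda c: manhattan(p, points[medoids[c]]))
              for p in points]
    return medoids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = k_medoids(points, k=2, init=[0, 1])
print(medoids, labels)
```

Like K-means, this alternating scheme only finds a local optimum, so the result can depend on the initial centers; running it from several random initializations and keeping the best clustering is a common remedy.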
(2) Support vector regression
Support vector regression (SVR) is well suited to nonlinear, high-dimensional
data [15]. It maps the original problem to a high-dimensional space: $f(x)=w\cdot
\varphi (x)+b$, where $w$ is a weight vector, $b$ is a bias, and $\varphi
(x)$ is the mapping function induced by a kernel function. For data set $\{x_i, y_i\}$, the regression problem
can be described as
$\min \frac{1}{2}{\|w\|}^2+C\sum^N_{i=1}{({\zeta }_i+{\zeta }^*_i)}$, subject to $y_i-w\cdot \varphi (x_i)-b\le \varepsilon +{\zeta }_i$, $w\cdot \varphi (x_i)+b-y_i\le \varepsilon +{\zeta }^*_i$, and ${\zeta }_i,{\zeta }^*_i\ge 0$,
where ${\zeta }_i$ and ${\zeta }^*_i$ are the relaxation variables, $C$ is the penalty
factor, and $\varepsilon $ is the precision.
According to the Karush-Kuhn-Tucker (KKT) condition [16], the regression function is obtained: $f(x)=\sum^N_{i=1}{({\alpha }_i-{\alpha }^*_i)K(x_i,x)}+b$,
where ${\alpha }_i$ and ${\alpha }^*_i$ are Lagrange multipliers and $K(x_i,x)=\varphi (x_i)\cdot \varphi (x)$ is the kernel function.
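The roles of $\varepsilon$, $C$, and the relaxation variables can be made concrete with a small sketch. It evaluates the standard $\varepsilon$-insensitive SVR objective for a fixed linear model, assuming the identity mapping $\varphi(x)=x$; it does not solve the optimization problem. The function names and sample values are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Slack needed by one sample: zero inside the epsilon tube, linear outside."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_objective(w, b, xs, ys, C, epsilon):
    """0.5 * ||w||^2 + C * (total slack) for the linear model f(x) = w . x + b."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    total_slack = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
        total_slack += epsilon_insensitive_loss(y, prediction, epsilon)
    return margin_term + C * total_slack


xs = [[1.0], [2.0], [3.0]]
ys = [1.05, 2.5, 2.0]
# The residuals are 0.05, 0.5, and 1.0; only the last two exceed epsilon = 0.1.
print(svr_objective(w=[1.0], b=0.0, xs=xs, ys=ys, C=1.0, epsilon=0.1))
```

A larger $C$ penalizes samples outside the tube more heavily, while a larger $\varepsilon$ widens the tube and lets more samples incur no penalty at all.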
(3) Random forest
The RF algorithm is an enhanced version of the decision tree algorithm [17]. For training set $D=\{(x_i,y_i)\}$, the Bagging method is used to draw $n$
samples with replacement, and the classification and regression tree (CART) algorithm
[18] is used to generate a decision tree from $k$ randomly selected features. The
operation is repeated to obtain $m$ decision trees, which together form the RF. The
category of a sample is then determined by voting among the trees.
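The three ingredients named above (bootstrap sampling, random feature selection, and voting) can be illustrated with a deliberately small sketch. To stay short, each "tree" is a depth-1 stump chosen by Gini impurity rather than a full CART; that simplification, the function names, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Best single split (feature, threshold, left label, right label)."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority class
        return (0, float("inf"), majority(y), majority(y))
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(X, y, m_trees=25, k_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(m_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        feats = rng.sample(range(d), k_features)    # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_stump(s, row) for s in forest])


X = [[0, 0], [1, 1], [2, 2], [8, 8], [9, 9], [10, 10]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [10, 10]))
```

Because every tree sees a different bootstrap sample and feature subset, individual trees disagree near the class boundary, and the vote averages out their errors.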
(4) Gradient boosting decision tree
This method combines decision trees with boosting and achieves high accuracy in data
processing [19]. Its algorithm flow is as follows. The weak learner is initialized: $f_0(x)=\arg\min_{c}\sum^N_{i=1}{L(y_i,c)}$.
Then, the model is iterated $M$ times. In iteration $m$, the residual of each
sample $(x_i,y_i)$ is: $r_{mi}=-{\left[\frac{\partial L(y_i,f(x_i))}{\partial f(x_i)}\right]}_{f(x)=f_{m-1}(x)}$.
Then, a CART $T_m$ is fitted using $(x_i,r_{mi})$. For each leaf node $R_{mj}$, $j=1,\dots ,J$,
the best fit value is calculated:
$c_{mj}=\arg\min_{c}\sum_{x_i\in R_{mj}}{L(y_i,f_{m-1}(x_i)+c)}$.
The strong learner is updated: $f_m(x)=f_{m-1}(x)+\sum^J_{j=1}{c_{mj}I(x\in R_{mj})}$.
After $M$ iterations, $M$ CARTs are obtained, and the final strong learner is: $f(x)=f_0(x)+\sum^M_{m=1}{\sum^J_{j=1}{c_{mj}I(x\in R_{mj})}}$.
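The flow above becomes especially simple for the squared loss $L(y,f)=\frac{1}{2}(y-f)^2$: the initial learner is the mean of the targets, the negative gradient is the ordinary residual $y_i-f_{m-1}(x_i)$, and the best leaf value is the mean residual in that leaf. The sketch below assumes this loss, a single input feature, and two-leaf stumps in place of full CARTs; all names and data are hypothetical.

```python
def fit_stump(xs, targets):
    """Two-leaf regression tree on one feature: (threshold, left value, right value)."""
    thresholds = sorted(set(xs))[:-1]
    if not thresholds:
        raise ValueError("need at least two distinct x values")
    best = None
    for t in thresholds:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        c_left = sum(left) / len(left)      # best leaf value under squared loss
        c_right = sum(right) / len(right)
        sse = (sum((r - c_left) ** 2 for r in left)
               + sum((r - c_right) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, c_left, c_right)
    return best[1:]


def fit_gbdt(xs, ys, n_rounds=20):
    f0 = sum(ys) / len(ys)                  # the constant minimizing the squared loss
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient
        t, c_left, c_right = fit_stump(xs, residuals)
        stumps.append((t, c_left, c_right))
        preds = [p + (c_left if x <= t else c_right) for x, p in zip(xs, preds)]
    return f0, stumps


def predict_gbdt(model, x):
    f0, stumps = model
    return f0 + sum(c_left if x <= t else c_right for t, c_left, c_right in stumps)


def mse(model, xs, ys):
    return sum((predict_gbdt(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1, 2, 3, 4, 5, 6]          # hypothetical feature values
ys = [55, 60, 62, 78, 85, 90]    # hypothetical scores
model = fit_gbdt(xs, ys)
print(round(mse((model[0], []), xs, ys), 2), round(mse(model, xs, ys), 2))
```

The two printed numbers compare the error of the constant initial learner with the error after boosting; practical implementations usually also shrink each tree's contribution by a learning rate, which the description above does not include.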
(5) XGBoost
The XGBoost algorithm is an improved algorithm based on GBDT [20], which has low complexity and also prevents overfitting [21]. For a given dataset $D=\{(x_i,y_i)\}$, a set of $K$ classification trees is obtained
by training. For sample $x_i$, its prediction result is: ${\hat{y}}_i=\sum^K_{k=1}{f_k(x_i)}$,
$f_k\in F$, where $F$ is the set of classification trees. The objective function is
written as
$Obj=\sum^n_{i=1}{l({\hat{y}}_i,y_i)}+\sum^K_{k=1}{\left(\gamma T+\frac{1}{2}\lambda \sum^T_{j=1}{w^2_j}\right)}$,
where $\sum^n_{i=1}{l({\hat{y}}_i,y_i)}$ stands for the loss function of the model,
$\sum^K_{k=1}{\left(\gamma T+\frac{1}{2}\lambda \sum^T_{j=1}{w^2_j}\right)}$ stands for the regularization
term, $\gamma $ stands for the penalty term, $T$ stands for the count of leaf nodes of a tree,
$w_j$ stands for the weight of the $j$-th leaf node, and $\lambda $ stands for the smoothing coefficient.
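The two parts of this objective, a loss term plus a per-tree regularization term $\gamma T+\frac{1}{2}\lambda \sum_j w_j^2$, can be evaluated directly for a toy ensemble. The sketch below is not the XGBoost library: it assumes a squared-error loss for $l$, represents each tree only by its list of leaf weights, and uses hypothetical names and numbers.

```python
def tree_regularization(leaf_weights, gamma, lam):
    """gamma * T + 0.5 * lambda * sum of squared leaf weights, T = number of leaves."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_pred, trees, gamma, lam):
    """Loss term (squared error, an assumption) plus the regularization of every tree."""
    loss = sum((p - y) ** 2 for p, y in zip(y_pred, y_true))
    penalty = sum(tree_regularization(leaves, gamma, lam) for leaves in trees)
    return loss + penalty


y_true = [1.0, 0.0]
y_pred = [0.5, 0.5]
trees = [[0.5, -0.5], [0.2]]  # leaf weights of two hypothetical trees
# loss = 0.25 + 0.25; penalties = (2 + 0.5) and (1 + 0.04)
print(objective(y_true, y_pred, trees, gamma=1.0, lam=2.0))
```

Increasing $\gamma$ makes every additional leaf more expensive, and increasing $\lambda$ shrinks leaf weights toward zero; both discourage overly complex trees.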