IEIESPC(IEIE Transactions on Smart Processing and Computing)

IEIESPC Vol. 14, No. 03, p.331-338

ISSN (online): 2287-5255

Received: 6 March 2024; Revised: 25 April 2024; Accepted: 4 June 2024

DOI: https://doi.org/10.5573/IEIESPC.2025.14.3.331

Regular Paper

Machine Learning for Predicting Students’ English Test Scores in an Educational Setting

Shufang Yang¹^*

( Department of International Policing, Hubei University of Police, Wuhan, Hubei 430000, China ysffang@outlook.com)

^* Corresponding Author: Shufang Yang, ysffang@outlook.com

License: This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited (www.theieie.org).

Abstract

Machine learning methods have been increasingly used in educational settings. For the problem of predicting students' English test scores, first, the number of absences and the average score of usual tests were selected as students' behavioral features; then, the prediction effects of five machine learning approaches, including K-medoids, support vector regression (SVR), random forest (RF), gradient boosting decision tree (GBDT) and XGBoost, were compared after eliminating the features with weak correlation. The results showed that the XGBoost method best predicted students' English test scores, with an average accuracy, precision, and recall rate of 0.8619, 0.8322, and 0.8734, respectively, and an F1 value of 0.8523. The findings demonstrate the reliability of the XGBoost method in predicting students' English test scores. It can be extended and applied in practice. This article makes some contributions to the accurate prediction of students' grades and the intelligent management of educational information.

Keywords

Education, Machine learning, English test, Score prediction, XGBoost

1. Introduction

Advances in technology have significantly changed the educational environment, and a growing amount of information is stored electronically in educational databases ^[1] to support teaching and learning. The data accumulated in these databases often contain valuable information that can support teaching management. Educational data mining (EDM) has emerged to make efficient use of such data. EDM provides services to teachers, students, and school administrators that help improve the quality of education and teaching. In educational settings, machine learning can be applied to evaluate teaching quality, predict students' performance, and analyze students' course selections. Student performance prediction refers to building models from information about students and using machine learning and related techniques to predict how students will perform over time. Predicting test scores helps students identify and correct their problems ^[3] and helps teachers anticipate students' performance and adjust their course plans accordingly. An increasing number of techniques have been used to forecast students' examination results ^[4].

Poudyal et al. ^[5] forecasted students' academic achievements using a hybrid 2D convolutional neural network (CNN) and found experimentally that the hybrid model was more accurate than the conventional baseline model. Trindade et al. ^[6] used teacher behavior data to forecast student performance with a random forest (RF) classification algorithm and found through a case study that the algorithm predicted student academic performance effectively. Baruah et al. ^[7] developed a deep learning classifier for student performance prediction and reported a mean square error of only 0.3383. Ranjeeth et al. ^[8] combined a multilayer perceptron with a stochastic gradient descent classifier to forecast students' academic performance; the maximum accuracy of the method reached 79.3%. However, current research mostly studies a single machine learning method, and very few comparative analyses of multiple methods have been conducted in the same experimental environment. To better understand how different methods perform, this paper compared several machine learning approaches in forecasting students' English test scores in order to identify a more suitable method. This work offers theoretical support for the wider use of machine learning techniques in educational settings.

2. Methods for Predicting Students' English Test Scores

2.1 Machine Learning in Educational Environments

Machine learning, pedagogy, and psychology are integrated in EDM ^[9], which has a vast array of practical uses within educational environments. Machine learning is mainly applied in the following aspects.

(1) Student modeling ^[10]: students' status information is represented by mining their behavior, performance, and other attributes in order to understand individual differences.

(2) Learning recommendations: learning resources, elective courses, etc. are recommended to students based on their interests and characteristics ^[11].

(3) Analysis and visualization: visualization techniques are used to present educational data so that the meaning embedded in them is easier to interpret ^[12].

(4) Achievement prediction: students' test scores, course completion, dropout, graduation, and so on are predicted from their scores, behaviors, and other data. Predicting students' exam results is useful for (a) students, who can see how they differ from higher-scoring peers and improve in time; (b) teachers, who can anticipate students' ability to pass exams and adjust their teaching plans; and (c) administrators, who can identify early those students at risk of failing or delaying graduation and take effective measures to intervene, thus improving school management.

Score prediction is a crucial element in EDM, and many machine-learning methods have been applied in this area. In order to further understand the efficacy of different machine learning methods in score prediction, this article took the prediction of English exam scores as an example, analyzed the behavioral characteristics of students, and used different machine learning methods to make predictions.

2.2 Student Behavior Characteristics

English learning is an essential part of university study regardless of major, so this paper focuses on predicting students' English exam scores. The study subjects were 721 students who entered the School of Foreign Language and Literature of Xi'an Fanyi University in 2021. Their final exam scores in the course "College English" in the first semester of the 2021 academic year were studied. The data were obtained from the following systems:

(1) teaching management system, which contains students' basic information, course selection, class scheduling, exam results, etc.;

(2) campus card data management system, which contains data on daily consumption, library access control, and book borrowing by students using campus cards.

Table 1 displays the student behavioral characteristics selected for experimental analysis.

Table 1. Characteristics of student behavior.

Feature	Description
Number of absences	The number of absences of students in the course of "College English"
The average score on usual tests	The average score on the usual accompanying tests
Number of library entries	The number of times students swipe their cards to enter the library according to the campus card data
Average monthly book lending volume	Average monthly number of books checked out by students in the library based on the campus card data
Student monthly consumption amount	Average monthly student spending according to the campus card data

The prediction of the final "College English" exam score is treated as a classification problem. The exam scores were classified according to Table 2.

Table 2. Classification of final examination results.

Score	Category
80-100	0
60-80	1
0-60	2
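The mapping in Table 2 can be sketched as a small function. The ranges in the table share the boundary values 60 and 80, and the source does not say which category a boundary score belongs to; the sketch below assumes that boundary scores go to the higher band. The function name `score_to_category` is hypothetical.

```python
def score_to_category(score: float) -> int:
    """Map a final exam score (0-100) to the category labels of Table 2.

    Assumption (not stated in the source): boundary scores fall in the
    higher band, i.e. 80 -> category 0 and 60 -> category 1.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    if score >= 80:
        return 0
    if score >= 60:
        return 1
    return 2


if __name__ == "__main__":
    for s in (92, 80, 71.5, 60, 43):
        print(s, "->", score_to_category(s))
```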

For prediction purposes, the data on students' behavioral characteristics were min-max normalized:

(1)

$ x'=\frac{x-x_{\min}}{x_{\max}-x_{\min}}, $

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature. After normalization, the data are mapped into $[0, 1]$.
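As a minimal sketch of Eq. (1), the function below rescales one feature column. The sample values are illustrative only and are not taken from the study's data; returning zeros for a constant column is a convention chosen here, since Eq. (1) is undefined in that case.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] following Eq. (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: Eq. (1) would divide by zero, so return zeros
        # (a convention assumed for this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


absences = [0, 2, 5, 10]  # illustrative values, not from the study
print(min_max_normalize(absences))  # [0.0, 0.2, 0.5, 1.0]
```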

The characteristics in Table 1 were analyzed. First, the relationship between the number of student absences and their English final exam scores is shown in Fig. 1.

Fig. 1. Relationship between the number of student absences and their English final exam scores.

According to Fig. 1, among students with English final exam scores of 80-100, those with fewer than three absences made up the largest share, over 80%, and no student had more than seven absences. In contrast, the share of students with more than seven absences was markedly higher in the 0-60 group than in the other two categories. In university courses, attendance is linked to students' regular scores, so the more absences a student has, the more likely the student is to obtain a lower final exam score.

The relationship between the average score of usual tests and final scores is shown in Fig. 2.

Fig. 2. Relationship between the average score of usual tests and the final English exam score.

In Fig. 2, more than 80% of the students who scored 80-100 points on the English final exam also had an average score on usual tests between 80 and 100 points, and none had an average between 0 and 60 points; the majority of students who scored 0-60 points on the final exam also averaged 0-60 points on usual tests. The usual tests reflect students' English learning ability, and the better their day-to-day mastery, the better their performance in the final exam.

The relationship of the number of library entries and the book lending volume with final scores is shown in Figs. 3 and 4.

Fig. 3. Relationship between the number of library entries and English final exam scores.

According to Fig. 3, students who scored above 60 points in the English final exam entered the library. Among students who scored 80-100 points, the share who entered the library fewer than three times was the lowest, and the share who entered more than 25 times was the highest. In contrast, more than 70% of the students who scored below 60 points entered the library fewer than three times in a semester, and none of them entered more than 25 times.

Fig. 4. Relationship between average monthly book lending volume and English final exam scores.

According to Fig. 4, among students who scored 80-100 points in the English final exam, those who borrowed more than 30 books formed the largest group, accounting for more than 40%; among students who scored 60-80 points, the numbers of students at different lending volumes did not vary much; among students who scored less than 60 points, more than 70% borrowed fewer than five books. Figs. 3 and 4 suggest that the library is important for students to find materials and improve their abilities: the more frequently they enter the library and the more they borrow, the more likely they are to perform better in the final exam.

Finally, the relationship between students' monthly consumption amounts and their final exam scores is shown in Fig. 5.

Fig. 5. Relationship between students' monthly consumption amounts and their English final exam scores.

In Fig. 5, the monthly consumption amount on the campus card differed little across the categories of students. The percentage of students with a monthly consumption amount above 1000 yuan was relatively small, and the number of students with a monthly consumption amount in the range of 500-1000 yuan was relatively large. To further determine the relationship between the different characteristics and final exam scores, the Pearson correlation coefficient ^[13] between each characteristic and the final exam score was calculated:

(2)

$ r=\frac{\sum^{n}_{i=1}({x}_{i}- \overline{x})({y}_{i} - \overline{y})}{\sqrt{\sum^{n}_{i=1}({x}_{i} - \overline{x})^{2}}\sqrt{\sum^{n}_{i=1}({y}_{i}-\overline{y})^{2}}}. $

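where $x_i$ and $y_i$ are the feature value and the final exam score of the $i$-th student, $\overline{x}$ and $\overline{y}$ are the corresponding means, and $n$ is the number of students. The value of $r$ lies in $[-1, 1]$; the larger the absolute value, the stronger the correlation. The calculated results are presented in Table 3.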

Table 3. Feature correlation analysis.

Feature	Value of $\pmb{r}$
Number of absences	-0.345
The average score of usual tests	0.372
Number of library entries	0.287
Average monthly book lending volume	0.291
Monthly consumption amount	0.002

Table 3 shows that the average score of usual tests had the highest $r$ value, 0.372. The final exam score was negatively correlated with the number of absences, with an $r$ value of -0.345. The $r$ values of the number of library entries and the average monthly book lending volume were 0.287 and 0.291, respectively, and the $r$ value of students' monthly consumption amount was the smallest, only 0.002. This result indicated that monthly consumption amount had almost no correlation with students' English final exam scores, so this feature was excluded.
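The correlation-based screening can be sketched as follows. `pearson_r` implements Eq. (2) directly; the filtering step uses the $r$ values reported in Table 3. The cutoff `MIN_ABS_R` is a hypothetical value chosen for illustration, since the source does not state a numeric threshold and only says the weakest feature was dropped.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences, Eq. (2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (norm_x * norm_y)


# r values as reported in Table 3
table3_r = {
    "Number of absences": -0.345,
    "The average score of usual tests": 0.372,
    "Number of library entries": 0.287,
    "Average monthly book lending volume": 0.291,
    "Monthly consumption amount": 0.002,
}

MIN_ABS_R = 0.1  # hypothetical cutoff, not given in the source

selected = [name for name, r in table3_r.items() if abs(r) >= MIN_ABS_R]
print(selected)
```

With this cutoff, the four features other than monthly consumption amount are kept, which matches the selection described in the text.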

2.3 Performance Prediction Method

Several different machine learning methods were analyzed for predicting students' English final exam scores.

(1) Clustering algorithm

K-medoids clustering is a variant of the K-means algorithm ^[14] in which each cluster center is an actual sample (a medoid). In the student behavior characteristics data, $k$ samples are randomly chosen as the initial cluster centers. Then, the distance from each sample to every center point is computed, and each sample is assigned to the cluster whose center is nearest. Within each cluster, the sample with the smallest total absolute error, i.e., the smallest sum of distances to the other samples in the cluster, is taken as the new center point. The center points are updated repeatedly until they no longer change, at which point the algorithm ends and the result is output.
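The assign-and-update loop described above can be sketched in plain Python. This is an illustrative sketch, not the implementation used in the paper: it assumes Euclidean distance, distinct sample points, and a hypothetical `init` argument that lets the starting medoids be fixed for reproducibility.

```python
import math
import random


def k_medoids(points, k, init=None, max_iter=100, seed=0):
    """Cluster points (tuples of numbers) around k medoids.

    init: optional list of k starting medoid indices (hypothetical helper
    argument); chosen at random when None. Assumes distinct points.
    Returns (medoid_points, labels).
    """
    if init is not None:
        medoids = list(init)
    else:
        medoids = random.Random(seed).sample(range(len(points)), k)

    for _ in range(max_iter):
        # Assignment step: attach every sample to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
            clusters[nearest].append(i)

        # Update step: in each cluster, pick the member with the smallest
        # total distance to the other members as the new medoid.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)
                continue
            best = min(
                members,
                key=lambda c: sum(math.dist(points[c], points[j]) for j in members),
            )
            new_medoids.append(best)

        if set(new_medoids) == set(medoids):
            break  # medoids no longer change
        medoids = new_medoids

    labels = [
        min(range(k), key=lambda c: math.dist(p, points[medoids[c]]))
        for p in points
    ]
    return [points[m] for m in medoids], labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_medoids(pts, k=2, init=[0, 1])
print(centers, labels)
```

Like K-means, this alternating scheme can stop at a local optimum that depends on the starting medoids, so several restarts are commonly used in practice.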

(2) Support vector regression

Support vector regression (SVR) is well suited to nonlinear, high-dimensional data ^[15]. It maps the original problem into a high-dimensional space, where the regression function is $f(x)=w\cdot \varphi (x)+b$; here $w$ is a weight vector, $b$ is a bias, and $\varphi (x)$ is the nonlinear mapping associated with the kernel function. For data set $\{x_i, y_i\}$, the regression problem can be described as

(3)

$ \min\left\{\frac{1}{2}{\left\|w\right\|}^2+C\cdot \sum^n_{i=1}{\left({\zeta }_i+{\zeta }^*_i\right)}\right\}, $

(4)

$ \text{s.t.} \quad f\left(x_i\right)-y_i\le \varepsilon +{\zeta }^*_i, $

(5)

$ \phantom{\text{s.t.}} \quad y_i-f\left(x_i\right)\le \varepsilon +{\zeta }_i, \quad {\zeta }_i,{\zeta }^*_i\ge 0, $

where ${\zeta }_i$ and ${\zeta }^*_i$ are the slack variables, $C$ is the penalty factor, and $\varepsilon $ is the precision, i.e., the deviation tolerated without penalty.

According to the Karush-Kuhn-Tucker (KKT) condition ^[16], the regression function is obtained: $f(x)=\sum^n_{i=1}{({\alpha }_i-{\alpha }^*_i)K(x_i,x)}+b$, where ${\alpha }_i$ and ${\alpha }^*_i$ are Lagrange multipliers and $K(x_i,x)=\varphi (x_i)\cdot \varphi (x)$ is the kernel function.
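To make the role of the slack variables concrete, the sketch below evaluates the objective of Eq. (3) for given predictions. It assumes the standard assignment in which $\zeta_i$ measures under-prediction beyond $\varepsilon$ and $\zeta^*_i$ measures over-prediction beyond $\varepsilon$; the function names and sample numbers are hypothetical, and no optimization is performed.

```python
def svr_slacks(y_true, y_pred, eps):
    """Smallest slack values that satisfy the epsilon-tube constraints.

    zeta[i]      = amount by which y_i - f(x_i) exceeds eps (under-prediction)
    zeta_star[i] = amount by which f(x_i) - y_i exceeds eps (over-prediction)
    """
    zeta = [max(0.0, y - f - eps) for y, f in zip(y_true, y_pred)]
    zeta_star = [max(0.0, f - y - eps) for y, f in zip(y_true, y_pred)]
    return zeta, zeta_star


def svr_objective(w, y_true, y_pred, eps, C):
    """Value of (1/2)||w||^2 + C * sum(zeta_i + zeta_i^*)."""
    zeta, zeta_star = svr_slacks(y_true, y_pred, eps)
    regularizer = 0.5 * sum(wi * wi for wi in w)
    return regularizer + C * sum(z + zs for z, zs in zip(zeta, zeta_star))


# Illustrative numbers only
y_true = [1.0, 2.0, 3.0]
y_pred = [1.05, 2.5, 2.0]
print(svr_slacks(y_true, y_pred, eps=0.1))
print(svr_objective([1.0, 2.0], y_true, y_pred, eps=0.1, C=10.0))
```

Predictions that fall inside the $\varepsilon$-tube (the first sample here) contribute nothing to the penalty, which is what distinguishes this loss from ordinary squared error.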

(3) Random forest

The RF algorithm is an ensemble extension of the decision tree algorithm ^[17]. For training set $D=\{(x_i,y_i)\}$, the Bagging method draws $n$ samples with replacement, and the classification and regression tree (CART) algorithm ^[18] generates a decision tree from them using $k$ randomly selected features. This operation is repeated to obtain $m$ decision trees, which together form the RF. The category of a sample is then determined by majority voting among the trees.
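The three ingredients named above (bootstrap sampling, random feature selection, and voting) can be sketched separately. Tree construction itself is omitted; the helper names and sample rows are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) samples from rows with replacement (Bagging)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(feature_names, k, rng):
    """Pick k distinct features at random for one tree."""
    return rng.sample(feature_names, k)


def majority_vote(tree_predictions):
    """Return the category predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)
features = ["absences", "usual_test_avg", "library_entries", "book_lending"]
rows = [(0.1, 0.9, 0.7, 0.6), (0.8, 0.3, 0.1, 0.0), (0.4, 0.6, 0.5, 0.4)]

print(bootstrap_sample(rows, rng))
print(random_feature_subset(features, 2, rng))
print(majority_vote([1, 0, 1, 2, 1]))  # 1
```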

(4) Gradient boosting decision tree

This method combines decision trees with boosting and achieves high accuracy in data processing ^[19]. Its algorithm flow is as follows. The weak learner is initialized: $f_0(x)=\arg\min_{c} \sum^N_{i=1}{L(y_i,c)}$. Then, the model is iterated for $m=1,2,\dots ,M$. In iteration $m$, the residual (negative gradient) of each sample $(x_i,y_i)$ is $r_{mi}=-{\left[\frac{\partial L(y_i,f(x_i))}{\partial f(x_i)}\right]}_{f(x)=f_{m-1}(x)}$. A CART $T_m$ with leaf nodes $R_{mj}$, $j=1,2,\dots ,J$, is fitted to $(x_i,r_{mi})$. For each leaf node $R_{mj}$, the best fit value is calculated: $c_{mj}=\arg\min_{c} \sum_{x_i\in R_{mj}}{L(y_i,f_{m-1}(x_i)+c)}$. The strong learner is updated: $f_m(x)=f_{m-1}(x)+\sum^J_{j=1}{c_{mj}I(x\in R_{mj})}$. After $M$ iterations, $M$ CARTs are obtained, and the final strong learner is $f(x)=f_M(x)=f_0(x)+\sum^M_{m=1}{\sum^J_{j=1}{c_{mj}I(x\in R_{mj})}}$.
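The boosting loop can be illustrated with a deliberately small sketch. It assumes a squared-error loss $L=\frac{1}{2}(y-f)^2$, for which the negative gradient is the ordinary residual $y-f$, the initial learner is the mean of $y$, and the best leaf value is the mean residual in that leaf. It also assumes a single numeric feature and depth-1 trees (stumps) in place of full CARTs. None of these choices are stated in the source, and all function names are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a depth-1 regression tree; returns (threshold, left_value, right_value)."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        c_left = sum(left) / len(left)
        c_right = sum(right) / len(right)
        sse = sum((r - c_left) ** 2 for r in left) + sum((r - c_right) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, c_left, c_right)
    if best is None:
        # All x values identical: a single leaf holding the mean residual.
        c = sum(residuals) / len(residuals)
        return (float("inf"), c, c)
    return best[1:]


def predict_stump(stump, xi):
    threshold, c_left, c_right = stump
    return c_left if xi <= threshold else c_right


def fit_gbdt(x, y, n_rounds):
    """Return (f0, stumps) after n_rounds boosting iterations."""
    f0 = sum(y) / len(y)  # minimizer of the squared-error loss over constants
    preds = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]  # negative gradient
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + predict_stump(stump, xi) for pi, xi in zip(preds, x)]
    return f0, stumps


def predict_gbdt(model, xi):
    f0, stumps = model
    return f0 + sum(predict_stump(s, xi) for s in stumps)


model = fit_gbdt([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0], n_rounds=1)
print(predict_gbdt(model, 1), predict_gbdt(model, 4))  # 1.0 3.0
```

For a classification task such as the one in this paper, a classification loss would replace the squared error, but the structure of the loop stays the same.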

(5) XGBoost

The XGBoost algorithm is an improved algorithm based on GBDT ^[20], which has low complexity and also helps prevent overfitting ^[21]. For a given dataset $D=\{(x_i,y_i)\}$, a set of $K$ classification trees is obtained by training. For sample $x_i$, the prediction result is ${\hat{y}}_i=\sum^K_{k=1}{f_k(x_i)}$, $f_k\in F$, where $F$ is the set of classification trees. The objective function is written as

(6)

$ {Obj}^{(t)}=\sum^n_{i=1}{l({\hat{y}}_i,y_i)}+\sum^K_{k=1}{\left(\gamma T+\frac{1}{2}\lambda \sum^T_{j=1}{w^2_j}\right)}, $

where $\sum^n_{i=1}{l({\hat{y}}_i,y_i)}$ stands for the loss function of the model, $\sum^K_{k=1}{\left(\gamma T+\frac{1}{2}\lambda \sum^T_{j=1}{w^2_j}\right)}$ stands for the regularization term, $\gamma $ stands for the penalty coefficient on the number of leaf nodes, $T$ stands for the count of leaf nodes of a tree, $w_j$ stands for the weight of the $j$-th leaf node, and $\lambda $ stands for the smoothing coefficient.
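The regularization term can be made concrete with a few lines of arithmetic. The sketch reads the term as applying $\gamma T+\frac{1}{2}\lambda \sum_j w_j^2$ to each of the $K$ trees separately, which is an interpretation of Eq. (6) rather than something the source spells out. It only evaluates the objective for given losses and leaf weights; it does not train anything, and the function names are hypothetical.

```python
def tree_penalty(leaf_weights, gamma, lam):
    """gamma * T + 0.5 * lam * sum(w_j^2) for one tree with T leaves."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def regularized_objective(sample_losses, trees, gamma, lam):
    """Total loss over samples plus the penalty summed over all trees.

    trees: list of leaf-weight lists, one list per tree.
    """
    return sum(sample_losses) + sum(tree_penalty(t, gamma, lam) for t in trees)


# Illustrative numbers only
print(tree_penalty([0.5, -0.5], gamma=1.0, lam=2.0))  # 2.5
print(regularized_objective([0.2, 0.1], [[0.5, -0.5], [1.0]], gamma=1.0, lam=2.0))
```

A tree with more leaves or larger leaf weights pays a larger penalty, which is how the term discourages overly complex trees.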

3. Results and Analysis

The four features retained in Section 2.2 (the number of absences, the average score of usual tests, the number of library entries, and the average monthly book lending volume) were used as the input of the machine learning algorithms, and the outputs were the three categories defined in Table 2. Model hyperparameters were selected through grid search ^[22]. The experiments used five-fold cross-validation, and the final result was the average over the five folds. The algorithms were assessed based on a confusion matrix (Table 4).

Table 4. Confusion matrix.

		Actual category
		P	N
Prediction category	P	TP	FP
Prediction category	N	FN	TN

(1) Accuracy: the ratio of samples correctly forecasted to the overall sample count, $Accuracy=\frac{TP+TN}{TP+FN+FP+TN}$.

(2) Precision: the ratio of positive samples among the samples that are forecasted as positive, $Precision=\frac{TP}{TP+FP}$.

(3) Recall rate: the proportion of samples that are forecasted as positive among actually positive samples, $Recall=\frac{TP}{TP+FN}$.

(4) F1 value: a combined consideration of precision and recall rate, $F1=\frac{2\times Precision\times Recall}{Precision+Recall}$.
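The four indicators can be computed directly from the confusion-matrix counts. Because the task has three categories, the sketch assumes a one-vs-rest reading in which each category in turn is treated as the positive class; the source reports per-category values but does not state how the counts were formed. Function names and labels are illustrative.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN) treating `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 as defined in items (1)-(4)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Illustrative labels only
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
counts = one_vs_rest_counts(y_true, y_pred, positive=1)
print(counts)                   # (2, 1, 0, 3)
print(binary_metrics(*counts))
```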

Table 5 shows the accuracy of the different machine learning methods in predicting students' English final exam scores.

Table 5. Accuracy comparison among various methods.

	K-medoids	SVR	RF	GBDT	XGBoost
Category 0	0.7123	0.7545	0.7821	0.8234	0.8569
Category 1	0.7564	0.7768	0.8011	0.8455	0.8852
Category 2	0.7036	0.7462	0.7807	0.8196	0.8437
Average value	0.7241	0.7592	0.7880	0.8295	0.8619

Across categories, all methods were more accurate for category 1 than for categories 0 and 2, possibly because the larger number of students in category 1 allowed the algorithms to be trained more adequately. The K-medoids method had the lowest average accuracy, 0.7241; the SVR and RF methods were both below 0.8; the GBDT method reached 0.8295; and the XGBoost method reached 0.8619, which was 0.0324 higher than the GBDT method. These findings show that, among these machine learning methods, the XGBoost method was the most accurate in predicting students' English final exam scores.

Table 6 compares the precision and recall rate between different machine learning approaches.

Table 6. Comparison of the precision and recall rate between different methods.

Precision	K-medoids	SVR	RF	GBDT	XGBoost
Category 0	0.6984	0.7321	0.7525	0.7984	0.8236
Category 1	0.7125	0.7456	0.7765	0.8122	0.8452
Category 2	0.6832	0.7268	0.7607	0.7856	0.8277
Average value	0.6980	0.7348	0.7632	0.7987	0.8322
Recall rate	K-medoids	SVR	RF	GBDT	XGBoost
Category 0	0.7564	0.7612	0.7864	0.8236	0.8677
Category 1	0.7736	0.7745	0.7932	0.8564	0.8894
Category 2	0.7629	0.7654	0.7797	0.8232	0.8631
Average value	0.7643	0.7670	0.7864	0.8344	0.8734

According to Table 6, category 1 had slightly higher precision and recall rate than categories 0 and 2 for every method. In the comparison of different methods, the XGBoost method had the highest average precision, 0.8322, which was 0.1342, 0.0974, 0.0690, and 0.0335 higher than the K-medoids, SVR, RF, and GBDT methods, respectively. This result indicated that the XGBoost method also performed best in precision. The average recall rate of the XGBoost method reached 0.8734, also the highest among these methods.

A comparison of the F1 value between the different methods for predicting students' English final exam scores is displayed in Fig. 6.

Fig. 6. Comparison of the F1 value between different methods.

According to Fig. 6, the XGBoost method had the best performance in predicting students' final English exam results. The F1 values of the K-medoids, SVR, and RF methods were low, 0.7296, 0.7506, and 0.7746, respectively, while the F1 value of the GBDT method was above 0.8. The F1 value of the GBDT method for category 1 reached 0.8337, clearly higher than those of the three preceding methods. For the XGBoost method, a further improvement of the GBDT method, the highest and lowest F1 values reached 0.8667 and 0.8450, respectively, both of which were higher than the highest value of the GBDT method. The average F1 value of the XGBoost method was 0.8523, which was 0.0361 higher than that of the GBDT method. In conclusion, the XGBoost method had the highest accuracy and F1 value among these methods for predicting students' final exam scores.
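As a quick consistency check, the F1 formula can be applied to the average precision and recall values in Table 6. Whether the paper derived its F1 figures this way is an assumption, but the values computed below, rounded to four decimals, agree with the F1 values quoted in the text.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (average precision, average recall) from Table 6
table6_averages = {
    "K-medoids": (0.6980, 0.7643),
    "SVR": (0.7348, 0.7670),
    "RF": (0.7632, 0.7864),
    "GBDT": (0.7987, 0.8344),
    "XGBoost": (0.8322, 0.8734),
}

f1_by_method = {
    name: round(f1_score(p, r), 4) for name, (p, r) in table6_averages.items()
}

for name, value in f1_by_method.items():
    print(f"{name}: {value:.4f}")

gap = round(f1_by_method["XGBoost"] - f1_by_method["GBDT"], 4)
print(f"XGBoost - GBDT: {gap:.4f}")
```

The resulting gap between XGBoost and GBDT is 0.0361, the same difference stated above.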

Finally, based on the XGBoost method, the importance of the features related to students' English final exam scores was analyzed, and the outcomes are presented in Fig. 7.

Fig. 7. Feature importance analysis.

Fig. 7 shows that, among the four selected features, the average score of usual tests had the greatest influence on students' English final exam scores, accounting for 36%, followed by the number of absences, accounting for 32%; the monthly book lending volume had a greater influence than the number of library entries. These results suggest that, to obtain a better score on the English final exam, students should take the usual tests seriously, attend class regularly, and make fuller use of the library.

4. Conclusion

In the context of the current educational environment, this paper applied several machine learning methods to the prediction of students' English final exam scores. Indicators such as the number of absences and the average score of usual tests were selected as features, and the K-medoids, SVR, RF, GBDT, and XGBoost methods were compared on score prediction. The XGBoost method performed best, with an accuracy of 0.8619 and an F1 value of 0.8523. Among the selected features, the average score of usual tests had the greatest impact on students' final English scores, so students should take each test seriously during their English learning.

REFERENCES

T. R. Kumar, T. Vamsidhar, B. Harika, T. M. Kumar, and R. Nissy, ``Students performance prediction using data mining techniques,'' Proc. of 2019 International Conference on Intelligent Sustainable Systems (ICISS), pp. 407-411, 2019.

H. E. Abdelkader, A. G. Gad, A. A. Abohany, and S. E. Sorour, ``An efficient data mining technique for assessing satisfaction level with online learning for higher education students during the COVID-19,'' IEEE Access, vol. 10, pp. 6286-6303, 2022.

R. Katarya, J. Gaba, A. Garg, and V. Verma, ``A review on machine learning based student’s academic performance prediction systems,'' Proc. of 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), pp. 254-259, 2021.

S. Sood, and M. Saini, ``Hybridization of cluster-based LDA and ANN for student performance prediction and comments evaluation,'' Education and Information Technologies, vol. 26, no. 3, pp. 2863-2878, 2021.

S. Poudyal, M. J. Mohammadi-Aragh, and J. E. Ball, ``Prediction of student academic performance using a hybrid 2D CNN model,'' Electronics, vol. 11, no. 7, pp. 1-21, 2022.

F. R. Trindade and D. J. Ferreira, ``Student performance prediction based on a framework of teacher's features,'' International Journal for Innovation Education and Research, vol. 9, no. 2, pp. 178-196, 2021.

A. J. Baruah and S. Baruah, ``Data augmentation and deep neuro-fuzzy network for student performance prediction with MapReduce framework,'' International Journal of Automation and Computing, vol. 18, no. 6, pp. 981-992, 2021.

S. Ranjeeth, T. P. Latchoumi, and P. V. Paul, ``Optimal stochastic gradient descent with multilayer perceptron based student's academic performance prediction model,'' Recent Advances in Computer Science and Communications, vol. 12, no. 1, pp. 1-14, 2020.

L. Barik, A. A. Alrababah, and Y. Al-Otaibi, ``Enhancing educational data mining based ICT competency among e-learning tutors using statistical classifier,'' International Journal of Advanced Computer Science and Applications, vol. 11, no. 3, pp. 561-568, 2020.

N. Khodeir, ``Student modeling using educational data mining techniques,'' Proc. of 2019 6th International Conference on Advanced Control Circuits and Systems (ACCS) & 2019 5th International Conference on New Paradigms in Electronics & information Technology (PEIT), pp. 7-14, 2019.

S. Wan and Z. Niu, ``A hybrid e-learning recommendation approach based on learners' influence propagation,'' IEEE Transactions on Knowledge and Data Engineering, pp. 827-840, 2020.

C. Vieira, P. Parsons, and V. Byrd, ``Visual learning analytics of educational data: A systematic literature review and research agenda,'' Computers & Education, vol. 122, pp. 119-135, 2018.

D. Shah and T. Zaveri, ``Hyperspectral endmember extraction using Pearson's correlation coefficient,'' International Journal of Computational Science and Engineering, vol. 24, no. 1, pp. 1-9, 2021.

N. Gokilavani and B. Bharathi, ``An enhanced adaptive random sequence (EARS) based test case prioritization using K-mediods based fuzzy clustering,'' Proc. of 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI), pp. 567-572, 2020.

D. Rains, H. Lievens, G. J. M. D. Lannoy, M. F. Mccabe, R. A. M. de Jeu, and D. G. Miralles, ``Sentinel-1 backscatter assimilation using support vector regression or the water cloud model at European soil moisture sites,'' IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1-5, 2021.

T. V. Su and N. D. Hien, ``Strong Karush-Kuhn-Tucker optimality conditions for weak efficiency in constrained multiobjective programming problems in terms of mordukhovich subdifferentials,'' Optimization Letters, vol. 15, no. 4, pp. 1175-1194, 2021.

J. Liang, ``Problems and solutions of art professional service rural revitalization strategy based on random forest algorithm,'' Wireless Communications and Mobile Computing, vol. 2022, no. 1, pp. 1-11, 2022.

S. Dağıstanlı, S. Sönmez, M. Ünsel, E. Bozdağ, A. Kocataş, M. Boşat, E. Yurtseven, Z. Caliskan, and M. G. Gunver, ``A novel survival algorithm in COVID-19 intensive care patients: The classification and regression tree (CRT) method,'' African Health Sciences, vol. 21, no. 3, pp. 1083-1092, 2021.

S. B. Lee, Y. J. Kim, S. Hwang, H. Son, S. K. Lee, Y. I. Park, and Y. G. Kim, ``Predicting Parkinson's disease using gradient boosting decision tree models with electroencephalography signals,'' Parkinsonism & Related Disorders, vol. 95, pp. 77-85, 2022.

Y. Wu, Q. Zhang, Y. Hu, K. Sun-Woo, X. Zhang, H. Zhu, L. Jie, and S. Li, ``Novel binary logistic regression model based on feature transformation of XGBoost for type 2 diabetes mellitus prediction in healthcare systems,'' Future Generation Computer Systems, vol. 129, pp. 1-12, 2022.

F. A. Maruf, R. Pratama, and G. Song, ``DNN-Boost: Somatic mutation identification of tumor-only whole-exome sequencing data using deep neural network and XGBoost,'' Journal of Bioinformatics and Computational Biology, vol. 19, no. 6, pp. 1-16, 2021.

L. Yao, Z. Fang, Y. Xiao, J. Hou, and Z. Fu, ``An intelligent fault diagnosis method for lithium battery systems based on grid search support vector machine,'' Energy, vol. 214, 118866, 2021.

Author

Shufang Yang

Shufang Yang was born in September 1981 and received her master's degree of art from Wuhan University in June 2007. She is working at Hubei University of Police as an associate professor. She is interested in English teaching.