
  1. Department of College English, Zhejiang Yuexiu University, Shaoxing, 312000, China (ruixue.zhang@gmx.com)



Keywords: Decision tree, Resource recommendation model, ID3 algorithm, English reading

1. Introduction

English is the lingua franca of communication across disciplines, so English learning is becoming increasingly important in universities (Yi, 2020). Rising teaching requirements have not been matched by changes in teaching methods: most educators have optimized teaching arrangements inside and outside the classroom, but such improvements are limited (Duan, 2021). The development of the Internet has enabled learning resources to spread worldwide, and this vast pool of resources can provide concrete ideas for English teaching (Zhou, 2021). Among these ideas, the selection of reading resources has a clear impact on reading instruction: if the selected resources do not match the characteristics of the students, students' progress will be delayed. Hence, an interactive, adaptive mechanism for selecting reading resources can effectively improve English reading instruction (Ma, 2021). This study used ID3, an interactive classification algorithm, as the basis of the model and optimized its information gain formula to mitigate the problem of local optimality. The model before and after the improvement and a traditional recommendation method were simulated and compared to evaluate the performance and practicality of the model, using the change in students' reading scores and their feedback as indicators. The study also used the model to mine recommendation schemes for different types of students, providing ideas for improving English reading instruction.

2. Related Work

The ID3 algorithm is the most commonly used decision tree algorithm and serves widely as the basis for various complex systems because of its superior classification performance. Park et al. (2018) developed an ID3 adaptive path selection model using a fuzzy decision tree algorithm to overcome the sensitivity of decision trees in route selection. Simulation experiments on this model showed that the improvement could raise the prediction accuracy with good adaptability. An and Zhou (2022) examined the effect of the decision tree algorithm in rural energy construction and set thresholds in an ID3-based terrain-feature selection algorithm to promote attribute complementarity and filter irrelevant attributes. The spatial optimization problem of siting solar energy facilities in rural areas showed that the algorithm had promotion potential. Abbas and Farooq (2019) used ID3 to distinguish between skin and non-skin pixel types, improving the algorithm to raise skin detection accuracy and exclude the interference of skin color on the recognition results; they added three color space datasets to the algorithm, and the resulting system accuracy was above 99.50% on each index. Karthi et al. (2018) used data mining for accident prediction in the railroad sector. They used text mining techniques on data provided by users and the railroad sector, where the unstructured data provided by the railroad sector was analyzed using the ID3 algorithm to predict the causes of accidents. Pratama and Saragi (2018) attempted to classify the quality of cassava to ensure the quality of related processed products by examining various parameters of cassava, applying image processing to whiteness and speckle degree among the visual parameters, and then classifying the samples using ID3. Maingi et al. (2019) proposed an ID3-based decision tree for symptom burden classification against disease outbreaks. The algorithm sorts and classifies the information gained about disease burden to derive the required knowledge. The results showed that their proposed method could support the related field.

Reading comprehension is an important segment of English language learning and one item requiring improvement in college education. Educational researchers have been trying to improve the quality of reading comprehension in various ways. Zaiter (2020) suggested that reading is as indispensable as writing and, as an educator, believed that students should be able to find motivation for both, although reading and writing differ when it comes to academic writing. He analyzed the situation of English majors in the Arab world and proposed remedial measures, supported by extensive experimental data, to help prevent plagiarism. Wu (2021) argued that teaching English reading comprehension should also develop students' thinking, which is one of the core competencies of the subject. Chakraborty and Chowdhury (2021) reported that reading comprehension is one of the essential English skills at all levels of education and that its importance for obtaining a degree is becoming an issue. Academic reading is an early manifestation of this concern; their finding is based on a survey of students in government colleges in Bangladesh, who believed that teaching academic reading to undergraduate students strengthens their competitiveness. Chinese higher education policy emphasizes developing students' intercultural competence, but Yu and Maele (2018) suggested that this is not the case in practice. Hence, they conducted a curriculum study at a university college while building a Baker-based model of intercultural awareness to train participants. The results showed that reading courses can help Chinese students build intercultural awareness. Audina et al. (2020) reported that students who do not understand the reading content are prone to translate word by word rather than comprehending the text as a whole. In response to this problem, they investigated the teaching strategies of English teachers and their causes. They established the DRA strategy to guide students in understanding the text content, and the results proved that this attempt is meaningful.

The application and improvement of the ID3 algorithm by domestic and foreign scholars have proven effective. Data processing models built on ID3 are used widely in various fields, and their classification accuracy keeps improving. The algorithm can adaptively match the characteristics of users and data objects, which fits the problem that college English reading resources cannot yet be recommended adaptively to students. Most educators improve English education through texts and related issues inside and outside the classroom; few integrate intelligent technologies into English education. Thus, attempts at intelligent control have positive significance.

3. Construction of Reading Resource Recommendation Model based on ID3 Algorithm

3.1 Decision Tree Composition based on ID3 Algorithm

Data mining often requires a supervised learning algorithm to predict the attributes and categories of unknown data. Decision trees are representative of this class owing to their well-defined rule-generation mechanism, and ID3 is one of the most commonly used decision tree algorithms (Hong et al., 2018). The algorithm first calculates the information gain of each attribute, and the attribute with the highest gain is used as the basis for classifying the other information. This approach minimizes the amount of information required for classification and follows the principle of minimum randomness of division. The decision tree is constructed top-down by modular distinction of known attributes, starting with the root node, where the classification calculation is performed on the sample set and then used as the basis for successive divisions of the samples (Tulloch et al., 2018). This division is iterated until the construction is completed and the samples are classified. Non-categorical attributes become non-leaf nodes, and their attribute values are represented as branches. Each complete path from the root to a leaf node represents a complete classification rule, and the mapping of all rules builds an expression that serves as the resource recommendation expression.

The ID3 algorithm is simple and has a strong learning ability. Its classification speed is fast, making it suitable as the basis of an algorithm for large-volume data processing. Let the number of possible class labels of the sample set $X$ be $n$. The probability distribution is expressed as Eq. (1).

(1)
$ \left\{\begin{array}{l} P\left(X=x_{i}\right)=p_{i}\\ i=1,2,\ldots ,n \end{array}\right. $

At this point, the information entropy contained in $X$ is expressed as Eq. (2).

(2)
$ Entropy\left(X\right)=Entropy\left(P_{1},P_{2},\ldots ,P_{n}\right)=-\sum _{i=1}^{n}P_{i}\log P_{i} $

In Eq. (2), any term with $P_{i}=0$ is taken as $0\log 0=0$. The base of the logarithm is $2$ because the information is binary encoded. If two variables in the sample set $\left(X,Y\right)$ are to be considered, the joint probability distribution is expressed as Eq. (3).

(3)
$ \left\{\begin{array}{l} P\left(X=x_{i},Y=y_{j}\right)=p_{ij}\\ i=1,2,\ldots ,n\\ j=1,2,\ldots ,m \end{array}\right. $

The conditional entropy of $Y$ given $X$ is calculated using Eq. (4).

(4)
$ Entropy\left(Y\left| X\right.\right)=\sum _{i=1}^{n}P_{i}\,Entropy\left(Y\left| X\right.=x_{i}\right) $

$P_{i}$ denotes $P\left(X=x_{i}\right)$, so the conditional entropy is the mathematical expectation of the entropies under each condition of $X$. If the information entropy of a dataset $A$ is $Entropy\left(A\right)$ and the conditional entropy of $A$ given an attribute $B$ is $Entropy\left(A\left| B\right.\right)$, the information gain of attribute $B$ can be calculated using Eq. (5).

(5)
$ Gain\left(A,B\right)=Entropy\left(A\right)-Entropy\left(A\left| B\right.\right) $

The larger the result of Eq. (5), the greater the information gain and the higher the purity of the resulting sample subsets. The decision tree selects the attribute with the largest gain as the classification attribute and constructs a node; repeating this process in a cycle yields the complete decision tree, from which the recommendation rules are read off to recommend appropriate reading resources for college students.
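As a concrete illustration of Eqs. (2) and (5), the entropy and information gain can be computed in a few lines of Python; the toy attribute values and labels below are invented for illustration and are not the paper's data.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(X) = -sum p_i * log2(p_i), as in Eq. (2)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Gain(A, B) = Entropy(A) - Entropy(A | B), as in Eq. (5)."""
    total = len(labels)
    # Group the class labels by the value of the candidate attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy sample: (theme, difficulty) -> recommendation label
rows = [("social", "hard"), ("social", "easy"),
        ("natural", "hard"), ("natural", "easy")]
labels = ["YES", "YES", "NO", "YES"]
print(information_gain(rows, labels, 0))  # gain of the "theme" attribute
```

The attribute whose gain is largest would become the splitting node in the decision tree described above.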

The main advantage of the ID3 algorithm lies in its use of information entropy: the information gain criterion reduces sensitivity to abnormal training samples, its simple top-down search of the hypothesis space allows it to handle complex samples, and the tree structure lets the user visualize the classification rules and principles (Andrew et al., 2018). Nevertheless, the algorithm also has a drawback that motivates the optimization below: the attribute with the largest information gain is not necessarily the best splitting attribute, because an increase in the number of attribute values alone also inflates the gain value.

3.2 Optimization of ID3 Algorithm in the Resource Recommendation Model

The principle of the ID3 algorithm is to use the attribute with the greatest information gain as the splitting attribute. On the other hand, a multi-valued attribute also causes an increase in gain, so multi-value bias directly affects the classification accuracy of this algorithm (Li et al., 2018). Let $A$ be an attribute of the dataset $X$ with $n$ values; split one of its values into two equal parts to obtain a new attribute $A'=\left(A_{1},A_{2},\ldots ,A_{n+1}\right)$. The possibility of attribute-value bias in the algorithm is determined by comparing the gains before and after this transformation: $Gain\left(X,A\right)$ is the gain of attribute $A$, and $Gain\left(X,A'\right)$ is the gain of the new attribute $A'$. $Gain\left(X,A\right)$ is calculated using Eq. (6).

(6)
$ \begin{array}{l} Gain\left(X,A\right)=Entropy\left(X\right)-Entropy\left(X\left| A\right.\right)\\ =-\sum _{i=1}^{m}P\left(X_{i}\right)\log _{2}\left(P\left(X_{i}\right)\right)+\\ \sum _{j=1}^{n}P\left(A_{j}\right)\sum _{i=1}^{m}\left[P\left(X_{i}\left| A_{j}\right.\right)\log _{2}\left(P\left(X_{i}\left| A_{j}\right.\right)\right)\right] \end{array} $

$P\left(X_{i}\right)$ in Eq. (6) represents the probability of the class-$i$ label in the dataset, $P\left(A_{j}\right)$ is the proportion of samples with attribute value $A_{j}$, and $P\left(X_{i}\left| A_{j}\right.\right)$ is the probability of the class-$i$ label given that attribute $A$ takes the value $A_{j}$. Similarly, the gain of the new attribute, $Gain\left(X,A'\right)$, is calculated using Eq. (7).

(7)
$ \begin{array}{l} Gain\left(X,A'\right)=Entropy\left(X\right)-Entropy\left(X\left| A'\right.\right)\\ =-\sum _{i=1}^{m}P\left(X_{i}\right)\log _{2}\left(P\left(X_{i}\right)\right)\\ +\sum _{j=1}^{n+1}P\left(A'_{j}\right)\sum _{i=1}^{m}\left[P\left(X_{i}\left| A'_{j}\right.\right)\log _{2}\left(P\left(X_{i}\left| A'_{j}\right.\right)\right)\right] \end{array} $

In this case, the difference between the two gain values is calculated using Eq. (8).

(8)
$ \begin{array}{l} Gain\left(X\left| A\right.\right)-Gain\left(X\left| A'\right.\right)\\ =P\left(A_{n}\right)\sum _{i=1}^{m}P\left(X_{i}\left| A_{n}\right.\right)\log _{2}\left(P\left(X_{i}\left| A_{n}\right.\right)\right)\\ -P\left(A'_{n}\right)\sum _{i=1}^{m}P\left(X_{i}\left| A'_{n}\right.\right)\log _{2}\left(P\left(X_{i}\left| A'_{n}\right.\right)\right)\\ -P\left(A'_{n+1}\right)\sum _{i=1}^{m}P\left(X_{i}\left| A'_{n+1}\right.\right)\log _{2}\left(P\left(X_{i}\left| A'_{n+1}\right.\right)\right) \end{array} $

The shorthand $L=\frac{P\left(A'_{n}\right)}{P\left(A_{n}\right)}$, $x_{i}=P\left(X_{i}\left| A_{n}\right.\right)$, $p_{i}=P\left(X_{i}\left| A'_{n}\right.\right)$, and $o_{i}=P\left(X_{i}\left| A'_{n+1}\right.\right)$ is introduced to simplify the expression of the gain difference, giving Eq. (9).

(9)
$ \begin{array}{l} Gain\left(X\left| A\right.\right)-Gain\left(X\left| A'\right.\right)=P\left(A_{n}\right)\sum _{i=1}^{m}x_{i}\log x_{i}\\ -P\left(A'_{n}\right)\sum _{i=1}^{m}p_{i}\log p_{i}-P\left(A'_{n+1}\right)\sum _{i=1}^{m}o_{i}\log o_{i} \end{array} $

Dividing Eq. (9) by $P\left(A_{n}\right)$, and noting that $P\left(A'_{n}\right)+P\left(A'_{n+1}\right)=P\left(A_{n}\right)$ so that $\frac{P\left(A'_{n+1}\right)}{P\left(A_{n}\right)}=1-L$, yields Eq. (10).

(10)
$ \begin{array}{l} \frac{Gain\left(X\left| A\right.\right)-Gain\left(X\left| A'\right.\right)}{P\left(A_{n}\right)}\\ =\sum _{i=1}^{m}x_{i}\log x_{i}-L\sum _{i=1}^{m}p_{i}\log p_{i}-\left(1-L\right)\sum _{i=1}^{m}o_{i}\log o_{i} \end{array} $

Set $f\left(x\right)=x\log _{2}x$ and note that $x_{i}=Lp_{i}+\left(1-L\right)o_{i}$, since $P\left(A'_{n}\right)p_{i}+P\left(A'_{n+1}\right)o_{i}=P\left(A_{n}\right)x_{i}$; Eq. (11) is then obtained.

(11)
$ \left\{\begin{array}{l} f''\left(x\right)=\frac{1}{x\ln 2}\geq 0\\ f\left(x_{i}\right)=x_{i}\log x_{i}=f\left(Lp_{i}+\left(1-L\right)o_{i}\right) \end{array}\right. $

Because $f''\left(x\right)\geq 0$, $f\left(x\right)$ is convex, and the following relationship holds:

(12)
$ f\left(Lp_{i}+\left(1-L\right)o_{i}\right)\leq Lf\left(p_{i}\right)+\left(1-L\right)f\left(o_{i}\right) $

Eq. (13) can be obtained by processing each relationship.

(13)
$ \sum _{i=1}^{m}x_{i}\log x_{i}\leq L\sum _{i=1}^{m}p_{i}\log p_{i}+\left(1-L\right)\sum _{i=1}^{m}o_{i}\log o_{i} $

Substituting Eq. (13) into the gain-difference comparison gives $Gain\left(X\left| A\right.\right)\leq Gain\left(X\left| A'\right.\right)$. Because the attribute selection mechanism of ID3 is based on information gain, the larger gain of $A'$ shows that the algorithm has a multi-value bias. Suppose students need reading resources strong in a single attribute $A$, and the traditional ID3 is used to classify the candidate resources: resources with many attribute values will score as well as resources that are genuinely strong in attribute $A$, and the quality of the recommendation is reduced accordingly. This study addresses the problem by introducing the correlation coefficient between the attribute and the class variable, and the improved gain formula is given as Eq. (14).

(14)
$ \begin{array}{l} Gain'\left(X\left| A\right.\right)=\\ \frac{1}{n}\left[Entropy\left(X\right)-\left(1-\rho _{ay}\right)Entropy\left(X\left| A\right.\right)\right] \end{array} $

$\rho _{ay}$ in Eq. (14) represents the correlation coefficient between attribute $A$ and category $Y$. Introducing the correlation coefficient reduces the information gain of attributes that have many values but little relevance to the category, which optimizes the gain function and solves the multi-value bias problem. The formula must then be simplified so that the constructed decision tree operates concisely. Eq. (15) expresses the final information gain formula after the logarithmic operation is simplified.
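A minimal sketch of the improved gain of Eq. (14), assuming the attribute values and class labels are numerically encoded and taking the Pearson correlation coefficient for $\rho_{ay}$; the encoding, the helper names, and this choice of correlation measure are illustrative assumptions, not the paper's exact implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(X), as in Eq. (2)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def conditional_entropy(attr_values, labels):
    """Entropy(X | A): expected entropy over the attribute's value groups."""
    total = len(labels)
    groups = {}
    for v, y in zip(attr_values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / total * entropy(g) for g in groups.values())

def pearson(xs, ys):
    """Pearson correlation coefficient (assumes numeric encodings)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def improved_gain(attr_values, labels):
    """Gain'(X|A) = (1/n)[Entropy(X) - (1-|rho|) * Entropy(X|A)], Eq. (14)."""
    n = len(set(attr_values))          # number of attribute values
    rho = pearson(attr_values, labels)  # stands in for rho_ay
    return (entropy(labels)
            - (1 - abs(rho)) * conditional_entropy(attr_values, labels)) / n

# A perfectly class-correlated two-valued attribute: |rho| = 1 removes the
# conditional-entropy penalty, and the 1/n factor halves the raw gain.
print(improved_gain([1, 1, 2, 2], [1, 1, 0, 0]))  # 0.5
```

Dividing by the number of attribute values $n$ and weighting the conditional entropy by $1-\left|\rho\right|$ is what penalizes many-valued but weakly relevant attributes.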

(15)
$ \begin{array}{c} Gain'\left(X\left| A\right.\right)=\frac{1}{n\ln 2}\sum _{i=1}^{n}P'_{i}\left[\frac{1}{2}\left(P'_{i}-1\right)^{2}-\left(P'_{i}-1\right)\right]\\ -\frac{1}{n}\left(1-\left| \rho _{ay}\right| \right)\sum _{i=1}^{n}\frac{\left| B_{i}\right| }{\sum _{m=1}^{n}\left| B_{m}\right| }\\ \left(\frac{1}{\ln 2}\sum _{j=1}^{m}p''_{j}\left[\frac{1}{2}\left(p''_{j}-1\right)^{2}-\left(p''_{j}-1\right)\right]\right) \end{array} $

$B$ in Eq. (15) is one of the $n$ subsets into which the original dataset is divided, while the original dataset has $m$ classes, so each subset $B$ is in turn divided by the $m$ classes. Based on the above optimization, a decision tree $T$ is generated from an input dataset $X$ with a feature set $E$ and a gain threshold. If all individuals in the dataset belong to the same class, a leaf with that class label is generated. If the feature set $E$ is empty, the majority class is taken as the label. Otherwise, the improved information gain of each feature on the dataset $X$ is computed with the formulas above, and the feature with the maximum gain is taken as the split node. If the maximum gain is less than the threshold, the majority class in the dataset is again taken as the label. Otherwise, the dataset is partitioned by the values of the chosen feature, and the procedure is repeated on each partition with the remaining features until the decision tree is generated, as shown in Fig. 1 below.
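The generation steps above can be sketched as a recursive procedure. For brevity, this sketch uses the plain gain of Eq. (5) rather than the improved gain of Eq. (14); the threshold value and toy data are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain(rows, labels, f):
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[f], []).append(label)
    cond = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - cond

def build_tree(rows, labels, features, threshold=1e-3):
    if len(set(labels)) == 1:            # all samples share one class: leaf
        return labels[0]
    if not features:                     # feature set empty: majority label
        return Counter(labels).most_common(1)[0][0]
    gains = {f: gain(rows, labels, f) for f in features}
    best = max(gains, key=gains.get)     # split on the maximum-gain feature
    if gains[best] < threshold:          # gain below threshold: majority label
        return Counter(labels).most_common(1)[0][0]
    node, rest = {}, [f for f in features if f != best]
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node[(best, value)] = build_tree([rows[i] for i in idx],
                                         [labels[i] for i in idx],
                                         rest, threshold)
    return node

def classify(tree, row):
    while isinstance(tree, dict):
        for (f, v), sub in tree.items():
            if row[f] == v:
                tree = sub
                break
        else:
            return None                  # unseen attribute value
    return tree

# Toy data: (theme, difficulty) -> good/bad recommendation
rows = [("social", "hard"), ("social", "easy"),
        ("natural", "hard"), ("natural", "easy")]
labels = ["YES", "YES", "NO", "YES"]
tree = build_tree(rows, labels, [0, 1])
```

Swapping `gain` for the improved gain of Eq. (14) leaves the recursion unchanged; only the splitting criterion differs.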

Fig. 1. ID3 algorithm generation architecture diagram.

3.3 Adaptive ID3 for Reading Resource Recommendation Model Construction

The reading recommendation model needs to be a two-way interactive model that adapts to the situation of the learner receiving the recommendations; as that situation changes, the recommended content should be updated in due time. Therefore, the resource recommendation algorithm should understand the characteristics of the target learner, the characteristics of the reading resources, and the attributes that need to be classified. Data storage in the pre-processing stage uses two-dimensional arrays, and discrete data are also processed. The experimental subjects of the study were learners preparing for CET-4, and their situation was modeled to capture student styles from five aspects, based on established reading ability scales and the actual learning involved: reading ability, learning goals, learning efficiency, learning style, and cognitive style. When using the ID3 algorithm for student-style classification, feature selection is crucial for constructing the decision tree and for the final classification results. The style data are first pre-processed, which simplifies and standardizes students' learning situations so that student styles can be classified accurately. First, the study defines the learning style of each student, covering reading ability, learning objectives, learning efficiency, learning style, and cognitive style. Then, the information entropy, conditional entropy, and information gain are calculated for the different situations. Next, the maximum of all the feature information gains is found, and the corresponding feature serves as the root node of the ID3 decision tree. Branches are formed in the same way until all subsets contain data from the same category. The result is a decision tree that classifies students by their learning style characteristics.
By applying the ID3 algorithm, students can be classified based on their learning styles, giving a better understanding of each student's learning preferences and needs. This has guiding significance for educators because it can help them design and adjust teaching strategies to meet the needs of different types of students, improving educational effectiveness.

The reading ability (Ability, Ab) in the study was rated according to the Chinese English Reading Ability Scale, which has nine levels in ascending order of ability; the study uses the content specified in levels 4-7 (Ma, 2021). Students with different reading abilities select reading content to improve a particular ability: some students aim to increase their vocabulary, while others want to strengthen their sense of the language. The study therefore lets them select a reading goal (Objective, Ob) in the student assessment model.

Cognitive style (Cs) is an element that affects a student's learning ability and characteristics. Its advantage is that it makes the probability of a student's success visible, as an explicit indicator formed over time. From a reading comprehension perspective, the two most relevant cognitive styles are field-dependent and field-independent. Field-dependent students prefer texts with human subjects, and their thinking has a certain synthesizing ability; they like to study the text in detail when reading, but they cannot easily establish an independent reading space and are easily influenced by the outside world. Field-independent students are the opposite: they pay more attention to the actual content conveyed behind the text and prefer natural-science subjects. They build their own reading field when reading and have a certain resistance to interference. Cognitive style is thus an essential element in analyzing the situation of college students. The learning result (Lr) is assessed based on the students' self-assessments and test results.

The model involves three attributes of reading comprehension resources, which are the main content to be classified by ID3. Theme (Th) refers to the content source of a reading resource, divided into natural science and social science. The difficulty value (De) is the level assigned in the overall assessment of the resource. Category (Ca) is the question category based on the CET-4 test, including cloze-style completion reading that tests detail, sequential reading that tests logical order, and fine reading in the narrow sense. The final model is constructed according to the logical order of model construction, as shown in Fig. 2.

Fig. 2. Adaptive recommendation model.

The adaptive model in Fig. 2 has four indicators in the learner segment and three indicators in the reading resource model. In the actual process, the recommended reading resources should be changed adaptively by combining both situations, while the feedback from learners is the basis for real-time updates, and the resource recommendation model built on this basis can be used as one of the teaching tools to improve teaching quality.
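The two sides of the adaptive model can be represented as plain records; a minimal Python sketch with assumed field names and example values (all illustrative, following the four learner-side and three resource-side indicators described above):

```python
from dataclasses import dataclass

@dataclass
class Learner:
    """Learner-side indicators of the adaptive model (field names assumed)."""
    ability: str            # Ab: reading ability level, e.g. "R5"
    objective: str          # Ob: learning goal, e.g. "vocabulary"
    cognitive_style: str    # Cs: "dependent" or "independent"
    learning_result: float  # Lr: self-assessment / test score

@dataclass
class Resource:
    """Resource-side attributes classified by ID3 (field names assumed)."""
    theme: str       # Th: "natural" or "social"
    difficulty: str  # De: assessed difficulty level
    category: str    # Ca: question category, e.g. "cloze"

learner = Learner("R5", "vocabulary", "dependent", 80.0)
resource = Resource("social", "R5", "cloze")
```

Rows built from such records (learner indicators plus resource attributes, labeled by feedback) form the two-dimensional arrays the pre-processing stage stores.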

4. Results and Analysis

A specific CET-4 training course at a training institution was used to test the performance of the constructed model. The necessary information was collected to build the learner model. The ID3 algorithm was used to classify the reading resources in the resource library, and the learner model provided the attributes for adaptive recommendation. Seventy-five percent of the learner data was used as training data; the remaining data served as the reference for evaluating the results. The decision tree expresses the resource categories as good or bad recommendations, specifically YES and NO. The test set was fed to the ID3 algorithm, and the output results are shown in Fig. 3.
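The 75/25 split described above can be sketched as follows; the function name and the shuffling seed are illustrative.

```python
import random

def split_learners(learners, train_frac=0.75, seed=0):
    """Hold out the remaining 25% of learner records for evaluation."""
    shuffled = list(learners)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, test = split_learners(range(100))
print(len(train), len(test))  # 75 25
```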

Fig. 3. Read Resource Recommendation decision tree.

The decision tree built from the simulated data still uses the reading type, reading difficulty, learner's effect, and cognitive style to establish the nodes, similar to the decision tree built from the sample data, so the construction is valid. Fig. 4 shows the relationship between the accuracy of this simulation and the number of samples.

Fig. 4. Relationship between the number of learners and accuracy.

The accuracy rate in the test set was close to the reference value (> 80%). As the number of learners increased, the curve of the test data approached the curve of the reference set, suggesting that the accuracy rate increases with the number of learners and that the recommendation model with ID3 as its classification tool is effective. Although the accuracy obtained in the performance test did not reach 90%, increasing the learners' feature data can improve performance, suggesting that this error stems from the limited data collected in the performance test.

The model was applied to the daily teaching of an English tutorial institution in the 1$^{\mathrm{st}}$ quarter of 2020, and the change in students' reading performance was used to indicate the impact of the model on teaching. The improved ID3 model served as the experimental group, and the control groups used the standard ID3 model and a traditional English recommendation method. Reading chapters were recommended to the students from the same resource library to evaluate the advantages and disadvantages of the three methods.

The initial reading ability of the students in the three groups was similar, all around R5, and no students showed abnormal performance in class (Table 1). Some differences among the three groups were observed after the first training period. The most noticeable performance improvement was in the improved ID3 model group, which outperformed the standard ID3 model and the traditional method groups. According to the students' feedback, the goal achievement rate of the improved ID3 model group reached over 90%, indicating that the recommendation model is effective.

Table 1. Changes of students in each group before and after learning.

Recommended model       | Optimized ID3 | Standard ID3 | Traditional way
Number of students      | 177           | 172          | 169
Initial achievement     | 56.13±4.01    | 53.14±3.92   | 58.53±4.09
Average reading ability | R5+           | R5           | R5+
Performance improvement | 18.01±1.07    | 13.21±1.03   | 9.02±1.01
Target achievement rate | 91.22%        | 80.13%       | 69.21%

The accuracy of the model was assessed using four indicators, recall, accuracy, precision, and F-value, on the experimental groupings above. Random samples of the three groups were evaluated with the recommendations and the students' feedback as the criteria to obtain the four indicators. Fig. 5 presents the results.
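Under the YES/NO labeling above, the four indicators can be computed directly from the recommendations and the students' feedback; a minimal sketch (the function name and label encoding are assumptions):

```python
def prf(y_true, y_pred, positive="YES"):
    """Precision, recall, F-value, and accuracy for YES/NO labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_value = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_value, accuracy

# Feedback (ground truth) vs. the model's recommendations
print(prf(["YES", "YES", "NO", "NO"], ["YES", "NO", "NO", "YES"]))
```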

Fig. 5. Comparison of the prediction performance between two models.

The overall accuracy of the improved model reached more than 95% (Fig. 5), and each index was higher than that of the standard ID3 model in the same case. As the sampling proportion changed, the accuracy of the improved ID3 model did not change significantly, whereas the accuracy of the standard ID3 algorithm decreased as the sampling proportion decreased. Hence, the standard algorithm falls more easily into a local optimum as the sample size decreases. The improvement made by adding the correlation coefficient solves this problem: the accuracy does not change as the sample size changes.

During the use of the recommendation model, the ID3 improvement model group was given a reading test to monitor the change in the learners in real time. The learner profiles were first entered into the model to find students with each characteristic as a basis for finding typical learners. Table 2 lists the output of their characteristics.

Table 2. Table of typical students.

Feature dimension          | Student A  | Student B             | Student C
Reading ability            | R5         | R5-                   | R5
Cognitive style            | Dependence | Independence          | Dependence
Self-evaluation efficiency | 80         | 81                    | 70
Initial accuracy           | 87.2%      | 80.3%                 | 75.9%
Question type preference   | Cloze      | Reading Comprehension | Sort reading
Subject preference         | Social     | Natural               | Social

As shown in Table 2, student A was field-dependent, with a preference for completion-type reading and a sensitivity to humanities and social science texts. Student B was field-independent, with a preference for reading comprehension and natural science texts. Student C was field-dependent, with a preference for logical sequencing and humanities and social science texts. All three students had similar initial abilities, with minor differences in their self-assessed abilities and initial test scores. Adaptive recommendations were given to them using the improved model, and their accuracy rates were tallied, as shown in Fig. 6.

Fig. 6. Comparison of the prediction performance of the two models.

Among the three students, student A showed the greatest improvement, but his accuracy rate fluctuated the most (Fig. 6), indicating that the accuracy of field-dependent students is affected by the environment; nevertheless, the overall improvement was significant. Although the initial ability of student B was more ordinary, his accuracy rate also improved, with little overall fluctuation, suggesting that this student depends more on the difficulty of the reading material. His accuracy rate improved from 30% to 50%, indicating that the effect of the recommendation model was significant. The situation of student C was similar to that of B, and the improvement also proved the effectiveness of the model.

Although the model is effective, which recommended attributes have a significant impact on teaching and learning still needs to be explored. The study statistically analyzed each typical learner's data, rating each indicator from 1 to 4 in ascending order. Table 3 presents the specific results.

Table 3. Analysis of variance of the attribute and accuracy of the recommended resources.

Source          | Type III sum of squares | Degrees of freedom | Mean square | F      | Significance
Corrected model | 12.864a                 | 14                 | 0.910       | 4.047  | 0.000
Th              | 6.241                   | 3                  | 1.421       | 7.151  | 0.000
Ca              | 0.465                   | 1                  | 0.516       | 2.364  | 0.163
De              | 0.246                   | 1                  | 0.246       | 1.036  | 0.221
Th×Ca           | 1.145                   | 4                  | 0.531       | 2.468  | 0.042
Th×De           | 1.984                   | 1                  | 0.359       | 1.634  | 0.201
De×Ca           | 2.093                   | 1                  | 2.147       | 11.397 | 0.001
Total           | 2042.000                | 732                | /           | /      | /

The F-value of the model was 4.047, with a significance of 0.000 (Table 3), indicating that the analysis was meaningful. The theme (Th) and the interactions of theme × category (Th×Ca) and difficulty × category (De×Ca) significantly affected the students' reading accuracy, whereas the other factors had little effect.

The same correlation analysis was performed on the learner attributes in the output results, and the results were then clustered. An eigenvalue transformation of the clustering results yielded a radar plot of the recommendation preferences of the improved ID3 model for the various classes of students, as shown in Fig. 7.

Fig. 7. Recommended preference radar chart.

As Fig. 7 shows, the algorithm places the highest weight on difficulty in the resources recommended for category A students, indicating that these students have stronger reading ability and clear goals and motivation, so more demanding resources match their needs. Category B students prefer moderately difficult texts, and subject matter and question type also influence their accuracy rates. Category C students resemble category A in their higher difficulty requirement, while question type and topic have only an average effect on them.

This study compared the running time of the proposed ID3 algorithm and a traditional decision tree algorithm to verify the differences from previous methods, using data volumes from 100 to 500. Table 4 compares the time each algorithm took to construct decision trees. The proposed ID3 algorithm built decision trees more efficiently than the traditional algorithm, and the gap widened as the data volume increased. At a data volume of 100, the proposed algorithm ran in 24.64 ms, 16.57% faster than the traditional algorithm (29.54 ms); at 500, it ran in 118.08 ms, 19.64% faster than the traditional algorithm (146.94 ms). Whereas the traditional algorithm's performance degraded gradually with increasing data volume, the proposed ID3 algorithm maintained an efficient computation speed. It therefore has practical value, particularly when dealing with larger datasets.

Table 4. Comparison of the runtime between two algorithms for constructing decision trees.

Data volume   Traditional decision tree algorithm (ms)   ID3 algorithm (ms)
100           29.54                                      24.64
200           61.56                                      50.17
300           88.15                                      68.91
400           118.68                                     95.34
500           146.94                                     118.08
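The cost being measured above is dominated by ID3's entropy and information-gain computations at each candidate split. A minimal sketch of the standard ID3 gain (not the paper's improved, correlation-weighted formula, whose details are not reproduced here), with hypothetical toy data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction from splitting rows on one attribute (plain ID3 gain)."""
    total = entropy(labels)
    n = len(labels)
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[attr_index], []).append(label)
    return total - sum(len(b) / n * entropy(b) for b in branches.values())

# Toy data: attribute 0 (difficulty) perfectly predicts the label,
# attribute 1 (topic) is uninformative, so the gain on 0 should be higher.
rows = [("easy", "sport"), ("easy", "news"), ("hard", "sport"), ("hard", "news")]
labels = ["pass", "pass", "fail", "fail"]
print(information_gain(rows, labels, 0))  # 1.0
print(information_gain(rows, labels, 1))  # 0.0
```

ID3 evaluates this gain for every remaining attribute at every node, which is why construction time grows with the data volume as Table 4 shows.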

5. Conclusion

Reading is one of the elements that improve English learners' abilities, and this learning module, which integrates vocabulary and grammar, is also an important element in assessing students' English proficiency. Improving English reading proficiency should account for differences among students within the laws of education, and recommending learning content tailored to each student helps improve accuracy. The decision tree established in this study builds an adaptive recommendation model that adjusts the algorithm according to the students' characteristics. In addition, it optimizes the information gain formula by introducing correlation coefficients to prevent the model from falling into a local optimum. Tests of the algorithm before and after the improvement showed that the improved algorithm fit the reference values better, and the four performance indicators showed that its accuracy was above 95%. Students' average satisfaction with the recommendation model was 91.22%. Although the accuracy of the standard ID3 model did not reach 90%, the improvement path was still effective. Applying the algorithm to students at an institution showed that the learner classification module can divide students into three categories, and the adaptive recommendations for these categories revealed that different types of students have different resource requirements: students with strong learning ability need more challenging reading, while students with average or weaker ability need the recommendation model to focus on topics and question types.

REFERENCES

1 
Abbas A R and Farooq A O. (2019). 'Skin Detection using Improved ID3 Algorithm', Iraqi Journal of Science, Vol. 60, No. 2, pp. 402-410.
2 
An Y and Zhou H. (2022). 'Short term effect evaluation model of rural energy construction revitalization based on ID3 decision tree algorithm', Energy Reports, No. 8, pp. 1004-1012.
3 
Andrew, Russ, Gayle et al. (2018). 'Decision tree for pretreatments for winter maintenance', Transportation Research Record, Vol. 2055, No. 1, pp. 106-115.
4 
Audina Y, Zega N, Simarmata A et al. (2020). 'An analysis of teacher's strategies in teaching reading comprehension', Lectura Jurnal Pendidikan, Vol. 11, No. 1, pp. 94-105.
5 
Chakraborty S B and Chowdhury. (2021). 'Teaching academic reading in English to the undergraduate students at a government college of Bangladesh - Challenges and solutions', IOSR Journal of Research & Method in Education (IOSRJRME), Vol. 11, No. 2, pp. 49-63.
6 
Duan X. (2021). 'The application of activity-based method in English reading teaching in senior high school', Region - Educational Research and Reviews, Vol. 3, No. 2, pp. 60-64.
7 
Hong H, Liu J, Bui D et al. (2018). 'Landslide susceptibility mapping using J48 Decision Tree with AdaBoost, Bagging and Rotation Forest ensembles in the Guangchang area (China)', Catena, No. 163, pp. 399-413.
8 
Karthi M, Priscilla R and Benila E. (2018). 'The patrons for anticipating the veracity of rail mishaps using text mining and ID3 algorithm', International Journal of Pure and Applied Mathematics, Vol. 119, No. 15, pp. 1753-1759.
9 
Li S, Laima S and Li H. (2018). 'Data-driven modeling of vortex-induced vibration of a long-span suspension bridge using decision tree learning and support vector regression', Journal of Wind Engineering and Industrial Aerodynamics, No. 172, pp. 196-211.
10 
Ma Y. (2021). 'The application of schema theory in the teaching of English reading in senior high schools', Region - Educational Research and Reviews, Vol. 3, No. 3, pp. 17-20.
11 
Maingi N N, Lukandu I A and Mwau M. (2019). 'Inter-county comparative analysis of ID3 decision tree algorithms for disease symptom burden classification and diagnosis', International Journal of Science and Research (IJSR), Vol. 8, No. 5, pp. 83-89.
12 
Park K, Bell M G, Kaparias I and Belzner H. (2008). 'Soft discretization in a classification model for modeling adaptive route choice with a fuzzy ID3 algorithm', Transportation Research Record, Vol. 2076, No. 1, pp. 20-28.
13 
Pratama Y and Saragi H S. (2018). 'Cassava quality classification for tapioca flour ingredients by using ID3 algorithm', Indonesian Journal of Electrical Engineering and Computer Science, Vol. 9, No. 3, pp. 799-805.
14 
Tulloch A, Nancy A, Stephanie A G et al. (2018). 'A decision tree for assessing the risks and benefits of publishing biodiversity data', Nature Ecology & Evolution, Vol. 2, No. 8, pp. 1209-1217.
15 
Wu J. (2021). 'The research on the English reading teaching mode aiming at the improvement of thinking quality', Region - Educational Research and Reviews, Vol. 3, No. 2, pp. 40-43.
16 
Yi H. (2020). 'Teaching strategies of cultivating humanistic literacy in reading teaching', Education Study, Vol. 2, No. 3, pp. 174-183.
17 
Yu Q and Maele J V. (2018). 'Fostering intercultural awareness in a Chinese English reading class', Chinese Journal of Applied Linguistics, Vol. 41, No. 3, pp. 357-375.
18 
Zaiter W A. (2020). 'Reading and writing skills: The challenges of teaching at college level', Addaiyan Journal of Arts Humanities and Social Sciences, Vol. 1, No. 10, pp. 41-51.
19 
Zhou Q. (2021). 'The application of TBLT to English reading teaching in junior high school', Region - Educational Research and Reviews, Vol. 3, No. 2, pp. 52-55.

Author

Ruixue Zhang

Ruixue Zhang obtained her Master's Degree in English Language and Literature (2009) from Southwest University in China. She is currently a professor in the Department of College English, Zhejiang Yuexiu University, Shaoxing. She has published articles in more than 10 national and international journals and conference proceedings. Her areas of interest include English teaching and educational management.