Machine learning is the process of programming a computer to solve a given problem
using sample data or past experience. NNs and support vector machines (SVMs), two
effective methods in the field of machine learning, have been widely applied to
performance prediction research in recent years. Moreover, optimizing these methods
for specific applications has become a continually developing research topic.
3.1 An Interpretable Machine Learning Model Based on Improved NNs
Although NNs handle the sample information of BCS well, there is still room for
optimization. In practical applications, prediction accuracy varies with the network
structure, and over-fitting problems can even occur: unreasonable selection of the
initial state combined with excessive training causes the weights and thresholds to
fall into a local minimum [11]. Therefore, to optimize the selection of the initial
weights and thresholds (IWT) of NNs, this study proposes a genetic algorithm (GA)
based method for dividing the training set and testing set (TrS & TeS). A comprehensive
new method is then formed by integrating the IWT optimization of the network with the
TrS & TeS partition optimization designed in this research.
Building a NN prediction model involves several steps: selecting input and output
parameters, normalization, choosing the network structure, and training the network.
For the selection of input and output parameters, given the parameters and performance
of BCS, the operating conditions and environmental parameters of BCS can be ignored
when establishing the prediction model. The selected input parameters of the model
include speed, operating mode, conveying capacity, and conveying distance [12,13].
NNs are parallel processing systems, and normalizing the parameters improves their
interpretability. This study uses $[-1, 1]$ normalization to normalize the parameters,
expressed as Eq. (1). In Eq. (1), $x_{\max}$ and $x_{\min}$ are the maximum and minimum
values of the sample, and $x$ and $\bar{x}$ represent the vectors before and after
normalization.
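Eq. (1) is not reproduced in this excerpt; a minimal reconstruction, assuming the conventional $[-1, 1]$ min-max mapping implied by the symbol definitions above, is

$\bar{x} = \dfrac{2\left(x - x_{\min}\right)}{x_{\max} - x_{\min}} - 1$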
The selection of the NN structure directly affects its predictive performance. In the
study of BCS performance, influencing factors such as speed, operating mode, conveying
capacity, conveying distance, and ambient temperature vary between belt conveyors.
Therefore, before using a belt conveyor, the performance parameters of the equipment
must be understood in detail to ensure its normal operation and safe use.
The three-layer feedforward structure of NNs, comprising input, hidden, and output
layers, demonstrates good performance, so this structure is selected for the BCS
performance prediction study [14]. Based on the machine performance and emission data
that the belt transportation system equipment can collect, the input parameters of the
NN prediction model are determined to be speed, power, rail pressure, and fuel injection
timing; the output parameters to be predicted are fuel consumption rate, maximum burst
pressure, burst pressure angle, turbocharger speed, front vortex temperature, and smoke
level. Because the number of neurons in the hidden layer (HLN) is an essential
structural parameter affecting network performance, different numbers of neurons were
analyzed. The input layer has $m$ input nodes, and any input signal is denoted $x_{i}$.
The hidden layer has $l$ hidden neurons, and the output layer has $n$ output neurons;
$t$ denotes the input of any output layer neuron (OLN). The activation transfer function
is the key component of NNs and determines their utility, as shown in Eq. (2).
In Eq. (2), $f(t)$ is a threshold function that reflects the excitation or inhibition of neurons.
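Eq. (2) is not reproduced in this excerpt; a minimal reconstruction, assuming the logistic sigmoid commonly used as the threshold-type transfer function in three-layer feedforward networks, is

$f(t) = \dfrac{1}{1 + e^{-t}}$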
Network training adopts the error backpropagation algorithm. Given a set of samples,
the connection weight matrix (CWM) between the input and hidden layers is calculated,
as shown in Eq. (3).
The CWM between the hidden and output layers is calculated as shown in Eq. (4).
The thresholds $\theta ^{1} $ and $\theta ^{2} $ of the HLN and OLN are calculated as
shown in Eq. (5).
The output of the HLN is obtained from the forward propagation of the NN working
signals, as shown in Eq. (6).
In Eq. (6), $l$ represents the number of HLN. The expected error based on the OLN output
is obtained as shown in Eq. (7).
In Eq. (7), $\theta _{j}^{1} $ and $\theta _{k}^{2} $ represent the thresholds of the
corresponding HLN and OLN. Next, the error signal is backpropagated and the connection
weights are modified layer by layer. Finally, the partial derivative of the error with
respect to each connection weight is obtained, as shown in Eq. (8).
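Eqs. (3) to (8) are not reproduced in this excerpt; the following minimal NumPy sketch illustrates the forward pass and error-gradient computation for a three-layer network of this shape. All variable names (W1, W2, theta1, theta2, and so on) are illustrative assumptions, not the paper's notation.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Illustrative dimensions: m inputs, l hidden neurons, n outputs.
m, l, n = 4, 8, 6
rng = np.random.default_rng(0)
W1 = rng.uniform(-1, 1, (l, m))   # input-to-hidden CWM, cf. Eq. (3)
W2 = rng.uniform(-1, 1, (n, l))   # hidden-to-output CWM, cf. Eq. (4)
theta1 = np.zeros(l)              # HLN thresholds, cf. Eq. (5)
theta2 = np.zeros(n)              # OLN thresholds, cf. Eq. (5)

def forward(x):
    h = sigmoid(W1 @ x - theta1)  # hidden layer output, cf. Eq. (6)
    y = sigmoid(W2 @ h - theta2)  # network output
    return h, y

def backprop(x, target):
    # Error is 0.5 * sum((target - y)^2), cf. Eq. (7).
    h, y = forward(x)
    delta_out = (y - target) * y * (1 - y)        # output-layer error signal
    delta_hid = (W2.T @ delta_out) * h * (1 - h)  # error backpropagated to hidden layer
    grad_W2 = np.outer(delta_out, h)              # dE/dW2, cf. Eq. (8)
    grad_W1 = np.outer(delta_hid, x)              # dE/dW1, cf. Eq. (8)
    return grad_W1, grad_W2                       # threshold gradients are the -delta terms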
Overtraining can drive the network into a local minimum, resulting in overfitting and
reduced generalization ability. Fig. 1 shows the NN training process.
A "timely termination" strategy is adopted to optimize the process: the target error
is set to a small value to provide sufficient training space, and the error on the
test set is monitored continuously. When the network falls into a local minimum and
the test set error cannot escape it within a specified number of cycles, training is
terminated and the optimal network state found so far is output.
Fig. 1. The training process of neural networks.
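A minimal sketch of the "timely termination" rule described above, assuming a generic train_one_epoch/evaluate/get_state interface (all hypothetical names):

def train_with_timely_termination(net, train_set, test_set, max_epochs=1000, patience=50):
    # Stop when the test-set error has not improved for `patience` epochs.
    best_error = float("inf")
    best_state = net.get_state()          # hypothetical snapshot of weights/thresholds
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        net.train_one_epoch(train_set)    # hypothetical single training cycle
        error = net.evaluate(test_set)    # hypothetical test-set error
        if error < best_error:
            best_error, best_state = error, net.get_state()
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                         # error stuck in a local minimum: terminate
    net.set_state(best_state)             # output the optimal state of the network
    return net, best_error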
3.2 Comprehensive Improvement of NNs and SVM for Predicting High Performance BCS
Because the predictive performance of NNs changes with the network structure, it is
necessary to randomly define the IWT, divide the TrS & TeS in a fixed proportion, and
train the network repeatedly to analyze its performance [15]. The performance of
prediction models is commonly evaluated with two metrics: the mean square error (MSE),
given in Eq. (9), and the mean absolute percentage error (MAPE), given in Eq. (10).
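Eqs. (9) and (10) are not reproduced in this excerpt; assuming the conventional definitions consistent with the symbols below, they are

$MSE = \dfrac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}, \quad MAPE = \dfrac{100\%}{n}\sum_{i=1}^{n}\left|\dfrac{y_{i} - \hat{y}_{i}}{y_{i}}\right|$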
In Eqs. (9) and (10), $n$ represents the number of samples, and $\hat{y}_{i} $ and
$y_{i} $ represent the predicted and target values of the $i$-th sample. NNs provide
a direction for the performance prediction of BCS; however, they are influenced by
certain factors and exhibit a certain instability, so they must be optimized [16].
When the "timely termination" method is used to solve the overfitting problem of
neural networks, the training samples must be divided into a training set and a
validation set: the training set is used to train the network weights and thresholds,
while the validation set is used to detect changes in the training error and thereby
determine the optimal number of training cycles. The division of the training and
validation sets therefore also affects the predictive performance of the trained network.
Accordingly, this study proposes a GA-based optimization algorithm for dividing the
training and validation sets. This chapter compares the existing GA-based method for
optimizing the initial weights and thresholds of neural networks with the training and
validation set optimization algorithm proposed in this paper. By integrating the two,
a new comprehensive algorithm is proposed that optimizes both the initial weights and
thresholds of the network and the training/validation set partition. In a traditional
GA, a fitness function is constructed from the objective function of the problem to be
optimized, an initial population is created and genetically encoded, and the encoded
population is then evaluated, selected, and operated on over multiple iterations. After
these steps, the individual with the best fitness is the optimal solution of the
problem [17]. Fig. 2 shows the flowchart of the basic GA.
Fig. 2. Genetic algorithm flowchart.
First, an initial population is generated randomly and the fitness of each individual
is calculated; each individual's ability to adapt to the environment is then evaluated
through a defined fitness function. Individuals are selected according to fitness to
reproduce the next generation, followed by the genetic operations of crossover and
mutation; finally, the termination condition of the algorithm is tested. To optimize
the division of the TrS & TeS of the NN, binary encoding is used for the individuals.
On this basis, this study proposes a sample partitioning method based on chromosome
gene information. The population initialization, encoding, and genetic operation steps
are as follows. To initialize the population, the sorted training samples are marked
with index numbers and divided proportionally into a TrS & TeS according to a random
ordering, yielding the required number of individuals in the population. The index
numbers of each individual are then used to generate its chromosome: 1 represents a
training set sample, and 0 represents a test set sample. The new genetic operation
adjusts the mutation and crossover operators, as shown in Fig. 3.
Fig. 3. Genetic manipulation.
The crossover step first decodes the two parent chromosomes to obtain the set to which
each sample belongs, and then merges the test set samples of the two decoded individuals.
A fixed number of samples is selected at random from this merged pool as the new test
set samples, while the rest become training set samples. The offspring chromosomes are
then obtained by re-encoding the new TrS & TeS samples. The mutation operation randomly
selects an equal number of training and testing samples and exchanges them [18].
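A minimal sketch of the chromosome encoding and the partition-preserving crossover and mutation described above; the fixed test-set size n_test and the helper names are illustrative assumptions.

import random

def encode(n_samples, test_idx):
    # Chromosome: 1 = training set sample, 0 = test set sample.
    return [0 if i in test_idx else 1 for i in range(n_samples)]

def crossover(parent_a, parent_b, n_test):
    # Merge the two parents' test sets, then redraw a fixed-size test set.
    pool = {i for i, g in enumerate(parent_a) if g == 0}
    pool |= {i for i, g in enumerate(parent_b) if g == 0}
    new_test = set(random.sample(sorted(pool), n_test))
    return encode(len(parent_a), new_test)

def mutate(chrom, n_swap=1):
    # Exchange an equal number of training and test samples.
    train = [i for i, g in enumerate(chrom) if g == 1]
    test = [i for i, g in enumerate(chrom) if g == 0]
    for t, s in zip(random.sample(train, n_swap), random.sample(test, n_swap)):
        chrom[t], chrom[s] = 0, 1
    return chrom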
Building on the IWT optimization and the test set partition optimization described
above, this study proposes a comprehensive NN optimization method combining the two,
and establishes a high-performance prediction model for BCS. Comparison of the GA-based
IWT optimization with the GA-based training/validation set optimization shows that both
methods have a certain optimizing effect on network performance. A comprehensive NN
optimization algorithm is therefore proposed that combines the optimization objectives
of both: the training/validation set optimization, which has the more obvious effect,
forms the main line, and the optimization of each individual's network IWT is added to
it. This achieves comprehensive optimization of the NN and finds the best network state
for predicting the response of the belt transportation system. The overall process of
the algorithm is shown in Fig. 4.
Fig. 4. Comprehensive optimization algorithm flowchart.
First, an initial population with different combinations of training and validation
sets is generated. Then, before the training and validation sets are optimized, an
optimization of the initial network weights and thresholds is added for each individual.
After the IWT of each individual's network has been optimized, the optimal network
performance for each training/validation set individual is obtained, and genetic
evolution is performed on these individuals to generate a new population. Because
different partitions of the training and validation sets correspond to different
optimal IWT of the network, the IWT is re-optimized before the evolution of each new
generation. The termination condition remains the maximum number of generations: once
it is reached, the network with the best partition of training and validation sets is
output, together with its corresponding optimal IWT.
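A compact sketch of this nested scheme, assuming hypothetical helpers (random_partition, decode, ga_optimize_iwt as the inner GA over initial weights/thresholds, evaluate_network, evolve):

def comprehensive_optimize(samples, pop_size=20, generations=50, n_test=10):
    # Outer GA: each individual is one training/validation partition.
    population = [random_partition(len(samples), n_test) for _ in range(pop_size)]
    for _ in range(generations):
        scored = []
        for chrom in population:
            train, valid = decode(samples, chrom)            # hypothetical split helper
            # Inner GA: find the best IWT for this particular partition.
            iwt = ga_optimize_iwt(train, valid)              # hypothetical inner GA
            fitness = evaluate_network(iwt, train, valid)    # e.g. the Eq. (11) fitness
            scored.append((fitness, chrom, iwt))
        population = evolve(scored, n_test)                  # selection, crossover, mutation
    return max(scored)   # best partition and its corresponding optimal IWT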
Both optimizations in Fig. 4 are completed by GA, and the fitness function is shown in Eq. (11).
In Eq. (11), $\hat{y}_{i} $ is the estimated value of $y_{i} $, $\bar{y}$ is the average
value of $y$, $SSE$ is the sum of squared residuals, and $SST$ is the total sum of
squared deviations.
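Eq. (11) is not reproduced in this excerpt; given the symbols defined above, the conventional coefficient-of-determination form of the fitness would be

$F = 1 - \dfrac{SSE}{SST} = 1 - \dfrac{\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i} - \bar{y}\right)^{2}}$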
After this optimization, the system performance was predicted, and the SVM parameters
were selected in combination with the fish swarm algorithm (FSA). FSA is an intelligent
optimization algorithm based on fish school behavior, which optimizes gradually by
updating each fish's position in every iteration [19]. The current state of the $i$-th
fish is denoted $X_{i} $, with fitness $Y_{i} $. Each fish selects a state at random,
as shown in Eq. (12).
In Eq. (12), $Rand()$ is a random number between 0 and 1, $X_{i} $ represents the
current state of the $i$-th fish, $\left\| X_{v} -X_{i} \right\| $ represents the
distance between $X_{v} $ and $X_{i} $, and $X_{i\mid next} $ represents the next state
of the $i$-th fish. FSA mimics the schooling behavior of fish, allowing them to move
towards the center while limiting the degree of crowding to avoid overcrowding, as
shown in Eq. (13).
In Eq. (13), with a fish's own position as the center, the number of fish within its
perception range is $N_{j} $, forming the set $S_{j} $. If the set $S_{j} $ is not
empty, indicating that there are fish within the perception range of the $i$-th fish,
the center position $X_{center} $ of the set is calculated according to Eq. (14), along
with the fitness value $Y_{center} $ of the center position.
In tail-chasing behavior, each fish pursues the nearest fish with the best fitness [20].
With $X_{\min } $ as the center, the number of fish $N_{f} $ within the perception
range satisfies Eq. (15).
In Eq. (15), $X_{i} $ searches within its perception range for the fish $X_{\min } $
with the best fitness, whose fitness value is $Y_{\min } $. If $Y_{\min } >Y_{i} $,
foraging behavior is performed; otherwise, Eq. (15) is checked. If Eq. (15) indicates
that the location has good fitness and is not too crowded, Eq. (16) is executed to move
one step towards the fish $X_{\min } $ with the best fitness; otherwise, foraging
behavior is performed.
In Eq. (16), $\left\| X_{v} -X_{i} \right\| $ represents the distance between $X_{v} $
and $X_{i} $.
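Eqs. (12) to (16) are not reproduced in this excerpt. As one reference point, the standard artificial fish swarm position update for the random/foraging move (cf. Eq. (12)) is commonly written as

$X_{i\mid next} = X_{i} + Step \cdot Rand() \cdot \dfrac{X_{v} - X_{i}}{\left\| X_{v} - X_{i} \right\|}$

where $Step$ is the maximum step length; this matches the symbols defined in the text, but the exact form used in the paper is an assumption.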
Fig. 5 shows the FSA process. When SVM is used for BCS performance prediction, its
prediction performance varies with the kernel function parameters and the penalty
factor, and the optimal parameter distribution also differs across prediction targets
[21]. When FSA is used to optimize the SVM, the optimization is accurate, efficient,
and stable: regardless of the initial population distribution, the algorithm converges
quickly to the optimal region, the fish concentrate towards that region in each
iteration, and a more detailed search is then performed within it [22]. To prevent the
algorithm from falling into local optima, however, the crowding degree of the fish
school is restricted.
Fig. 5. Flowchart of fish school algorithm.
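A minimal sketch of FSA-style selection of the SVM penalty factor C and RBF kernel parameter gamma, using scikit-learn's SVR for illustration; the simplified foraging-only swarm below is an assumption, not the paper's full FSA.

import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

def fitness(params, X, y):
    # Negative cross-validated MSE; higher is better.
    C, gamma = np.exp(params)                      # search in log space
    model = SVR(C=C, gamma=gamma)
    return cross_val_score(model, X, y, cv=3,
                           scoring="neg_mean_squared_error").mean()

def fsa_svm_search(X, y, n_fish=10, iters=20, visual=1.0, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    fish = rng.uniform(-3, 3, (n_fish, 2))         # fish states: (log C, log gamma)
    scores = np.array([fitness(f, X, y) for f in fish])
    for _ in range(iters):
        for i in range(n_fish):
            # Foraging: probe a random state X_v within the visual range (cf. Eq. (12)).
            probe = fish[i] + visual * rng.uniform(-1, 1, 2)
            if fitness(probe, X, y) > scores[i]:
                direction = probe - fish[i]
                fish[i] += step * rng.random() * direction / np.linalg.norm(direction)
                scores[i] = fitness(fish[i], X, y)
    best = fish[np.argmax(scores)]
    return np.exp(best)                            # best (C, gamma)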