3.1 Target Alignment of Feature Points in Oracle Image
Target alignment is a technique for locating a target object [12]. A simple method is to take the mean of the coordinates of the local feature points as the center point of the target. However, many problems arise when actual oracle image features are processed in this way. Therefore, it is necessary to align the feature points of oracle images before classification in order to reduce the error of the subsequent oracle image classification.

If all feature points are averaged directly, the center point will be offset. At the same time, the local areas corresponding to these feature points overlap heavily, so there is little value in treating them separately. Therefore, we merged such feature points into a single feature point whose coordinates are the mean of their coordinates. The centers of six feature points were used to describe the center of the target object over the feature points of the whole oracle image; because the positions of the feature points can change greatly, the computed center point is strongly affected, and the following result is obtained:
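A coordinate-mean form consistent with the symbol definitions that follow is:

\[ \delta_{x} = \frac{1}{N}\sum_{i=1}^{N} W(i)_{x}, \qquad \delta_{y} = \frac{1}{N}\sum_{i=1}^{N} W(i)_{y} \]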
where $\delta_{x}$ and $\delta_{y}$ represent the coordinates of the center point of the oracle image feature target, $N$ represents the number of feature points present in the graph, and $W(i)_{x}$ and $W(i)_{y}$ represent the coordinates of those feature points.
In the alignment process, the key parts of the oracle image features are included, and the key features are expressed as follows:
$Q(i)_{x}$ and $Q(i)_{y}$ represent the image coordinates of the feature points extracted from the oracle bone text image, and the points that do not appear in the coordinates are removed:
Based on the elimination of non-key oracle image feature points, the target alignment
of oracle feature points is realized, and the alignment equation is:
In Eq. (15), $M_{0}$ represents the key threshold variable with feature points aligned, and $\beta$ represents the number of threshold feature points.
3.2 Design of Oracle Image Automatic Classifier based on Deep Learning
After the key points of an oracle image are aligned, we used a deep learning algorithm to design an automatic classifier for oracle images to realize the final classification.
In the field of deep learning, model migration refers to transferring a trained model to the target task. Because deep learning usually requires training a large model, a large amount of labeled data is needed to prevent overfitting [13]. However, the target task sometimes lacks sufficient labeled data while sharing characteristics with other tasks for which abundant data exist, so model migration can be carried out [14].
<New paragraph>For example, the source task is image recognition. Image recognition
has a large amount of labeled data, such as ImageNet data, which can be used to train
a model. Target tasks can be similar, such as specific target recognition or detection,
or target recognition in special scenes. Therefore, in the design of the automatic
classifier, we first learned the preprocessed oracle images using transfer learning.
The goal of transfer learning is to solve the classification problem when the training
data from the source domain is different from the data in the target domain.

Transfer learning methods fall mainly into two categories. The first improves classification in the target domain by learning to assign different weights to the source domain data. The second finds a common feature space, maps the source domain and target domain data into this space, learns from the mapped source-domain features, and is then tested on the mapped target-domain features to complete the classification of the target domain data. For the second kind of method, we mainly use a linear mapping or a kernel method to minimize the distributional difference between the source domain and target domain data, under the assumption that the target task has no labeled data. In this way, the model must incorporate information from the target task data in an unsupervised manner during training so that it can adapt to the target task.
The data in the oracle image features are set as the training data in the source domain, including training sample data $s_{i}$. We used $s_{ix}$ to represent oracle image sample $i$, where each sample is a $D$-dimensional vector. We used $s_{iy}$ to represent the label of the oracle image sample and $C_{s}$ to represent the number of classes. We set the target domain as:
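Assuming the target-domain samples are unlabeled $D$-dimensional vectors $t_{ix}$ (a notation introduced here for illustration), a plausible form of the elided definition is:

\[ X_{t} = \{ t_{1x}, t_{2x}, \ldots, t_{Nx} \} \]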
where $N$ represents the number of samples trained on the oracle images.
Then, given target domain data $X_{t}$ and source domain data $X_{i}$, their probability density distributions usually differ when they are drawn from different datasets. In order to reduce this difference, we want the probability distributions of the source domain features and the target domain features to be as close as possible in the migrated feature space. To achieve this goal, the MMD criterion is used to measure the difference between distributions. This criterion acts on the layer before the classification layer (that is, the final feature output of the network) and measures the difference between the output features of the source domain data and the target domain data after the feature transformation of the network. The equation of MMD is:
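A standard first-order form of the MMD criterion, consistent with the comparison of data centers discussed below, is (with $\phi(\cdot)$ denoting the feature transformation of the network and $t_{jx}$ the target-domain samples, notation assumed here):

\[ \mathrm{MMD}(X_{s}, X_{t}) = \left\| \frac{1}{N_{s}} \sum_{i=1}^{N_{s}} \phi(s_{ix}) - \frac{1}{N_{t}} \sum_{j=1}^{N_{t}} \phi(t_{jx}) \right\|^{2} \]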
where the squared norm is the square of the MMD, and the inner products can be obtained by expanding and simplifying the equation.
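As a minimal sketch, the first-order (identity feature map) version of this criterion reduces to the squared distance between the sample means. The function name and shapes below are illustrative, not the paper's implementation:

```python
import numpy as np

def linear_mmd2(source, target):
    """Squared MMD between two sample sets under the identity feature map:
    the squared distance between the sample means (first-order statistics).
    Rows are samples, columns are feature dimensions."""
    return float(np.sum((source.mean(axis=0) - target.mean(axis=0)) ** 2))

# Identical distributions give zero; a constant shift of 1 in every
# dimension of a 3-D feature space gives 3.
same = linear_mmd2(np.zeros((4, 3)), np.zeros((5, 3)))    # 0.0
shifted = linear_mmd2(np.zeros((4, 3)), np.ones((5, 3)))  # 3.0
```

In training, a term like this would be added to the classification loss so that the network's final-layer features for the two domains are pulled toward a common center.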
Since the data center (a first-order statistic) represents the average characteristics of the image and is one way to represent the data distribution, the MMD criterion is adopted so that the centers of the source domain data and the target domain data become closer after the network transformation. A convolutional neural network was used to design the automatic classifier for oracle images. A convolutional neural network is a hierarchical structure model. A standard convolutional neural network generally consists of a data input layer, convolution layers, pooling layers, fully connected layers, and an output layer. The input is the original data, such as an image or raw audio data. The data input layer preprocesses the original input data, for example by mean subtraction and normalization. Such preprocessing prevents large deviations in the data from harming the training effect.

The convolutional neural network extracts high-level semantic information from the original data input layer and abstracts it layer by layer (the feedforward operation). At the output layer, the network formalizes the training objective as an objective function. Through the back-propagation algorithm, the error between the predicted value and the true value is propagated back from the last layer step by step to update the parameters between layers, and the data are fed forward again after the parameters are updated. This process is repeated until the objective function converges, realizing the classification training of the network model.
If matrix x is used to represent the input image, the convolution kernel is matrix
w, and bias is b, then the calculation of a convolution surface can be expressed as:
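Under these definitions, a standard single-channel convolution surface is:

\[ y(i, j) = \sum_{u} \sum_{v} w(u, v)\, x(i+u, j+v) + b \]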
If the input is multiple channels (that is, if there are D images $\left(\mathrm{x}_{1},\mathrm{x}_{2},\ldots
\mathrm{x}_{\mathrm{D}}\right)$), the corresponding convolution kernel is $\left(\mathrm{w}_{1},\mathrm{w}_{2},\ldots
\mathrm{w}_{\mathrm{D}}\right)$, the bias is b, and the convolution surface is:
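A standard multi-channel form, summing the per-channel convolutions before adding the bias, is:

\[ y(i, j) = \sum_{d=1}^{D} \sum_{u} \sum_{v} w_{d}(u, v)\, x_{d}(i+u, j+v) + b \]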
During a convolution operation, the output feature image becomes smaller than the input image and is distorted at the borders. The more layers, the smaller the image becomes, which prevents the number of convolution layers from being increased indefinitely.
The edge elements of the input feature matrix enter the computation fewer times than the interior elements, so edge information is continuously lost. Data at the outermost edge are used only once before being discarded, so edge pixels influence the network less than central pixels, which is not conducive to feature extraction. Therefore, the padding operation generally adds several pixels around the peripheral edge of the input feature matrix so that the sliding window moves in fixed steps from the initial position to the end position. We do this to maintain the size of the feature image after the convolution operation.
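The size relationship described above can be sketched with the standard output-size formula; the function name and the 28-pixel example are illustrative, not taken from the paper:

```python
def conv_output_size(n, k, p=0, s=1):
    """Spatial size of a convolution output for input side n,
    kernel side k, padding p, and stride s."""
    return (n + 2 * p - k) // s + 1

# Without padding, a 3x3 kernel shrinks a 28-pixel side to 26;
# padding p=1 keeps the output the same size as the input.
no_pad = conv_output_size(28, 3)        # 26
same_pad = conv_output_size(28, 3, p=1) # 28
```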

The padding operation not only keeps the input and output dimensions of the convolution layer consistent but also preserves the boundary information of the feature map, which to a certain extent increases the contribution of the boundary information. Therefore, based on this analysis, an activation function is introduced to improve the boundary information of the oracle image, and the equation is as follows:
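The activation function referred to here is the sigmoid, whose standard form is:

\[ f(x) = \frac{1}{1 + e^{-x}} \]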
The range of the sigmoid function is (0, 1), so it maps the output to the interval (0, 1). It is smooth and stable to optimize and can be used in the output layer.
Because the gradient of the sigmoid function in its saturation regions is very small, and gradients in back-propagation are passed to earlier layers multiplicatively, the gradient reaching the front layers becomes very small when the network is deep. The weight parameters then cannot be updated effectively, making iteration difficult and the gradient prone to vanishing. The model also has difficulty converging, resulting in training problems. Therefore, the linear rectification function is introduced to correct this problem, and its calculation formula is:
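The standard form of the linear rectification function (ReLU) is:

\[ f(x) = \max(0, x) \]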
Compared with the sigmoid and tanh functions, the ReLU function converges more quickly under stochastic gradient descent, and it can be implemented more simply. The ReLU function gives a neural network a sparse representation ability and increases the sparsity of the network. To a certain extent, the greater the sparsity, the more representative the features extracted by the network and the faster the computation. However, as training proceeds, a small portion of the input features fall into the hard saturation region, so the corresponding weight parameters can no longer be updated; this is the phenomenon of ``neuron death.'' LeakyReLU, a leaky linear rectifier, remedies the dying characteristic of ReLU but loses some sparsity, as shown in the formula:
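The standard form of LeakyReLU, writing the leakage coefficient as $\alpha$, is:

\[ f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \leq 0 \end{cases} \]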
where $x$ represents the input and $\alpha$ represents the leakage correction coefficient. Finally, combining these characteristics, the maximum output function, Maxout, gives the following:
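The standard Maxout function takes the maximum over $k$ linear pieces with weights $w_{j}$ and biases $b_{j}$ (notation assumed here):

\[ f(x) = \max_{j \in \{1, \ldots, k\}} \left( w_{j}^{\mathrm{T}} x + b_{j} \right) \]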
Finally, the designed oracle text image automatic classifier model is:
The oracle image feature data are input into the constructed deep learning automatic classifier; the output is the final classification result, completing the automatic classification of oracle images.
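The forward pass of the classifier described above (convolution, leaky rectification, a fully connected layer, and a probabilistic output) can be sketched in numpy. This is an illustrative toy, not the paper's model: the 8x8 input, 3x3 kernel, and 5 classes are arbitrary, and training (back-propagation with the MMD term) is omitted:

```python
import numpy as np

def conv2d(x, w, b):
    """Valid 2-D cross-correlation of a single-channel image x with kernel w."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            y[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return y

def leaky_relu(z, alpha=0.01):
    """LeakyReLU: identity for positive inputs, alpha-scaled otherwise."""
    return np.where(z > 0, z, alpha * z)

def softmax(z):
    """Numerically stable softmax over a vector of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(x, w_conv, b_conv, w_fc, b_fc):
    """Conv -> LeakyReLU -> flatten -> fully connected -> softmax."""
    h = leaky_relu(conv2d(x, w_conv, b_conv))
    return softmax(w_fc @ h.ravel() + b_fc)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))       # a toy 8x8 "oracle image"
w_conv = rng.standard_normal((3, 3))  # 3x3 kernel -> 6x6 feature map
w_fc = rng.standard_normal((5, 36))   # 36 flattened features -> 5 classes
probs = classify(x, w_conv, 0.1, w_fc, np.zeros(5))
```

The output `probs` is a vector of class probabilities; the predicted class is its argmax.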