1. Introduction
The chance of developing cancer is rising, even in younger age groups, as a result
of changes in human food patterns and an increase in environmental pollution. Accurate
diagnosis of the disease is crucial for patients since different tumors have distinct
hazards to the human body and require different therapies. The same patient may receive
different treatment from various doctors. The most reliable diagnosis is a pathological
one, as it is derived from microscopic examination of living tissue sampled from the
patient. The advent of digital pathology scanners has made it possible to turn the
visual image observed under a microscope into digital data. Yet although digital storage
and multidimensional characterization make high-throughput detection possible with
existing pathology knowledge, diagnosis is still limited by doctors' experience and
expertise. A study conducted in the United States revealed
that the misdiagnosis rate among doctors ranged from 15% to 45%. A statistical analysis
of hundreds of thousands of clinical cases revealed that the average misdiagnosis
rate of doctors in China is 33%. It is evident that the current medical diagnostic
process is rife with significant shortcomings. Additionally, there is a major talent
gap in pathology diagnosis, which places a heavy strain on hospitals [1-3]. This shortage of pathological diagnostic expertise can be effectively addressed
by letting AI learn from doctors' diagnostic experience. Pathological image analysis
involves the problems of image segmentation, image classification, and image retrieval.
Common algorithms for analyzing pathology images include several kinds of feature
extraction algorithms, sparse representation models, and bag-of-words models, among
others. However, these algorithms' performance is limited due to the significant diversity
of human tissues [4,5]. Following the advent of deep learning, the formerly separate processes of feature extraction
and feature classification have been integrated into an end-to-end trainable model,
which has led to a significant enhancement in the recognition accuracy of image classification
models. The end-to-end neural network training framework of deep learning is often
referred to as a "black box" by many people, as it is generally difficult to provide
a reasonable interpretation of the predictions it produces. Nevertheless, in the field
of medical imaging the interpretability of DL predictions is of paramount importance,
yet such predictions remain difficult to interpret. As a result, DL interpretability
is currently one of the most active research areas in medical imaging. Most of the
interpretable algorithms (IA) currently in use are based on the attention mechanism
(ATT) and a graph-to-text generation framework, but most of them suffer from low accuracy.
Therefore, to increase the accuracy and interpretability of pathological diagnosis,
this study proposes a cell nucleus segmentation (CNS) model based on a multi-task
cascaded convolutional network (MTCNN), together with a medical imaging IA based on
an attention mechanism-deep convolutional residual network (ATDCRN). The CNS model employs MTCNN to complete
four cascaded subtasks: cell nucleus foreground extraction, cell nucleus foreground
denoising, cell nucleus foreground distance transformation, and cell nucleus edge
extraction. All subtasks are integrated into an end-to-end trainable and optimized
model, which is capable of effectively addressing a multitude of challenges, including
severe cell nucleus overlap, uneven internal grayscale, complex background noise artifacts,
and staining differences. The medical imaging IA, which draws on ATT research from
the field of natural image analysis, endows diagnostic conclusions with a degree of
interpretability. In other words, as the model generates a diagnostic conclusion,
it highlights the area of the original image it attends to, thereby effectively utilizing
the semantic information in pathological diagnostic reports.
The article is divided into five sections. The first section is the introduction,
and the second section is the literature review. This section will briefly describe
the research status of intelligent medical image analysis and deep learning. The third
section is the research on intelligent medical image analysis algorithms. This section
will study the CNS model based on MTCNN and the IA based on ATDCRN. The fourth section
will analyze the experimental results of the two algorithms. The fifth section will
summarize the research of the entire article.
2. Related Works
With the development of computer vision technology, intelligent medical image processing
methods have been widely used in pathological diagnosis, and intelligent analysis
algorithms for medical images have become a research hotspot. Zeng and his team proposed
a segmentation strategy based on WACLSF to address the problems of speckle noise,
low contrast, and blurred boundaries in breast ultrasound segmentation. The strategy
reduced speckle noise with an anisotropic diffusion filter and used WACLSF to extract
the tumour boundary. Experimental results showed that this strategy significantly
improved visual effect and accuracy [6]. Song and others proposed a machine learning based image analysis algorithm for detecting
fluid in optical coherence tomography images of diabetic macular oedema and retinal
vein occlusion [7]. For the challenges of diagnosing gastric cancer and determining the depth of gastric
cancer infiltration, Xie et al. developed a CNN-based image processing method. The
tests conducted showed that the method achieves high accuracy in diagnosing gastric
cancer and detecting the level of infiltration. There was no noticeable difference
between endoscopists and this method in diagnosing gastric cancer [8]. Ruberto and other scholars proposed a DL-based image analysis algorithm for the
problem of recognizing leukocytes in blood. The algorithm can accurately identify
leukocytes in microscopic blood images and determine whether the patient suffers from leukaemia.
The algorithm was tested to have a leukocyte detection accuracy of 99.7% and a leukaemia
classification accuracy of 94.1% [9]. Chen et al. proposed a DL framework based on convolutional autoencoder for the problem
of feature learning in image analysis of lung nodules. The framework can support unsupervised
image feature learning of lung nodules with unlabeled data. In tests, the convolutional
autoencoder-based DL framework significantly improved feature learning speed [10].
Owing to its advantages, DL is now widely used in a variety of fields. Kota and Munisamy proposed
a sentiment analysis algorithm based on CNN, Bi-LSTM, and ATT for the problem of sentiment
analysis of web text. In this design, CNN effectively reduced the complexity of the
algorithm, while Bi-LSTM handled long input text sequences [11]. Jiang and his team proposed an image segmentation framework based on weakly supervised
learning for the problem of earth image segmentation. The framework can fully combine
the geometric attributes of label position error into the vector representation. Testing
showed that the classification accuracy of this weakly supervised image segmentation
framework exceeded that of the other algorithms [12]. Ding and Zheng proposed a serial binary image extraction method for the problem
of gesture recognition. The method captured gesture depth images with a Kinect composite
sensor device and used a VGG-CNN to evaluate the gesture depth image recognition
effect [13]. Zhang and his team proposed a multi-view DL-based fault detection method for the
fault identification problem of high-speed railway contact networks; the method extracts
features from fused multi-view features by tensor decomposition. Tests showed that
it reduced the average missed detection probability by at least 37.83% and improved
the average detection accuracy by at least 3.6% [14]. For the issue of determining the depth of flooding, Nair et al. suggested a detection
approach based on DL and fuzzy logic. The method determined the depth of flooding
by examining crowdsourced photos. According to the experimental findings, this method's
prediction accuracy can reach 83.1% [15].
In conclusion, research into intelligent medical image analysis algorithms has been
highly productive, but these algorithms remain auxiliary tools that do not provide
diagnostic conclusions, so pathological diagnosis still relies heavily on doctors.
Moreover, although DL can produce processing conclusions, those conclusions are difficult
to interpret. Therefore, the study proposes an IA based on ATDCRN, together with a
segmentation algorithm based on MTCNN for the CNS problem, in order to realize intelligent
pathological diagnosis.
3. Medical Intelligent Image Analysis Algorithm Based on MTCNN and ATT
The digitization of pathology images is advancing along with information technology.
However, pathology diagnosis remains highly dependent on doctors' experience and is
constrained by the shortage of pathology specialists and the limits of specialization.
Therefore, the study proposes a medical image analysis algorithm based on MTCNN and
ATDCRN to enhance the speed and accuracy of pathology diagnosis and ease the load
on medical institutions.
3.1. MTCNN-based CNS Algorithm for Pathology
In the field of medical imaging, cell segmentation can provide an effective reference
for the diagnosis of numerous diseases. However, owing to variations in the size and
morphology of cell nuclei, together with uneven staining and heavy background noise,
accurate CNS is difficult to achieve.
To address these problems, the study proposes a CNS algorithm based on MTCNN.
MTCNN consists of three networks, each of which can be used independently, with cascaded
detection characteristics. The working process of MTCNN is divided into three
steps: classification, bounding-box regression, and landmark localization.
The classification equation is shown in Eq. (1).
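This is the standard binary cross-entropy loss of MTCNN; with the symbols defined below it reads:

$$L_{i}^{\det } =-\left(y_{i}^{\det } \log \left(p_{i} \right)+\left(1-y_{i}^{\det } \right)\log \left(1-p_{i} \right)\right) \qquad (1)$$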
In Eq. (1), $L_{i}^{\det } $ denotes the cross-entropy loss function for cell nucleus classification.
$p_{i} $ denotes the probability that the region is a cell nucleus. $y_{i}^{\det }
$ denotes the ground-truth label of the region. The equation for bounding-box
regression is shown in Eq. (2).
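In the standard MTCNN formulation this is a squared Euclidean regression loss:

$$L_{i}^{box} =\left\| \hat{y}_{i}^{box} -y_{i}^{box} \right\| _{2}^{2} \qquad (2)$$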
In Eq. (2), $\hat{y}_{i}^{box} $ denotes the bounding-box coordinates predicted by the
network, and $y_{i}^{box} $ denotes the ground-truth coordinates. The equation for
landmark localization is given in Eq. (3).
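Landmark localization likewise uses a squared Euclidean loss in the standard formulation:

$$L_{i}^{landmark} =\left\| \hat{y}_{i}^{landmark} -y_{i}^{landmark} \right\| _{2}^{2} \qquad (3)$$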
In Eq. (3), $\hat{y}_{i}^{landmark} $ denotes the landmark position predicted by the
network model, and $y_{i}^{landmark} $ denotes the ground-truth landmark coordinates.
To reduce redundant coordinates and candidate boxes, MTCNN employs a non-maximum
suppression (NMS) algorithm, sketched below.
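This is a minimal IoU-based greedy NMS in Python; the 0.5 threshold and the [x1, y1, x2, y2] box layout are illustrative assumptions for the example, not values taken from the paper.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes:  (N, 4) array of [x1, y1, x2, y2] candidate nucleus boxes.
    scores: (N,) array of classification confidences p_i.
    Returns the indices of the boxes that survive suppression.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Keep only boxes that overlap the winner less than the threshold.
        order = order[1:][iou < iou_threshold]
    return keep
```

Fig. 1 depicts the structure of the MTCNN model.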
Fig. 1. Structure of the MTCNN model.
As illustrated in Fig. 1, the MTCNN model classifies the image with a softmax function
applied to feature maps produced by a convolutional layer whose channel count matches
the number of categories. Meanwhile, each feature map is averaged by a global average
pooling operation, which avoids the ``black-box'' property of fully connected layers [16,17]. The equation for calculating the output image size after a convolution or
pooling operation is given in Eq. (4).
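This is the standard output-size relation for convolution and pooling (with floor division when the quotient is fractional):

$$W_{out} =\frac{W-F+2P}{S} +1,\qquad H_{out} =\frac{H-F+2P}{S} +1 \qquad (4)$$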
In Eq. (4), $W$ denotes the matrix width. $F$ denotes the size of the convolution kernel. $P$
denotes padding. $H$ denotes the matrix height. $S$ denotes the stride. In practice,
severe grayscale unevenness inside nuclei and background-like regions within cells
cause holes and artifacts in cell foreground extraction. Therefore, the study proposes
a noise reduction method based on sDCAE. Meanwhile, to avoid overfitting, the study
introduces an $L_2$ norm penalty, and the corresponding loss function is shown in Eq. (5).
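One standard form consistent with the symbol definitions below, offered here as an assumed reconstruction (mean squared error plus an $L_2$ weight penalty), is:

$$L(w,b)=\frac{1}{m} \sum _{i=1}^{m}\left(\hat{y}^{(i)} -y^{(i)} \right)^{2} +\frac{\lambda }{2m} \left\| M\right\| _{2}^{2} \qquad (5)$$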
In Eq. (5), $M$ denotes the weight matrix. $m$ denotes the dimension of the feature vector.
$w$ and $b$ denote the weight and bias vectors, respectively. $\hat{y}^{(i)} $ and
$y^{(i)} $ denote the estimated and target values, respectively. $\lambda $ denotes
the regularization parameter. The model of the CNS algorithm based on MTCNN is shown in Fig. 2.
Fig. 2. Model of cell nuclear segmentation algorithm based on MTCNN.
In Fig. 2, the CNS algorithm based on MTCNN divides the CNS task into four steps: cell
nucleus foreground extraction, foreground noise reduction, intra-foreground distance
transformation, and cell nucleus edge extraction. The cell nucleus foreground extraction
network first produces a preliminary segmentation of the original image. The cell
foreground noise reduction network then denoises and reconstructs this segmentation.
Next, the original image, the preliminary segmentation, and the denoised reconstruction
are input into the cell foreground distance transformation network to obtain the
distance transformation map. Finally, the distance transformation map is passed into
the cell edge learning network for cell nucleus edge extraction, yielding the cell
nucleus edge segmentation results. The total loss function of the network is shown in Eq. (6).
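A plausible form, assuming an unweighted sum of the four stage losses, is:

$$L=L_{1} +L_{2} +L_{3} +L_{4} \qquad (6)$$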
In Eq. (6), $L$ denotes the total loss function. $L_{1} $, $L_{2} $, $L_{3} $ and $L_{4} $ denote
the loss functions of the cell nucleus foreground extraction network, foreground noise
reduction network, foreground distance transformation network and nucleus edge learning
network, respectively. $stage1$, $stage2$, $stage3$ and $stage4$ denote the four steps
of nucleus foreground extraction, foreground noise reduction, intra-foreground distance
transformation and nucleus edge extraction, respectively.
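To make the four-stage cascade concrete, the following PyTorch-style sketch mirrors the pipeline of Fig. 2 and the summed loss of Eq. (6); the module layouts and channel counts are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, hidden_ch):
    """Small convolutional block standing in for each sub-network."""
    return nn.Sequential(
        nn.Conv2d(in_ch, hidden_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(hidden_ch, 1, kernel_size=3, padding=1),
    )

class CascadedCNS(nn.Module):
    """Four cascaded sub-tasks trained end to end (see Fig. 2)."""
    def __init__(self):
        super().__init__()
        self.foreground = conv_block(3, 16)   # stage 1: foreground extraction
        self.denoise    = conv_block(1, 16)   # stage 2: foreground denoising
        # Stage 3 sees original image + stage-1 map + stage-2 map (3+1+1 channels).
        self.distance   = conv_block(5, 16)   # stage 3: distance transformation
        self.edge       = conv_block(1, 16)   # stage 4: nucleus edge extraction

    def forward(self, image):
        fg   = torch.sigmoid(self.foreground(image))   # preliminary segmentation
        fg_d = torch.sigmoid(self.denoise(fg))          # denoised reconstruction
        dist = self.distance(torch.cat([image, fg, fg_d], dim=1))
        edge = torch.sigmoid(self.edge(dist))           # final edge map
        return fg, fg_d, dist, edge

# Total loss: sum of the four per-stage losses, as in Eq. (6).
```

Because all four outputs are supervised jointly, gradients from the later stages also refine the earlier ones, which is what makes the cascade end-to-end trainable.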
3.2. Interpretable Diagnostic Algorithms Based on DL
For a machine to be truly capable of assisting a doctor in diagnosis, the output of
the machine must be understood by the doctor, i.e., be able to accurately describe
the image features in natural language. An RNN can model sequential data naturally,
offering long-range memory and the ability to handle long, varying input sequences.
The structure of the RNN is shown in Fig. 3.
In Fig. 3, the RNN adopts a cyclic structure, which allows information to be transferred
between neurons within a layer. In actual training, however, because the RNN updates
its state by ``overwriting'' it at every step, it is prone to gradient explosion and
gradient vanishing [18,19]. Gradient explosion can be handled by gradient clipping, but gradient vanishing
is harder to resolve. The LSTM, a variant of the recurrent neural network, can effectively
address the vanishing gradient problem. Fig. 4 depicts the LSTM structure.
Fig. 3. Structure of RNN.
Fig. 4. Structure of LSTM.
In Fig. 4, the LSTM computes the input gate, forget gate, preparatory memory unit, memory
unit and output gate, then combines them to obtain the current hidden layer variables,
which determine how much information is memorized and how much is forgotten and released [20,21]. In this way, gradient vanishing can be effectively avoided and the sequential
memory ability of the RNN better exploited. Eq. (7) gives the formulae for the input, forget, and output gates.
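These follow the standard LSTM gate equations (bias terms omitted; $\sigma$ denotes the sigmoid function):

$$i_{t} =\sigma \left(W^{(i)} x_{t} +U^{(i)} h_{t-1} \right),\quad f_{t} =\sigma \left(W^{(f)} x_{t} +U^{(f)} h_{t-1} \right),\quad o_{t} =\sigma \left(W^{(o)} x_{t} +U^{(o)} h_{t-1} \right) \qquad (7)$$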
In Eq. (7), $i_{t} $, $f_{t} $, and $o_{t} $ denote the input, forget and output gates,
respectively. $W^{(i)} $, $U^{(i)} $, $W^{(f)} $, $U^{(f)} $, $W^{(o)} $, and $U^{(o)}
$ denote the learning parameters of the input, forget and output gates, respectively.
$x_{t} $ denotes the input at moment $t$, and $h_{t-1} $ denotes the hidden layer
variable at moment $t-1$. The equations for the preparatory memory unit, memory unit
and hidden layer variable are shown in Eq. (8).
In Eq. (8), $\tilde{c}_{t} $ and $c_{t} $ denote the preparatory memory unit and the memory
unit, respectively. $W^{(c)} $ and $U^{(c)} $ both denote the learning parameters
of the preparatory memory unit. For image processing to be interpretable, image features
must be described in natural language; however, few IAs exist for the medical imaging
field, and the diagnostic reports they generate often contain a large amount of redundant
information. Therefore, the study proposes the ATDCRN-based IA, which combines a deep
convolutional residual network with ATT. The schematic model of the ATDCRN
algorithm is shown in Fig. 5.
Fig. 5. Schematic diagram of ATDCRN algorithm model.
As shown in Fig. 5, the model is divided into three sections: feature extraction, ATT,
and diagnostic prediction. The deep convolutional residual network first extracts
the lesion attribute features of cells from the original image. ATT then selects the
effective regions in the image, thereby effectively utilizing the semantic information
of the different attributes. Finally, the LSTM integrates the feature attributes and
makes the diagnosis, which ultimately leads to more accurate diagnosis and prediction.
In particular, an attention matrix is derived from the features extracted by the deep
convolutional residual network and is used to take a weighted average of the convolutional
feature tensor; this matrix represents the attention-weighted region. Normalizing
the attention matrix and visualizing it on the original input image provides a useful
interpretation of the model's prediction. The weighted feature calculation equation
is shown in Eq. (9).
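A plausible reconstruction of Eq. (9), offered as an assumption, is a spatially weighted average in which the attention weights $Att_{k}^{(j)}$ at positions $j$ pool the feature tensor:

$$f_{k} =\sum _{j}Att_{k}^{(j)} \cdot attfeats_{k}^{(j)} \qquad (9)$$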
In Eq. (9), $f_{k} $ denotes the weighted feature of the $k$th ATT module. $Att_{k} $ denotes
the convolutional feature basis of the $k$th ATT module. $attfeats_{k} $ denotes the
convolutional feature tensor of the corresponding module. Through ATT, the effective
use and trade-off of residual convolutional features is achieved, which provides strong
support for diagnostic prediction. The calculation equation for the input of the diagnostic
prediction module is shown in Eq. (10).
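Consistent with the symbol definitions below, the diagnostic-module input is assumed to be a linear projection of the selected features:

$$x_{k} =W^{(x)} f_{k} \qquad (10)$$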
In Eq. (10), $x_{k} $ denotes the input of step $k$. $f_{k} $ denotes the selective features
of step $k$. $W^{(x)} $ denotes the learning parameters. The equation for the nonlinear
mapping of the hidden layer variables to the four-dimensional space is given in Eq.
(11).
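A plausible form, in which an assumed projection matrix $W^{(z)}$ (not part of the paper's symbol list) maps the hidden state $h_{k}$ to the four-dimensional attribute space and a softmax yields the probability vector, is:

$$z_{k} =\tanh \left(W^{(z)} h_{k} \right),\qquad s_{k} ={\rm softmax}\left(z_{k} \right) \qquad (11)$$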
In Eq. (11), $z_{k} $ denotes the hidden layer variables after nonlinear transformation. $s_{k}
$ denotes the probability vector. During training, the first stage is supervised with
basic attribute semantic labels, and the loss function at this stage is shown in
Eq. (12).
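A plausible reconstruction, assuming the first-stage loss sums the residual network's SCE loss and the four ATT modules' SCE losses, is:

$$L_{1} \left(\theta _{R} ,\theta _{A} \right)=l_{1c} +\sum _{k=1}^{4}l_{1k} \qquad (12)$$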
In Eq. (12), $l_{1c} $ denotes the prediction loss function of the residual network. $l_{1k}
$ denotes the loss function of the $k$th ATT module. $SCE$ denotes the sigmoid
cross-entropy loss function. $L_{1} $ denotes the loss function
of the first stage. $\theta _{R} $ and $\theta _{A} $ denote the learning parameters
of the residual network and the ATT module, respectively. The second stage will be
trained for diagnostic result prediction, and the loss function for this stage is
shown in Eq. (13).
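Analogously, assuming the second-stage loss sums the ATT modules' prediction losses over the prediction vectors $s_{k}$:

$$L_{2} \left(\theta _{C} \right)=\sum _{k}l_{2k} \qquad (13)$$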
In Eq. (13), $l_{2k} $ denotes the loss function of the $k$th ATT module. $s_{k} $ denotes the $k$th
prediction vector. $\theta _{C} $ denotes the learning parameters of the diagnostic
prediction module. $L_{2} $ denotes the loss function of the second stage.
At this point, the total loss function is shown in Eq. (14).
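With $\alpha$ and $\beta$ weighting the two stages as defined below, the total loss is presumably:

$$L=\alpha L_{1} +\beta L_{2} \qquad (14)$$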
In Eq. (14), $\alpha $ and $\beta $ denote the loss weights of the first and second stages, respectively.
The computational schematic of ATDCRN is shown in Fig. 6.
Fig. 6. Calculation diagram of ATDCRN.
In Fig. 6, the linear transformation matrix acts on the selected column vectors. The
whole algorithm updates its parameters through back-propagation. Moreover, the prediction
vectors obtained from the ATT modules are processed by the LSTM to produce the final
prediction results. GAP in Fig. 6 denotes global average pooling: it converts a given
convolutional feature tensor into a smaller feature tensor, which a simple linear
transformation then maps to a prediction vector of semantic categories. It is worth
noting that ATDCRN contains four ATT modules because the study built the network around
cervical precancerous lesions, which have four attribute features. The prediction
conclusion calculation equation is shown in Eq. (15).
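A plausible aggregation, assuming the five module predictions are simply averaged, is:

$$s_{f} =\frac{1}{5} \left(s_{0} +s_{1} +s_{2} +s_{3} +s_{4} \right) \qquad (15)$$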
In Eq. (15), $s_{f} $ denotes the final prediction result. $s_{0} $, $s_{1} $, $s_{2} $, $s_{3}
$, and $s_{4} $ denote the prediction vectors obtained from each prediction module,
respectively.
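To make the attention computation concrete, the following PyTorch-style sketch implements a spatial attention module in the spirit of Eqs. (9)-(10); the layer choices and sizes are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttModule(nn.Module):
    """One ATT module: spatial attention over a residual feature tensor."""
    def __init__(self, channels, hidden):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-position attention logit
        self.proj = nn.Linear(channels, hidden)             # W^(x) in Eq. (10)

    def forward(self, attfeats):
        # attfeats: (N, C, H, W) convolutional feature tensor of this module.
        logits = self.score(attfeats)                 # (N, 1, H, W)
        att = F.softmax(logits.flatten(2), dim=-1)    # normalize over the H*W positions
        # Eq. (9): attention-weighted average of the feature tensor.
        f_k = (attfeats.flatten(2) * att).sum(dim=-1)  # (N, C)
        # Eq. (10): linear projection feeding the LSTM diagnosis module.
        x_k = self.proj(f_k)                           # (N, hidden)
        att_map = att.view(attfeats.shape[0], *attfeats.shape[2:])
        return x_k, att_map
```

The returned attention map, normalized over spatial positions, can be upsampled and overlaid on the input image to produce the highlighted explanatory region described above.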
4. Experimental Results and Analysis
The study runs separate tests on the MTCNN-based CNS algorithm and the ATDCRN interpretable
model in order to validate their individual performance. Among them, the MTCNN-based
CNS algorithm will be tested on the public dataset of the CNS competition and compared
with the DCAN model and the Jbarker model. The cervical precancer dataset, which contains
1000 images, will be utilized for testing the ATDCRN model, with 200 of those images
serving as the testing set. The experiments also compare the performance of three
algorithms: ATDCRN, AlexNet, and ResNet. The experimental parameter settings
for MTCNN and ATDCRN are shown in Table 1.
Table 1. Setting of the experimental parameters.
Network     | Parameter                    | Value
------------|------------------------------|------
MTCNN [22]  | Input image channels         | 3
            | Patch size                   | 31×31
            | Cascade feature map channels | 5
            | Output channels              | 1
            | Convolution kernel           | 3×3
ATDCRN [23] | Convolution kernel           | 3×3
            | Batch size                   | 64
            | Learning rate                | 0.001
            | Stride                       | 2
            | Padding                      | 3
Table 1 indicates that the number of input image channels in MTCNN is 3, the number of output
image channels is 1, the patch size is 31×31, the convolution kernel size is 3×3,
and the number of channels in the cascade feature map is 5. For ATDCRN, the convolution
kernel size is 3×3, the batch size is 64, the learning rate is 0.001, and the stride
and padding are 2 and 3, respectively. The F1-score and IoU of MTCNN, DCAN and Jbarker are shown in Fig. 7.
Fig. 7. F1-score and IoU of MTCNN, DCAN, and Jbarker.
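For reference, the F1-score and IoU reported below can be computed on binary segmentation masks as in the following generic sketch (for binary masks the pixel-wise F1-score coincides with the Dice coefficient); this is an illustrative implementation, not the competition's official scorer.

```python
import numpy as np

def seg_metrics(pred, target, eps=1e-8):
    """F1/Dice and IoU for binary segmentation masks.

    pred, target: arrays of the same shape, interpreted as boolean masks.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum() + eps)  # equals pixel-wise F1
    iou = inter / (union + eps)                             # Jaccard index
    return dice, iou
```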
In Fig. 7(a), the F1-scores of all three models increase with the number of iterations.
DCAN has F1-scores of about 0.74 and 0.84 at 100 and 600 iterations, respectively,
with an average F1-score of about 0.8. Jbarker's F1-scores at 100 and 600 iterations
are about 0.72 and 0.81, respectively, with an average F1-score of about 0.77. MTCNN
has F1-scores of about 0.78 and 0.87 at 100 and 600 iterations, respectively, with
an average F1-score of about 0.84. From Fig. 7(b), the highest and lowest IoUs of DCAN are about 0.84 and 0.8, respectively, and the
average IoU is about 0.82. The highest and lowest IoUs of Jbarker are about 0.83 and
0.78, respectively, and the average IoU is about 0.81. The highest and lowest IoUs
of MTCNN are about 0.86 and 0.82, respectively, and the average IoU is about 0.85.
It can be seen that the segmentation results of MTCNN have higher accuracy and better
overall performance. The superior performance of MTCNN relative to other models can
be attributed to its incorporation of intermediate learning processes, including cell
foreground denoising and distance transformation. The MTCNN model employs a multi-task
sequence learning method, which renders it more sensitive to severely overlapping
and fuzzy nucleus edges. The DICE1 and DICE2 scores of the three models are shown in Fig. 8.
Fig. 8. DICE1 and DICE2 of three models.
From Fig. 8(a), the highest and lowest DICE1 of DCAN are about 0.87 and 0.83, and the average DICE1
is about 0.85. The highest and lowest DICE1 of Jbarker are about 0.85 and 0.79, respectively,
and the average DICE1 is about 0.83. The highest and lowest DICE1 of MTCNN are about
0.91 and 0.85, respectively, with an average DICE1 of about 0.88. From Fig. 8(b), the lowest and highest DICE2 of DCAN are about 0.68 and 0.74, respectively, and
the average DICE2 is about 0.72. The lowest and highest DICE2 of Jbarker are about
0.64 and 0.72, respectively, and the average DICE2 is about 0.68. The lowest and highest
DICE2 of MTCNN are about 0.72 and 0.79, respectively, with an average DICE2 of
about 0.75. It can be concluded that the CNS performance of MTCNN is better. The superiority
of MTCNN's DICE1 and DICE2 scores over the other models can be attributed to the introduction
of cell foreground extraction and distance transformation in MTCNN. This further indicates
that the learning process of cell foreground distance transformation is highly sensitive
to severely overlapping cell edges. The AJI and mAP of the three models are shown in
Fig. 9.
Fig. 9. AJI and mAP for three different models.
In Fig. 9(a), the AJI of DCAN has a minimum of about 0.72 and a maximum of about 0.78, with an
average of about 0.75. The AJI of Jbarker has a minimum and a maximum of about 0.68
and 0.75, respectively, with an average AJI of about 0.71. The AJI of MTCNN has a
minimum of about 0.76 and a maximum of about 0.85, with an average of about 0.81.
From Fig. 9(b), the lowest and highest mAP of DCAN are about 0.72 and 0.79, respectively, and the
average mAP is about 0.76. The lowest mAP of Jbarker is about 0.69, the highest is
about 0.74, and the average is about 0.72. The lowest mAP of MTCNN is about 0.78,
the highest is about 0.82, and the average is about 0.80. Thus MTCNN achieves
higher AJI and mAP than the other two algorithms. The diagnostic conclusion prediction
accuracy and recall of ATDCRN, AlexNet and ResNet are shown in Fig. 10.
Fig. 10. Prediction accuracy and recall rate of diagnostic conclusions for ATDCRN,
AlexNet, and ResNet.
In Fig. 10(a), the highest diagnostic conclusion prediction accuracy of AlexNet is about 73.1%,
the lowest is about 69.7%, and the average accuracy is about 71.9%. The highest accuracy
of ResNet is about 80.2%, the lowest is about 76.9%, and the average accuracy is about
78.3%. The highest accuracy of ATDCRN is about 86.6% and the lowest about 83.8%,
with an average accuracy of about 85.2%. In Fig. 10(b), the highest diagnostic conclusion prediction recall rate of AlexNet is about 83.3%,
the lowest is about 80.3%, and the average recall rate is about 81.9%. The highest
recall rate of ResNet is about 87.8%, the lowest is about 85.2%, and the average recall
rate is about 86.2%. The highest recall rate of ATDCRN is about 90.3% and the lowest
about 88.5%, with an average recall rate of about 89.3%. It can be concluded that
ATDCRN can effectively improve the accuracy and recall of diagnostic conclusion prediction.
The superior performance of ATDCRN compared to other models is attributable to the
introduction of four ATTs, each corresponding to a specific attribute feature. These
mechanisms are capable of effectively utilizing semantic information pertaining to
different attributes. The semantic attribute prediction accuracy and recall of ATDCRN,
AlexNet and ResNet are shown in Fig. 11.
Fig. 11. Semantic attribute prediction accuracy and recall rate of ATDCRN, AlexNet,
and ResNet.
In Fig. 11(a), the highest semantic attribute prediction accuracy of AlexNet is about 74.2%, the
lowest is about 68.7%, and the average accuracy is about 72.2%. The highest accuracy
of ResNet is about 78.3%, the lowest is about 71.6%, and the average accuracy is about
74.8%. The highest accuracy of ATDCRN is about 78.7% and the lowest about 73.2%,
with an average accuracy of about 76.5%. From Fig. 11(b), AlexNet has the highest semantic attribute prediction recall of about 76.1%, the
lowest of about 72.3%, and the average recall of about 74.6%. ResNet has the highest
recall of about 81.2%, the lowest of about 73.7%, and the average recall of about
77.0%. ATDCRN has the highest recall of about 81.6%, the lowest of about 77.2%, with
an average recall rate of about 79.9%. The above results show that the performance
of ATDCRN for semantic attribute prediction is excellent compared to the rest of the
algorithms. To further validate the performance of the proposed ATDCRN algorithm,
the study conducted extensive experiments comparing it with CNN-RNN and Mask-CNN.
The diagnostic conclusion and semantic attribute prediction accuracies of ATDCRN,
CNN-RNN and Mask-CNN are shown in Fig. 12.
Fig. 12. Diagnostic conclusion and semantic attribute prediction accuracy of ATDCRN,
CNN-RNN, and mask CNN.
Fig. 12(a) illustrates that the diagnostic conclusion prediction accuracy of CNN-RNN is approximately
78.3% at the highest and 76.8% at the lowest, with an average accuracy of about 77.5%.
That of Mask-CNN is approximately 81.3% at the highest and 79.8% at the lowest, with an average
accuracy of approximately 80.7%. That of ATDCRN is approximately 87.2% at the highest and 85.4%
at the lowest, with an average accuracy of approximately 86.1%. As can be seen in Fig. 12(b), the highest semantic attribute prediction accuracy rate of CNN-RNN is about 72.4%,
the lowest is about 69.6%, and the average accuracy rate is about 70.9%. The highest
semantic attribute accuracy rate of Mask-CNN is about 74.6%, the lowest is about 73.1%,
and the average accuracy rate is about 74%. The highest semantic attribute accuracy
rate of ATDCRN is about 80.3% and the lowest is about 78.7%, with an average accuracy
rate of about 79.5%. Given these accuracies, using ATDCRN for auxiliary medical diagnosis
can effectively reduce the misdiagnosis rate to below 20%, significantly alleviating
the problems in current medical diagnosis.
5. Conclusion
Medical image processing, a branch of computer vision, has advanced quickly alongside
computer technology and artificial intelligence. The study addresses the problems
of pathological CNS and pathological image interpretability, proposing and separately
testing a CNS algorithm based on MTCNN and an IA based on ATDCRN. According to the
experimental findings, the F1-score of MTCNN is approximately 0.78 and 0.87 at 100
and 600 iterations, respectively, with an average F1-score of about 0.84. The IoU
is the highest at approximately 0.86, the lowest at approximately 0.82, and the average
is approximately 0.85. All of these values are higher than those of the DCAN and Jbarker
models. The highest DICE1 values are 0.87, 0.85, and 0.91 for the DCAN, Jbarker, and
MTCNN, respectively. The average DICE1 values are 0.85, 0.83, and 0.88. The highest
DICE2 values are 0.74, 0.72, and 0.79, and the averages are 0.72, 0.68, and 0.75.
MTCNN has the highest DICE1 and DICE2 values. The average mAP and AJI of MTCNN are
about 0.80 and 0.81 respectively, which are still higher than those of DCAN and Jbarker
models. The average diagnostic conclusion prediction accuracies of ATDCRN, AlexNet
and ResNet are 85.2%, 71.9% and 78.3% respectively, and the average diagnostic conclusion
prediction recall rates are 89.3%, 81.9% and 86.2%, respectively. The average semantic attribute
prediction accuracy is about 76.5%, 72.2%, and 74.8%, respectively. The average semantic
attribute prediction recall is about 79.9%, 74.6%, and 77.0%, respectively. ATDCRN
has the highest prediction accuracy and recall for both diagnostic conclusions and
semantic attributes. These findings demonstrate that the CNS model based on MTCNN
effectively enhances CNS accuracy, while the IA based on ATDCRN permits accurate
interpretation of medical images. However, the study does not examine very large
sample sets, so the proposed algorithms may prove less effective and accurate when
faced with large amounts of data.
REFERENCES
[1] R. Suzuki, N. Yajima, K. Sakurai, N. Oguro, T. Wakita, and D. H. Thom et al., ``Association of patients' past misdiagnosis experiences with trust in their current physician among Japanese adults,'' Journal of General Internal Medicine, vol. 37, no. 5, pp. 1115-1121, 2022.
[2] S. Diao, J. Hou, H. Yu, X. Zhao, and W. Luo, ``Computer-aided pathologic diagnosis of nasopharyngeal carcinoma based on deep learning,'' American Journal of Pathology, vol. 190, no. 8, pp. 1691-1700, 2020.
[3] F. Masood, J. Masood, H. Zahir, K. Driss, N. Mehmood, and H. Farooq, ``Novel approach to evaluate classification algorithms and feature selection filter algorithms using medical data,'' Journal of Computational and Cognitive Engineering, vol. 2, no. 1, pp. 57-67, 2023.
[4] A. Al-Saffar, A. Zamani, A. Stancombe, and A. Abbosh, ``Operational learning-based boundary estimation in electromagnetic medical imaging,'' IEEE Transactions on Antennas and Propagation, vol. 70, no. 3, pp. 2234-2245, 2022.
[5] L. Alzubaidi, M. A. Fadhel, O. Al-Shamma, J. Zhang, J. Santamaria, and Y. Duan, ``Robust application of new deep learning tools: An experimental study in medical imaging,'' Multimedia Tools and Applications, vol. 81, no. 10, pp. 113289-113317, 2022.
[6] T. Zeng, D. Kong, J. Zhang, and Q. Ma, ``Weighted area constraints-based breast lesion segmentation in ultrasound image analysis,'' Inverse Problems and Imaging, vol. 16, no. 2, pp. 451-466, 2022.
[7] W. Song, A. H. Kaakour, A. Kalur, J. C. Muste, A. I. Iyer, and C. C. S. Valentim et al., ``Performance of a machine-learning computational image analysis algorithm in retinal fluid quantification for patients with diabetic macular edema and retinal vein occlusions,'' Ophthalmic Surgery, Lasers & Imaging Retina, vol. 53, no. 3, pp. 123-131, 2022.
[8] F. Xie, K. Zhang, F. Li, G. Ma, Y. Ni, and W. Zhang et al., ``Diagnostic accuracy of convolutional neural network-based endoscopic image analysis in diagnosing gastric cancer and predicting its invasion depth: A systematic review and meta-analysis,'' Gastrointestinal Endoscopy, vol. 95, no. 4, pp. 599-609, 2022.
[9] C. D. Ruberto, A. Loddo, and G. Puglisi, ``Blob detection and deep learning for leukemic blood image analysis,'' Applied Sciences, vol. 10, no. 3, pp. 1176-1188, 2020.
[10] M. Chen, X. Shi, Y. Zhang, D. Wu, and M. Guizani, ``Deep feature learning for medical image analysis with convolutional autoencoder neural network,'' IEEE Transactions on Big Data, vol. 7, no. 4, pp. 750-758, 2021.
[11] V. R. Kota and S. D. Munisamy, ``High accuracy offering attention mechanisms based deep learning approach using CNN/bi-LSTM for sentiment analysis,'' International Journal of Intelligent Computing and Cybernetics, vol. 15, no. 1, pp. 61-74, 2022.
[12] Z. Jiang, W. He, K. M. Stephen, S. A. Man, S. Wang, and V. Stanislawski et al., ``Weakly supervised spatial deep learning for earth image segmentation based on imperfect polyline labels,'' ACM Transactions on Intelligent Systems and Technology, vol. 13, no. 2, pp. 169-188, 2022.
[13] I. J. Ding and N. W. Zheng, ``RGB-D depth-sensor-based hand gesture recognition using deep learning of depth images with shadow effect removal for smart gesture communication,'' Sensors and Materials, vol. 34, no. 1, pp. 203-216, 2022.
[14] X. Zhang, Y. Gong, C. Qiao, and W. Jing, ``Multiview deep learning based on tensor decomposition and its application in fault detection of overhead contact systems,'' The Visual Computer, vol. 38, no. 4, pp. 1457-1467, 2022.
[15] B. B. Nair, S. Krishnamoorthy, M. Geetha, and S. N. Rao, ``Machine vision based flood monitoring system using deep learning techniques and fuzzy logic on crowdsourced image data,'' Intelligent Decision Technologies, vol. 15, no. 3, pp. 357-370, 2021.
[16] X. B. Yang and W. Zhang, ``Heterogeneous face detection based on multi-task cascaded convolutional neural network,'' IET Image Processing, vol. 16, no. 1, pp. 207-215, 2022.
[17] X. Wu, P. Li, J. Zhou, and Y. Liu, ``A cascaded CNN-based method for monocular vision robotic grasping,'' Industrial Robot, vol. 49, no. 4, pp. 645-675, 2022.
[18] Y. Yang and X. Song, ``Research on face intelligent perception technology integrating deep learning under different illumination intensities,'' Journal of Computational and Cognitive Engineering, vol. 1, no. 1, pp. 32-36, 2022.
[19] B. Xing, E. Xu, J. Wei, and Y. Meng, ``Recurrent neural network non-singular terminal sliding mode control for path following of autonomous ground vehicles with parametric uncertainties,'' IET Intelligent Transport Systems, vol. 16, no. 5, pp. 616-629, 2022.
[20] Y. Zhang, S. Wang, G. Sun, and J. Mao, ``Aerodynamic surrogate model based on deep long short-term memory network: An application on high-lift device control,'' Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, vol. 236, no. 6, pp. 1081-1097, 2022.
[21] J. Yao, B. Li, and J. Zhao, ``Tool remaining useful life prediction using deep transfer reinforcement learning based on long short-term memory networks,'' The International Journal of Advanced Manufacturing Technology, vol. 118, no. 3, pp. 1077-1080, 2022.
[22] J. Liu, ``Research on video image face detection and recognition technology based on improved MTCNN algorithm,'' International Journal of Wireless and Mobile Computing, vol. 22, no. 3, pp. 205-212, 2022.
[23] V. K. Vatsavayi and N. Andavarapu, ``Identification and classification of wild animals from video sequences using hybrid deep residual convolutional neural network,'' Multimedia Tools and Applications, vol. 81, no. 23, pp. 33335-33360, 2022.
Author
Junye Yang graduated from Shenyang University of Chemical Technology in March 2010
with her master's degree in computer software and theory. She is currently working
in Shijiazhuang Institute of Technology. Her main research direction includes computer
technology, information security and artificial intelligence. She is the chief editor
of 2 textbooks. She has published more than 20 academic articles, including 2 Chinese
core articles and 1 SCI article, and she has participated in 7 scientific research projects.
Yujuan Du obtained her master's degree in oncology from Hebei Medical University
in 2012. She is working in the Department of Geriatrics at Chuiyangliu Hospital affiliated
to Tsinghua University. Her areas of interest include geriatric oncology, geriatric
medicine, cancer palliative and hospice care.
Fang Liu graduated from Shanxi University in July 2010 with her master's degree
in computer application technology. She is currently working in Shijiazhuang Institute
of Technology. She is acting as the director of the Internet Application Technology
College. Her main research direction is software technology development and artificial
intelligence. She has published more than 20 academic articles, including 2 Chinese
core articles and 3 SCI articles. She has presided over and participated in 4 provincial-level
scientific research projects and is the chief editor of 4 textbooks.