
  1. (College of Internet Application Technology, Shijiazhuang Institute of Technology, Shijiazhuang 050228, China)
  2. (Department of Geriatrics, Beijing Chuiyangliu Hospital, Beijing 100022, China)



Keywords: Medical imaging, Cell nucleus segmentation, Deep learning, Interpretable algorithms

1. Introduction

The incidence of cancer is rising, even among younger age groups, as a result of changes in human dietary patterns and increasing environmental pollution. Accurate diagnosis is crucial for patients, since different tumors pose distinct hazards to the human body and require different therapies, and the same patient may receive different treatment recommendations from different doctors. Pathological diagnosis, derived from microscopic examination of a tissue sample taken from the patient, is regarded as the most reliable basis for diagnosis. The advent of digital pathology scanners has made it possible to convert the visual image observed under a microscope into digital data. Although high-throughput detection is possible with existing pathology knowledge, digital storage and multidimensional characterization are still limited by doctors' experience and expertise. A study conducted in the United States revealed that the misdiagnosis rate among doctors ranged from 15% to 45%, and a statistical analysis of hundreds of thousands of clinical cases found that the average misdiagnosis rate of doctors in China is 33%. It is evident that the current medical diagnostic process has significant shortcomings. In addition, there is a major talent gap in pathology diagnosis, which places a heavy strain on hospitals [1-3]. Letting artificial intelligence learn from doctors' diagnostic experience can efficiently alleviate the shortage of pathological diagnostic expertise.

Pathological image analysis involves image segmentation, image classification, and image retrieval. Common algorithms for analyzing pathology images include various feature extraction algorithms, sparse representation models, and bag-of-words models, among others, but their performance is limited by the significant diversity of human tissues [4,5]. With the advent of deep learning (DL), feature extraction and feature classification have been integrated into an end-to-end trainable model, which has substantially improved the recognition accuracy of image classification models. However, the end-to-end training framework of deep learning is often referred to as a "black box," because it is generally difficult to provide a reasonable interpretation of the predictions it produces. The interpretability of DL predictions is of paramount importance in medical imaging, and interpretability for DL has therefore become one of the most active research areas in the field. Most of the interpretable algorithms (IA) currently in use are based on the attention mechanism (ATT) and an image-to-text generation framework, and most of them suffer from low accuracy. Therefore, to improve the accuracy and interpretability of pathological diagnosis, this study proposes a cell nucleus segmentation (CNS) model based on a multi-task cascaded convolutional network (MTCNN) and a medical imaging IA based on an attention mechanism-deep convolutional residual network (ATDCRN). The CNS model employs MTCNN to complete four cascaded subtasks: cell nucleus foreground extraction, foreground denoising, foreground distance transformation, and cell nucleus edge extraction.
All subtasks are integrated into an end-to-end trainable and optimizable model, which effectively addresses a multitude of challenges, including severe cell nucleus overlap, uneven internal grayscale, complex background noise and artifacts, and staining differences. The medical imaging IA, which draws on ATT research from natural image analysis, endows diagnostic conclusions with a degree of interpretability: while the model generates a diagnostic conclusion, a highlighted area in the original image indicates the region the model attends to, so that the semantic information in pathological diagnostic reports can be used effectively.

The article is divided into five sections. The first section is the introduction, and the second is the literature review, which briefly describes the research status of intelligent medical image analysis and deep learning. The third section presents the intelligent medical image analysis algorithms, covering the CNS model based on MTCNN and the IA based on ATDCRN. The fourth section analyzes the experimental results of the two algorithms, and the fifth section summarizes the research of the entire article.

2. Related Works

With the development of computer vision technology, intelligent medical image processing methods have been widely used in pathological diagnosis, and intelligent analysis algorithms for medical images have become a research hotspot. Zeng and his team proposed a segmentation strategy based on WACLSF to address speckle noise, low contrast, and blurred boundaries in breast ultrasound segmentation. The strategy reduced speckle noise with an anisotropic diffusion filter and used WACLS to extract the tumor boundary, and experimental results showed significant improvements in both visual quality and accuracy [6]. Song et al. proposed a machine learning based image analysis algorithm for detecting fluid in optical coherence tomography images of diabetic macular edema and retinal vein occlusion [7]. For the challenges of diagnosing gastric cancer and determining the depth of gastric cancer infiltration, Xie et al. developed a CNN-based image processing method. Tests showed that the method achieved high accuracy in diagnosing gastric cancer and detecting the level of infiltration, with no noticeable difference between endoscopists and the method in diagnosing gastric cancer [8]. Ruberto and other scholars proposed a DL-based image analysis algorithm for recognizing leukocytes in blood. The algorithm can accurately identify leukocytes in microscopic blood images and determine whether leukemia is present; it was tested to have a leukocyte detection accuracy of 99.7% and a leukemia classification accuracy of 94.1% [9]. Chen et al. proposed a DL framework based on a convolutional autoencoder for feature learning in lung nodule image analysis. The framework supports unsupervised image feature learning of lung nodules with unlabeled data and was tested to significantly improve feature learning speed [10].

DL has been widely applied in various fields owing to its advantages. Kota and Munisamy proposed a sentiment analysis algorithm based on CNN, Bi-LSTM, and ATT for sentiment analysis of web text, in which the CNN effectively reduced the complexity of the algorithm while the Bi-LSTM processed long input sequences [11]. Jiang and his team proposed an image segmentation framework based on weakly supervised learning for earth image segmentation. The framework fully incorporates the geometric attributes of label position errors into the vector representation, and its classification accuracy was tested to be better than that of comparable algorithms [12]. Ding and Zheng proposed a serial binary image extraction method for gesture language recognition. The method captured gesture depth images with a Kinect composite sensor device and used a VGG-CNN to evaluate the gesture depth image recognition performance [13]. Zhang and his team proposed a multi-view DL-based fault detection method for fault identification in high-speed railway contact networks, which extracts features from fused features by tensor decomposition. The method was tested to reduce the average missed detection probability by at least 37.83% and improve the average detection accuracy by at least 3.6% [14]. For determining the depth of flooding, Nair et al. suggested a detection approach based on DL and fuzzy logic that estimates flood depth by examining crowdsourced photos; according to the experimental findings, its prediction accuracy can reach 83.1% [15].

In conclusion, research into intelligent medical image analysis algorithms has been highly productive, but existing methods serve merely as auxiliary tools and do not provide diagnostic conclusions, so pathological diagnosis still relies considerably on doctors. Moreover, although DL can produce processing conclusions, it is difficult to interpret them. Therefore, this study proposes an IA based on ATDCRN, together with a segmentation algorithm based on MTCNN for the CNS problem, in order to realize intelligent pathological diagnosis.

3. Medical Intelligent Image Analysis Algorithm Based on MTCNN and ATT

The digitization of pathology images is advancing along with information technology. However, pathology diagnosis remains highly dependent on doctors' experience and is constrained by the shortage of pathology diagnostic talent and the limits of specialization. Therefore, the study proposes a medical image analysis algorithm based on MTCNN and ATDCRN to enhance the speed and accuracy of pathology diagnosis and ease the load on medical institutions.

3.1. MTCNN-based CNS Algorithm for Pathology

In the field of medical imaging, cell segmentation can provide an effective reference for the diagnosis of numerous diseases. However, because cell nuclei differ in size and morphology and images suffer from uneven staining and high background noise, accurate CNS is difficult to achieve. To address these problems, the research proposes a CNS algorithm based on MTCNN. MTCNN consists of three networks, each of which can be used independently, with cascade detection characteristics. The working process of MTCNN is divided into three steps: classification, bounding box regression, and landmark localization. The classification loss is given in Eq. (1).

(1)
$ L_{i}^{\det } =-\left(y_{i}^{\det } \log \left(p_{i} \right)+\left(1-y_{i}^{\det } \right)\log \left(1-p_{i} \right)\right),\nonumber\\ \quad y_{i}^{\det } \in \left\{0,~1\right\} . $

In Eq. (1), $L_{i}^{\det } $ denotes the cross-entropy loss for cell nucleus classification, $p_{i} $ denotes the probability that the region is a cell nucleus, and $y_{i}^{\det } $ denotes the true label of the region (nucleus or background). The bounding box regression loss is given in Eq. (2).

(2)
$ L_{i}^{box} =\left\| \hat{y}_{i}^{box} -y_{i}^{box} \right\| _{2}^{2} ,~\hat{y}_{i}^{box} \in {\mathbb R}^{4}. $

In Eq. (2), $\hat{y}_{i}^{box} $ denotes the bounding box predicted by the network and $y_{i}^{box} $ denotes the ground-truth bounding box coordinates. The landmark localization loss is given in Eq. (3).

(3)
$ L_{i}^{landmark} =\left\| \hat{y}_{i}^{landmark} -y_{i}^{landmark} \right\| _{2}^{2} ,\nonumber\\ \quad \hat{y}_{i}^{landmark} \in {\mathbb R}^{10} . $

In Eq. (3), $\hat{y}_{i}^{landmark} $ denotes the landmark coordinates predicted by the network model and $y_{i}^{landmark} $ denotes the ground-truth landmark coordinates. To reduce redundant coordinates and candidate boxes, MTCNN employs a non-maximum suppression algorithm. Fig. 1 depicts the structure of the MTCNN model.
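To make the three loss terms concrete, the following is a minimal sketch under an assumed PyTorch setting; the function name mtcnn_losses and the tensor names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def mtcnn_losses(p, y_det, box_pred, box_gt, lmk_pred, lmk_gt):
    # Eq. (1): binary cross-entropy for nucleus/background classification.
    l_det = F.binary_cross_entropy(p, y_det.float())
    # Eq. (2): squared L2 distance between predicted and ground-truth boxes (R^4).
    l_box = ((box_pred - box_gt) ** 2).sum(dim=1).mean()
    # Eq. (3): squared L2 distance for the 10-dimensional landmark coordinates (R^10).
    l_lmk = ((lmk_pred - lmk_gt) ** 2).sum(dim=1).mean()
    return l_det, l_box, l_lmk

# Example with random candidate regions.
p = torch.rand(8)                  # predicted nucleus probabilities
y_det = torch.randint(0, 2, (8,))  # ground-truth labels in {0, 1}
losses = mtcnn_losses(p, y_det,
                      torch.rand(8, 4), torch.rand(8, 4),
                      torch.rand(8, 10), torch.rand(8, 10))
```

In such a setting, the redundant candidate boxes mentioned above could then be filtered with a standard non-maximum suppression routine (e.g., torchvision.ops.nms).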

Fig. 1. Structure of the MTCNN model.

../../Resources/ieie/IEIESPC.2025.14.4.507/fig1.png

As illustrated in Fig. 1, the MTCNN model classifies the image with the softmax function by mapping the feature maps to a convolutional layer whose channel count equals the number of categories. Meanwhile, each feature map is averaged with a global average pooling operation, which removes the ``black-box'' property of fully connected layers [16,17]. The output image size after a convolution or pooling operation is calculated as in Eq. (4).

(4)
$ \left\{\begin{aligned} & \left[\frac{W-F+2P}{S} +1\right]\times \left[\frac{H-F+2P}{S} +1\right], \\ & \left[\frac{W-F}{S} +1\right]\times \left[\frac{H-F}{S} +1\right]. \end{aligned}\right. $

In Eq. (4), $W$ denotes the matrix width, $F$ the size of the convolution kernel, $P$ the padding, $H$ the matrix height, and $S$ the stride. In practical applications, severe grayscale unevenness inside cells and background-like regions cause holes and artifacts in the extracted cell foreground. Therefore, the study proposes a noise reduction method based on sDCAE. Meanwhile, to avoid overfitting, the study introduces L2 norm regularization; the corresponding loss function is shown in Eq. (5).

(5)
$ \min _{w,b} J(w,b)=\frac{1}{m} \sum _{i=1}^{m}L\left(\hat{y}^{(i)} ,y^{(i)} \right) +\frac{\lambda }{2m} \left\| M\right\| _{2}^{2} . $

In Eq. (5), $M$ denotes the weight matrix, $m$ denotes the number of training samples, $w$ and $b$ denote the weight and bias vectors, $\hat{y}^{(i)} $ and $y^{(i)} $ denote the estimated and target values, respectively, and $\lambda $ denotes the regularization parameter. The model of the CNS algorithm based on MTCNN is shown in Fig. 2.
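As a quick check of Eqs. (4) and (5), the following is a small, hedged sketch (plain Python with PyTorch assumed; conv_output_size and l2_regularized_loss are illustrative helper names, not the authors' code).

```python
import torch

def conv_output_size(W, H, F_k, S, P=0):
    """Eq. (4): output width/height for a WxH input, kernel size F_k, stride S, padding P."""
    return (W - F_k + 2 * P) // S + 1, (H - F_k + 2 * P) // S + 1

def l2_regularized_loss(per_sample_losses, weight_matrix, lam):
    """Eq. (5): mean data loss plus the (lambda / 2m) * ||M||_2^2 penalty."""
    m = per_sample_losses.numel()
    return per_sample_losses.mean() + lam / (2 * m) * weight_matrix.pow(2).sum()

print(conv_output_size(W=31, H=31, F_k=3, S=1, P=1))   # a 31x31 patch keeps its size
print(l2_regularized_loss(torch.rand(16), torch.rand(5, 5), lam=0.01))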

Fig. 2. Model of cell nuclear segmentation algorithm based on MTCNN.

../../Resources/ieie/IEIESPC.2025.14.4.507/fig2.png

In Fig. 2, the CNS algorithm based on MTCNN divides the CNS task into four steps: cell nucleus foreground extraction, foreground noise reduction, intra-foreground distance transformation, and cell nucleus edge extraction. The cell nucleus foreground extraction network first produces a preliminary segmentation of the original image. The cell foreground noise reduction network then denoises and reconstructs the segmented image. Next, the original image, the preliminary segmentation, and the denoised reconstruction are input into the cell foreground distance transformation network to obtain the distance transformation map. Finally, the distance transformation map is passed to the cell edge learning network to extract cell nucleus edges and obtain the final segmentation result. The total loss function of the network is shown in Eq. (6).

(6)
$ L=\left\{\begin{aligned} & L_{1}, && {stage1}, \\ & L_{1} +L_{2}, && {stage2}, \\ & L_{1} +L_{2} +L_{3}, && {stage3}, \\ & L_{1} +L_{2} +L_{3} +L_{4}, && {stage4}. \end{aligned}\right. $

In Eq. (6), $L$ denotes the total loss function. $L_{1} $, $L_{2} $, $L_{3} $ and $L_{4} $ denote the loss functions of the cell nucleus foreground extraction network, the foreground noise reduction network, the foreground distance transformation network, and the cell edge learning network, respectively. $stage1$, $stage2$, $stage3$ and $stage4$ denote the four steps of cell nucleus foreground extraction, foreground noise reduction, intra-foreground distance transformation, and cell nucleus edge extraction, respectively.
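A minimal sketch of this four-step cascade and the staged loss of Eq. (6) is given below, assuming a PyTorch-style setting; the sub-network names (fg_net, denoise_net, dist_net, edge_net) are placeholders rather than the authors' actual modules.

```python
import torch

def cascade_forward(image, fg_net, denoise_net, dist_net, edge_net):
    """Data flow of the four-step cascade in Fig. 2 (sub-networks are placeholders)."""
    fg = fg_net(image)                                    # cell nucleus foreground extraction
    fg_clean = denoise_net(fg)                            # foreground noise reduction / reconstruction
    dist = dist_net(torch.cat([image, fg, fg_clean], 1))  # intra-foreground distance transformation
    edges = edge_net(dist)                                # cell nucleus edge extraction
    return fg, fg_clean, dist, edges

def total_loss(stage, l1, l2=0.0, l3=0.0, l4=0.0):
    """Eq. (6): accumulate the sub-network losses up to the current training stage."""
    return sum([l1, l2, l3, l4][:stage])   # stage in {1, 2, 3, 4}
```

Staging the loss in this way lets each earlier sub-network keep being optimized while later stages are added, which is one plausible reading of the end-to-end training described above.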

3.2. Interpretable Diagnostic Algorithms Based on DL

For a machine to truly assist a doctor in diagnosis, its output must be understandable to the doctor, i.e., it must be able to describe image features accurately in natural language. An RNN can model such sequential data naturally, having long-range memory and the ability to handle long, varying inputs. The structure of the RNN is shown in Fig. 3.

In Fig. 3, the RNN adopts a cyclic structure, which allows information to be transferred between neurons within a layer. However, during training the RNN updates its state by "overwriting" it at each step, which easily leads to gradient explosion or gradient vanishing [18,19]. Gradient explosion can be mitigated by gradient clipping, but gradient vanishing is more difficult to resolve. The LSTM, a variant of the recurrent neural network, can successfully address the gradient vanishing problem. Fig. 4 depicts the LSTM structure.

Fig. 3. Structure of RNN.

../../Resources/ieie/IEIESPC.2025.14.4.507/fig3.png

Fig. 4. Structure of LSTM.

../../Resources/ieie/IEIESPC.2025.14.4.507/fig4.png

In Fig. 4, the LSTM computes the input gate, forgetting gate, preparatory memory unit, memory unit, and output gate, and then combines them to obtain the current hidden layer variable, which determines how much information is memorized and how much is forgotten and released [20,21]. In this way gradient vanishing is effectively avoided, and the sequential memory ability of the RNN can be better utilized. Equation (7) gives the formulae for the input gate, forgetting gate, and output gate.

(7)
$ \left\{\begin{aligned} & f_{t} =sigmoid\left(b_{f} +U^{(f)} h_{t-1} +W^{(f)} x_{t} \right),\\ & o_{t} =sigmoid\left(U^{(o)} h_{t-1} +W^{(o)} x_{t} +b_{o} \right),\\ & i_{t} =sigmoid\left(U^{(i)} h_{t-1} +W^{(i)} x_{t} +b_{i} \right). \end{aligned}\right. $

In Eq. (7), $i_{t} $, $f_{t} $, and $o_{t} $ denote the input, forgetting, and output gates, respectively. $W^{(i)} $, $U^{(i)} $, $W^{(f)} $, $U^{(f)} $, $W^{(o)} $, and $U^{(o)} $ denote the learning parameters of the input, forgetting, and output gates, respectively. $x_{t} $ denotes the input at moment $t$, and $h_{t-1} $ denotes the hidden layer variable at moment $t-1$. The equations for the preparatory memory unit, the memory unit, and the hidden layer variable are shown in Eq. (8).

(8)
$ \left\{\begin{aligned} & h_{t} =o_{t} *\tanh \left(c_{t} \right),\\ & \tilde{c}_{t} =\tanh \left(U^{(c)} h_{t-1} +b_{c} +W^{(c)} x_{t} \right),\\ & c_{t} =i_{t} *\tilde{c}_{t} +f_{t} *c_{t-1}. \end{aligned}\right. $

In Eq. (8), $\tilde{c}_{t} $ and $c_{t} $ denote the preparatory memory unit and the memory unit, respectively. $W^{(c)} $ and $U^{(c)} $ denote the learning parameters of the preparatory memory unit. For image processing to be interpretable, picture features must be described in natural language, but few IAs exist for the medical imaging field, and the diagnostic reports generated often contain a large amount of redundant information. Therefore, the study proposes IA-ATDCRN, based on a deep convolutional residual network and ATT. The schematic model of the ATDCRN algorithm is shown in Fig. 5.
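Before turning to the ATDCRN model of Fig. 5, the gating computations of Eqs. (7) and (8) can be sketched as a single LSTM step; PyTorch is assumed, the parameter shapes are illustrative, and in practice torch.nn.LSTMCell provides an equivalent built-in implementation.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following Eqs. (7) and (8); params is a tuple of weight matrices and biases."""
    (W_i, U_i, b_i, W_f, U_f, b_f, W_o, U_o, b_o, W_c, U_c, b_c) = params
    i_t = torch.sigmoid(x_t @ W_i + h_prev @ U_i + b_i)    # input gate
    f_t = torch.sigmoid(x_t @ W_f + h_prev @ U_f + b_f)    # forgetting gate
    o_t = torch.sigmoid(x_t @ W_o + h_prev @ U_o + b_o)    # output gate
    c_tilde = torch.tanh(x_t @ W_c + h_prev @ U_c + b_c)   # preparatory memory unit
    c_t = i_t * c_tilde + f_t * c_prev                      # memory unit, Eq. (8)
    h_t = o_t * torch.tanh(c_t)                             # hidden layer variable
    return h_t, c_t
```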

Fig. 5. Schematic diagram of ATDCRN algorithm model.

../../Resources/ieie/IEIESPC.2025.14.4.507/fig5.png

As shown in Fig. 5, the model is divided into three sections: feature extraction, ATT, and diagnostic prediction. The model first extracts features from the original image with a deep convolutional residual network, then selects the effective regions in the image with ATT, and finally integrates the feature attributes and makes a diagnosis with an LSTM. Specifically, the deep convolutional residual network extracts the lesion attribute features of cells, and ATT selects the effective areas in the image, thereby effectively utilizing the semantic information of the different attributes; this ultimately leads to more accurate diagnosis and prediction. In particular, the attention matrix, which represents the attention weight region, is derived from the features of the deep convolutional residual network and used to take a weighted average of the convolutional feature tensor. Normalizing the attention matrix and visualizing it on the original input image provides a useful interpretation of the model prediction. The weighted feature is calculated as in Eq. (9).

(9)
$ f_{k} =Att_{k} *attfeats_{k} . $

In Eq. (9), $f_{k} $ denotes the weighted feature of the $k$th ATT module, $Att_{k} $ denotes the attention matrix of the $k$th ATT module, and $attfeats_{k} $ denotes the convolutional feature tensor of the corresponding module. Through ATT, the residual convolutional features are weighted and used effectively, which provides strong support for diagnostic prediction. The input of the diagnostic prediction module is calculated as in Eq. (10).

(10)
$ x_{k} =relu\left(W^{(x)} f_{k} +b_{x} \right) . $

In Eq. (10), $x_{k} $ denotes the input of step $k$, $f_{k} $ denotes the selected (weighted) features of step $k$, and $W^{(x)} $ denotes the learning parameters. The nonlinear mapping of the hidden layer variables to the four-dimensional space is given in Eq. (11).

(11)
$ \left\{\begin{aligned} & z_{k} =relu\left(W^{(z)} h_{k} +b_{z} \right),\\ & s_{k} =W^{(s)} z_{k} +b_{s}. \end{aligned}\right. $

In Eq. (11), $z_{k} $ denotes the hidden layer variable after the nonlinear transformation and $s_{k} $ denotes the probability vector. In the first training stage, samples are labeled with the basic attribute semantics, and the loss function at this stage is shown in Eq. (12).

(12)
$ \left\{\begin{aligned} & l_{1c} =SCE(predicted,\,conclusion),\\ & l_{1k} =SCE(predicted,\,target_{k}),~k\!\in\! \{1,\,2,\,3,\,4\},\\ & L_{1} (\theta _{R} ,\theta _{A} )=l_{1c} +\sum _{k=1}^{4}l_{1k}. \end{aligned}\right. $

In Eq. (12), $l_{1c} $ denotes the prediction loss of the residual network, $l_{1k} $ denotes the loss of the $k$th ATT module, and $SCE$ denotes the sigmoid cross-entropy loss function. $L_{1} $ denotes the loss function of the first stage, and $\theta _{R} $ and $\theta _{A} $ denote the learning parameters of the residual network and the ATT modules, respectively. The second stage is trained for diagnostic result prediction, and its loss function is shown in Eq. (13).

(13)
$ \left\{\begin{aligned} & l_{2k} =SCE(s_{k} ,~conclusion),~k\in \{0,~1,~2,~3,~4\},\\ & L_{2} (\theta _{C} )=\sum _{k=0}^{4}l_{2k}. \end{aligned}\right. $

In Eq. (13), $l_{2k} $ denotes the loss of the $k$th ATT module, $s_{k} $ denotes the $k$th prediction vector, $\theta _{C} $ denotes the learning parameters of the diagnostic prediction module, and $L_{2} $ denotes the loss function of the second stage. The total loss function is then given in Eq. (14).

(14)
$ L(\theta _{R} ,\theta _{A} ,\theta _{C} )=\alpha L_{1} +\beta L_{2} . $

In Eq. (14), $\alpha $ and $\beta $ denote the loss weights of the first and second stages, respectively. The computational schematic of ATDCRN is shown in Fig. 6.
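A hedged sketch of the two-stage objective in Eqs. (12)-(14) is shown below, assuming that SCE corresponds to PyTorch's binary_cross_entropy_with_logits; all prediction and target tensors are placeholders, not the authors' exact training code.

```python
import torch.nn.functional as F

def two_stage_loss(res_pred, att_preds, attr_targets, conclusion, s_vectors, alpha, beta):
    sce = F.binary_cross_entropy_with_logits          # sigmoid cross-entropy (SCE)
    # Eq. (12): residual-network conclusion loss plus the four attribute losses.
    L1 = sce(res_pred, conclusion) + sum(sce(att_preds[k], attr_targets[k]) for k in range(4))
    # Eq. (13): conclusion losses of the five prediction vectors s_0 .. s_4.
    L2 = sum(sce(s_vectors[k], conclusion) for k in range(5))
    # Eq. (14): weighted combination of the two stages.
    return alpha * L1 + beta * L2
```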

Fig. 6. Calculation diagram of ATDCRN.

../../Resources/ieie/IEIESPC.2025.14.4.507/fig6.png

In Fig. 6, the selected features are mapped to column vectors by a linear transformation matrix, and the whole algorithm updates its parameters through the back-propagation algorithm. The prediction vectors obtained from the ATT modules are processed by the LSTM to obtain the final prediction result. GAP in Fig. 6 denotes global average pooling: a given convolutional feature tensor is converted by GAP into a smaller feature tensor, which can then be converted into a prediction vector over semantic categories by a simple linear transformation. It is worth noting that ATDCRN contains four ATT modules, because the study constructed the network for cervical precancerous lesions, which have four attribute features. The final prediction is computed as in Eq. (15).

(15)
$ s_{f} =\mathrm{softmax}(s_{0} +s_{1} +s_{2} +s_{3} +s_{4} ) . $

In Eq. (15), $s_{f} $ denotes the final prediction result. $s_{0} $, $s_{1} $, $s_{2} $, $s_{3} $, and $s_{4} $ denote the prediction vectors obtained from each prediction module, respectively.
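Putting Eqs. (9)-(11) and (15) together, one attention-to-prediction step of ATDCRN can be sketched as follows; PyTorch is assumed, lstm_cell stands for an nn.LSTMCell instance, and the weight names (W_x, W_z, W_s, ...) are illustrative rather than the authors' exact parameterization.

```python
import torch
import torch.nn.functional as F

def atdcrn_step(att_k, feats_k, W_x, b_x, lstm_cell, h, c, W_z, b_z, W_s, b_s):
    f_k = att_k * feats_k                 # Eq. (9): attention-weighted convolutional features
    f_k = f_k.mean(dim=(2, 3))            # global average pooling over the spatial dimensions
    x_k = F.relu(f_k @ W_x + b_x)         # Eq. (10): input of the diagnostic prediction module
    h, c = lstm_cell(x_k, (h, c))         # LSTM update of the hidden layer variable
    z_k = F.relu(h @ W_z + b_z)           # Eq. (11): nonlinear transformation of the hidden state
    s_k = z_k @ W_s + b_s                 # Eq. (11): prediction vector (logits)
    return s_k, h, c

def final_prediction(s_list):
    """Eq. (15): softmax over the sum of the prediction vectors s_0 .. s_4."""
    return F.softmax(torch.stack(s_list).sum(dim=0), dim=-1)
```

Visualizing the (normalized) att_k maps on the original image is what gives the diagnostic conclusion its interpretability, as described above.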

4. Experimental Results and Analysis

To validate their performance, the study tests the MTCNN-based CNS algorithm and the ATDCRN interpretable model separately. The MTCNN-based CNS algorithm is tested on the public dataset of the CNS competition and compared with the DCAN model and the Jbarker model. The ATDCRN model is tested on a cervical precancerous lesion dataset containing 1000 images, 200 of which serve as the testing set. The experiments also compare the performance of three algorithms: ATDCRN, AlexNet, and ResNet. The experimental parameter settings for MTCNN and ATDCRN are shown in Table 1.

Table 1. Setting of the experimental parameters.

Network      | Parameter                     | Value
MTCNN [22]   | Input image channels          | 3
             | Patch size                    | 31*31
             | Cascade feature map channels  | 5
             | Output channels               | 1
             | Convolution kernel            | 3*3
ATDCRN [23]  | Convolution kernel            | 3*3
             | Batch size                    | 64
             | Learning rate                 | 0.001
             | Stride                        | 2
             | Padding                       | 3

Table 1 indicates that MTCNN has 3 input image channels and 1 output channel, a patch size of 31$\mathrm{\ast}$31, a convolution kernel of 3$\mathrm{\ast}$3, and 5 cascade feature map channels. ATDCRN has a convolution kernel of 3$\mathrm{\ast}$3, a batch size of 64, a learning rate of 0.001, and stride and padding of 2 and 3, respectively. The F1-score and IoU of MTCNN, DCAN, and Jbarker are shown in Fig. 7.

Fig. 7. F1-score and IoU of MTCNN, DCAN, and Jbarker.

../../Resources/ieie/IEIESPC.2025.14.4.507/fig7.png

In Fig. 7(a), the F1-scores of all three models increase with the number of iterations. DCAN has F1-scores of about 0.74 and 0.84 at 100 and 600 iterations, respectively, with an average of about 0.80; Jbarker has F1-scores of about 0.72 and 0.81 at 100 and 600 iterations, with an average of about 0.77; and MTCNN has F1-scores of about 0.78 and 0.87 at 100 and 600 iterations, with an average of about 0.84. From Fig. 7(b), the highest and lowest IoUs of DCAN are about 0.84 and 0.80, respectively, with an average of about 0.82; the highest and lowest IoUs of Jbarker are about 0.83 and 0.78, with an average of about 0.81; and the highest and lowest IoUs of MTCNN are about 0.86 and 0.82, with an average of about 0.85. The segmentation results of MTCNN therefore have higher accuracy and better overall performance. The superior performance of MTCNN relative to the other models can be attributed to its incorporation of intermediate learning processes, including cell foreground denoising and distance transformation, and to its multi-task sequence learning method, which makes it more sensitive to severely overlapping and fuzzy nucleus edges. The DICE1 and DICE2 scores of the three models are shown in Fig. 8.

Fig. 8. DICE1 and DICE2 of three models.

../../Resources/ieie/IEIESPC.2025.14.4.507/fig8.png

From Fig. 8(a), the highest and lowest DICE1 of DCAN are about 0.87 and 0.83, with an average of about 0.85; the highest and lowest DICE1 of Jbarker are about 0.85 and 0.79, with an average of about 0.83; and the highest and lowest DICE1 of MTCNN are about 0.91 and 0.85, with an average of about 0.88. From Fig. 8(b), the lowest and highest DICE2 of DCAN are about 0.68 and 0.74, with an average of about 0.72; the lowest and highest DICE2 of Jbarker are about 0.64 and 0.72, with an average of about 0.68; and the lowest and highest DICE2 of MTCNN are about 0.72 and 0.79, with an average of about 0.75. It can be concluded that the CNS performance of MTCNN is better. The superiority of MTCNN in DICE1 and DICE2 can be attributed to the introduction of cell foreground extraction and distance transformation, which further indicates that the learning process of the cell foreground distance transformation is highly sensitive to severely overlapping cell edges. The AJI and mAP of the three models are shown in Fig. 9.

Fig. 9. AJI and mAp for three different models.

../../Resources/ieie/IEIESPC.2025.14.4.507/fig9.png

In Fig. 9(a), the AJI of DCAN has a minimum of about 0.72 and a maximum of about 0.78, with an average of about 0.75; the AJI of Jbarker has a minimum and maximum of about 0.68 and 0.75, with an average of about 0.71; and the AJI of MTCNN has a minimum of about 0.76 and a maximum of about 0.85, with an average of about 0.81. From Fig. 9(b), the lowest and highest mAP of DCAN are about 0.72 and 0.79, with an average of about 0.76; the lowest and highest mAP of Jbarker are about 0.69 and 0.74, with an average of about 0.72; and the lowest and highest mAP of MTCNN are about 0.78 and 0.82, with an average of about 0.80. MTCNN thus achieves higher AJI and mAP than the other two algorithms. The diagnostic conclusion prediction accuracy and recall of ATDCRN, AlexNet, and ResNet are shown in Fig. 10.

Fig. 10. Prediction accuracy and recall rate of diagnostic conclusions for ATDCRN, AlexNet, and ResNet.

../../Resources/ieie/IEIESPC.2025.14.4.507/fig10.png

In Fig. 10(a), the highest diagnostic conclusion prediction accuracy of AlexNet is about 73.1%, the lowest about 69.7%, and the average about 71.9%; the highest accuracy of ResNet is about 80.2%, the lowest about 76.9%, and the average about 78.3%; and the highest accuracy of ATDCRN is about 86.6%, the lowest about 83.8%, and the average about 85.2%. In Fig. 10(b), the highest diagnostic conclusion prediction recall of AlexNet is about 83.3%, the lowest about 80.3%, and the average about 81.9%; the highest recall of ResNet is about 87.8%, the lowest about 85.2%, and the average about 86.2%; and the highest recall of ATDCRN is about 90.3%, the lowest about 88.5%, and the average about 89.3%. It can be concluded that ATDCRN effectively improves the accuracy and recall of diagnostic conclusion prediction. The superior performance of ATDCRN compared to the other models is attributable to the introduction of four ATT modules, each corresponding to a specific attribute feature, which effectively utilize the semantic information of the different attributes. The semantic attribute prediction accuracy and recall of ATDCRN, AlexNet, and ResNet are shown in Fig. 11.

Fig. 11. Semantic attribute prediction accuracy and recall rate of ATDCRN, AlexNet, and ResNet.

../../Resources/ieie/IEIESPC.2025.14.4.507/fig11.png

In Fig. 11(a), the highest semantic attribute prediction accuracy of AlexNet is about 74.2%, the lowest about 68.7%, and the average about 72.2%; the highest accuracy of ResNet is about 78.3%, the lowest about 71.6%, and the average about 74.8%; and the highest accuracy of ATDCRN is about 78.7%, the lowest about 73.2%, and the average about 76.5%. From Fig. 11(b), AlexNet has the highest semantic attribute prediction recall of about 76.1%, the lowest of about 72.3%, and an average of about 74.6%; ResNet has the highest recall of about 81.2%, the lowest of about 73.7%, and an average of about 77.0%; and ATDCRN has the highest recall of about 81.6%, the lowest of about 77.2%, and an average of about 79.9%. These results show that ATDCRN performs better in semantic attribute prediction than the other algorithms, and that using ATDCRN for auxiliary medical diagnosis can effectively reduce the misdiagnosis rate to less than 20%, thereby substantially alleviating the current medical diagnosis problems. To further validate its performance, the study conducted additional experiments comparing the proposed ATDCRN algorithm with CNN-RNN and Mask-CNN. The diagnostic conclusion and semantic attribute prediction accuracy of ATDCRN, CNN-RNN, and Mask-CNN are shown in Fig. 12.

Fig. 12. Diagnostic conclusion and semantic attribute prediction accuracy of ATDCRN, CNN-RNN, and mask CNN.

../../Resources/ieie/IEIESPC.2025.14.4.507/fig12.png

Fig. 12(a) illustrates that the diagnostic conclusion prediction accuracy of CNN-RNN is approximately 78.3% at the highest and 76.8% at the lowest, with an average of about 77.5%; that of Mask-CNN is approximately 81.3% at the highest and 79.8% at the lowest, with an average of approximately 80.7%; and that of ATDCRN is approximately 87.2% at the highest and 85.4% at the lowest, with an average of approximately 86.1%. As seen in Fig. 12(b), the highest semantic attribute prediction accuracy of CNN-RNN is about 72.4%, the lowest about 69.6%, and the average about 70.9%; the highest semantic attribute accuracy of Mask-CNN is about 74.6%, the lowest about 73.1%, and the average about 74%; and the highest semantic attribute accuracy of ATDCRN is about 80.3%, the lowest about 78.7%, and the average about 79.5%.

5. Conclusion

Medical image processing, a branch of computer vision, has advanced quickly alongside computer technology and artificial intelligence. The study addresses the problems of pathological CNS and pathological image interpretability, and proposes and separately tests a CNS algorithm based on MTCNN and an IA method based on ATDCRN. According to the experimental findings, the F1-score of MTCNN is approximately 0.78 and 0.87 at 100 and 600 iterations, respectively, with an average F1-score of about 0.84; its IoU is approximately 0.86 at the highest, 0.82 at the lowest, and 0.85 on average, all higher than those of the DCAN and Jbarker models. The highest DICE1 values of DCAN, Jbarker, and MTCNN are 0.87, 0.85, and 0.91, respectively, with averages of 0.85, 0.83, and 0.88; the highest DICE2 values are 0.74, 0.72, and 0.79, with averages of 0.72, 0.68, and 0.75. MTCNN thus has the highest DICE1 and DICE2 values. The average mAP and AJI of MTCNN are about 0.80 and 0.81, respectively, again higher than those of the DCAN and Jbarker models. The average diagnostic conclusion prediction accuracies of ATDCRN, AlexNet, and ResNet are 85.2%, 71.9%, and 78.3%, respectively, and the average diagnostic conclusion prediction recall rates are about 89.3%, 81.9%, and 86.2%. The average semantic attribute prediction accuracies are about 76.5%, 72.2%, and 74.8%, and the average semantic attribute prediction recalls are about 79.9%, 74.6%, and 77.0%. ATDCRN has the highest prediction accuracy and recall for both diagnostic conclusions and semantic attributes. These findings demonstrate that the CNS model based on MTCNN efficiently enhances CNS accuracy, while the IA based on ATDCRN permits interpretable analysis of medical images. However, the study does not evaluate the algorithms on very large sample sets, so their effectiveness and accuracy when faced with large amounts of data remain to be verified.

REFERENCES

1 
R. Suzuki, N. Yajima, K. Sakurai, N. Oguro, T. Wakita, and D. H. Thom et al., ``Association of patients' past misdiagnosis experiences with trust in their current physician among Japanese adults,'' Journal of General Internal Medicine, vol. 37, no. 5, pp. 1115-1121, 2022.DOI
2 
S. Diao, J. Hou, H. Yu, X. Zhao, and W. Luo, ``Computer-aided pathologic diagnosis of nasopharyngeal carcinoma based on deep learning,'' American Journal of Pathology, vol. 190, no. 8, pp. 1691-1700, 2020.DOI
3 
F. Masood, J. Masood, H. Zahir, K. Driss, N. Mehmood, and H. Farooq, ``Novel approach to evaluate classification algorithms and feature selection filter algorithms using medical data,'' Journal of Computational and Cognitive Engineering, vol. 2, no. 1, pp. 57-67, 2023.DOI
4 
A. Al-Saffar, A. Zamani, A. Stancombe, and A. Abbosh, ``Operational learning-based boundary estimation in electromagnetic medical imaging,'' IEEE Transactions on Antennas and Propagation, vol. 70, no. 3, pp. 2234-2245, 2022.DOI
5 
L. Alzubaidi, M. A. Fadhel, O. Al-Shamma, J. Zhang, J. Santamaria, and Y. Duan, ``Robust application of new deep learning tools: An experimental study in medical imaging,'' Multimedia Tools and Applications, vol. 81, no. 10, pp. 113289-113317, 2022.DOI
6 
T. Zeng, D. Kong, J. Zhang, and Q. Ma, ``Weighted area constraints-based breast lesion segmentation in ultrasound image analysis,'' Inverse Problems and Imaging, vol. 16, no. 2, pp. 451-466, 2022.DOI
7 
W. Song, A. H. Kaakour, A. Kalur, J. C. Muste, A. I. Iyer, and C. C. S. Valentim et al., ``Performance of a machine-learning computational image analysis algorithm in retinal fluid quantification for patients with diabetic macular edema and retinal vein occlusions,'' Ophthalmic Surgery, Lasers & Imaging Retina, vol. 53, no. 3, pp. 123-131, 2022.DOI
8 
F. Xie, K. Zhang, F. Li, G. Ma, Y. Ni, and W. Zhang et al., ``Diagnostic accuracy of convolutional neural network-based endoscopic image analysis in diagnosing gastric cancer and predicting its invasion depth: a systematic review and meta-analysis,'' Gastrointestinal Endoscopy, vol. 95, no. 4, pp. 599-609, 2022.DOI
9 
C. D. Ruberto, A. Loddo, and G. Puglisi, ``Blob detection and deep learning for leukemic blood image analysis,'' Applied Sciences, vol. 10, no. 3, pp. 1176-1188, 2020.DOI
10 
M. Chen, X. Shi, Y. Zhang, D. Wu, and M. Guizani, ``Deep feature learning for medical image analysis with convolutional autoencoder neural network,'' IEEE Transactions on Big Data, vol. 7, no. 4, pp. 750-758, 2021.DOI
11 
V. R. Kota and S. D. Munisamy, ``High accuracy offering attention mechanisms based deep learning approach using CNN/bi-LSTM for sentiment analysis,'' International Journal of Intelligent Computing and Cybernetics, vol. 15, no. 1, pp. 61-74, 2022.DOI
12 
Z. Jiang, W. He, K. M. Stephen, S. A. Man, S. Wang, and V. Stanislawski et al., ``Weakly supervised spatial deep learning for earth image segmentation based on imperfect polyline labels,'' ACM Transactions on Intelligent Systems and Technology (TIST), vol. 13, no. 2, pp. 169-188, 2022.DOI
13 
I. J. Ding and N. W. Zheng, ``RGB-D depth-sensor-based hand gesture recognition using deep learning of depth images with shadow effect removal for smart gesture communication,'' Sensors and Materials, vol. 34, no. 1, pp. 203-216, 2022.DOI
14 
X. Zhang, Y. Gong, C. Qiao, and W. Jing, ``Multiview deep learning based on tensor decomposition and its application in fault detection of overhead contact systems,'' The Visual Computer, vol. 38, no. 4, pp. 1457-1467, 2022.DOI
15 
B. B. Nair, S. Krishnamoorthy, M. Geetha, and S. N. Rao, ``Machine vision based flood monitoring system using deep learning techniques and fuzzy logic on crowdsourced image data,'' Intelligent Decision Technologies, vol. 15, no. 3, pp. 357-370, 2021.DOI
16 
X. B. Yang and W. Zhang, ``Heterogeneous face detection based on multi-task cascaded convolutional neural network,'' IET Image Processing, vol. 16, no. 1, pp. 207-215, 2022.DOI
17 
X. Wu, P. Li, J. Zhou, and Y. Liu, ``A cascaded CNN-based method for monocular vision robotic grasping,'' Industrial Robot, vol. 49, no. 4, pp. 645-675, 2022.DOI
18 
Y. Yang and X. Song, ``Research on face intelligent perception technology integrating deep learning under different illumination intensities,'' Journal of Computational and Cognitive Engineering, vol. 1, no. 1, pp. 32-36, 2022.DOI
19 
B. Xing, E. Xu, J. Wei, and Y. Meng, ``Recurrent neural network non-singular terminal sliding mode control for path following of autonomous ground vehicles with parametric uncertainties,'' IET Intelligent Transport Systems, vol. 16, no. 5, pp. 616-629, 2022.DOI
20 
Y. Zhang, S. Wang, G. Sun, and J. Mao, ``Aerodynamic surrogate model based on deep long short-term memory network: An application on high-lift device control,'' Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, vol. 236, no. 6, pp. 1081-1097, 2022.DOI
21 
J. Yao, B. Li, and J. Zhao, ``Tool remaining useful life prediction using deep transfer reinforcement learning based on long short-term memory networks,'' The International Journal of Advanced Manufacturing Technology, vol. 118, no. 3, pp. 1077-1080, 2022.DOI
22 
J. Liu, ``Research on video image face detection and recognition technology based on improved MTCNN algorithm,'' International Journal of Wireless and Mobile Computing, vol. 22, no. 3, pp. 205-212, 2022.DOI
23 
V. K. Vatsavayi and N. Andavarapu, ``Identification and classification of wild animals from video sequences using hybrid deep residual convolutional neural network,'' Multimedia Tools and Applications, vol. 81, no. 23, pp. 33335-33360, 2022.DOI

Author

Junye Yang
../../Resources/ieie/IEIESPC.2025.14.4.507/au1.png

Junye Yang graduated from Shenyang University of Chemical Technology in March 2010 with her master's degree in computer software and theory. She is currently working at Shijiazhuang Institute of Technology. Her main research directions include computer technology, information security, and artificial intelligence. She is the chief editor of 2 textbooks. She has published more than 20 academic articles, including 2 Chinese core articles and 1 SCI article, and has participated in 7 scientific research projects.

Yujuan Du
../../Resources/ieie/IEIESPC.2025.14.4.507/au2.png

Yujuan Du obtained her master's degree in oncology from Hebei Medical University in 2012. She is working in the Department of Geriatrics at Chuiyangliu Hospital affiliated to Tsinghua University. Her areas of interest include geriatric oncology, geriatric medicine, cancer palliative and hospice care.

Fang Liu
../../Resources/ieie/IEIESPC.2025.14.4.507/au3.png

Fang Liu graduated from Shanxi University in July 2010 with her master's degree in computer application technology. She is currently working at Shijiazhuang Institute of Technology, where she serves as the director of the College of Internet Application Technology. Her main research directions are software technology development and artificial intelligence. She has published more than 20 academic articles, including 2 Chinese core articles and 3 SCI articles, has presided over and participated in 4 provincial-level scientific research projects, and is the chief editor of 4 textbooks.