  1. (Students' Affairs Office, Jizhong Vocational College, Dingzhou 073000, China)
  2. (Department of Education, Jizhong Vocational College, Dingzhou 073000, China)
  3. (Department of Information Engineering, Jizhong Vocational College, Dingzhou 073000, China)



Keywords: Dual path network, Stroke information, Gray level co-occurrence matrix, Support vector machine, Art images

1. Introduction

Art images, as a crystallization of human civilization, play an important role in recording culture. Through the content of artworks and literary records from different periods, researchers can infer the developmental trends and characteristics of an era, which supports historical excavation and archaeology. Because different stages of human history have distinct characteristics that are reflected in artistic creation, artists are divided into different schools and their works present different styles. How to effectively classify art images from different periods and schools has therefore become an urgent issue. In related research, machine learning methods have been widely adopted because they can simulate human learning behavior with computational methods [1] and complete corresponding tasks by acquiring knowledge and skills and by adjusting and restructuring their own knowledge structure. Such methods recognize and classify natural images by extracting image features and applying induction and synthesis, and they currently perform well in image classification. However, the natural-image classification pipelines they are used in often overlook the extraction of the author's stroke information, and their recognition and classification of images with similar styles falls short of human performance [2]. In view of this, a two-channel dual path network model is proposed for classifying art images of various genres and styles. Color, graphic, stroke, and other information is extracted from art images through two channels, and the extracted information is passed to the dual path model for processing in both directions, which reduces the processing time of feature information. Finally, a support vector machine processes the resulting feature data and completes the image classification. The research is organized into four sections: the first discusses related work in recent years; the second constructs the dual-channel dual path model for image classification; the third tests the performance of the constructed model; and the fourth summarizes the experimental data and draws conclusions.

2. Related Works

Many researchers have studied image classification. Wang et al. presented a new hierarchical architecture based on feature and label information to address the problem that embedded hierarchical label information is ignored in multi-label images [3]. Obeso et al. presented a method that adds supplementary visual data to training images to improve the adaptive feature-capture ability of deep neural networks, helping the model adapt more precisely in image recognition (IR) and classification [4]. Kumar et al. proposed a new low-level hyperspectral image classification framework to address the difficulty of analyzing parameter information in datasets generated by hyperspectral sensors. The framework improves recognition ability by considering the sensitivity of the contextual spectral space of hyperspectral images, and its performance was verified in subsequent experiments [5]. Minh et al. proposed a robust network model for classifying implant X-ray images. The model combines squeeze-and-excitation blocks with a residual network to weight image features and improve classification performance, and the final experimental results indicate that the proposed network is effective [6]. To better classify sea-ice disasters, Han et al. presented a multi-level feature-fusion image classification method based on residual networks. The first component of the image was extracted through principal component analysis, and the residual network was used to deepen the network, fuse features across layers, and improve sea-ice classification [7]. Liu et al. proposed a hierarchical learning algorithm based on a backtracking Bayesian neural network classifier to support large-scale image classification. The algorithm builds a hierarchy over the categories of an image dataset by establishing a visual-confusion label tree, and the proposed model was verified through experiments [8]. Jiang et al. proposed a lung-nodule classification framework that uses attention and integrates 3D dual path networks to address problems in automatic lung-nodule classification. The framework models the correlation between adjacent positions through an attention mechanism, and its effectiveness was verified experimentally [9]. Sheng et al. proposed a 3D multi-scale network model based on a dual path architecture to overcome the segmentation and manual feature extraction required by traditional lung-nodule image classification; comparative experiments verified the effectiveness of the proposed model [10].

Ji et al. proposed a hybrid heterogeneous structure model to address the neglect of the topological structure of the feature space in recently constructed image classification models. It improves classification accuracy by capturing geometric structure information around visual words and transforming visual features within the main vector unit [11]. Huang et al. presented a spatial pyramid model for aerial image classification to address the inability of traditional methods to explicitly exploit the spatial information of aerial images. It completes recognition of different types of aerial images by incorporating regularized probabilities, and the model's performance was tested experimentally [12]. Yang et al. presented a multi-task cascade deep learning (DL) model to address the small datasets and time-consuming annotation involved in the automatic diagnosis of thyroid nodules from medical images. It helps a dual path semi-supervised structure generate higher-quality images by quantifying the ultrasound features of nodules, and its effectiveness was verified through experiments on ultrasound images [13]. Chunyu et al. proposed a dual path convolutional neural network based on an attention mechanism to address the poor ability of convolutional neural networks to describe and generalize the edges of ground objects in hyperspectral image classification, and proved the effectiveness of the model in subsequent experiments [14]. To address the difficulty of directly matching visual content with text descriptions, Ma et al. proposed a dual path model integrating a max gated block to extract the pattern features of different word embeddings and make visual-text associations more significant [15]. To improve the use of digital whole-slide images in the automatic analysis of cervical smears, Lin et al. proposed a dual path efficient deep convolutional neural network model for lesion retrieval and verified its inference efficiency and its sensitivity to small and large lesions experimentally [16]. To address the limited size of high-dimensional training features and the overfitting problem in the spectral space, Jianing et al. presented a dual-channel spectral-spatial fusion capsule generative adversarial network model for hyperspectral image classification. It synthesizes and exchanges information from different channels in the spectral space to improve the performance of the whole classification model [17]. To achieve high-precision hyperspectral image classification with limited training samples, Deng et al. proposed a lightweight dual-channel convolutional neural network. The network uses improved residual blocks instead of plain residual connections to further enhance performance, and the model was verified through experiments [18].

In summary, dual-channel and dual path network models are mature and well studied in IR and classification. However, there is limited research on classifying art images with such models. In view of this, a dual path network model that fuses a three-primary-color channel and a stroke information channel is proposed to recognize and classify art images of different styles and genres.

3. Construction of a Recognition and Classification Model for Art Images Based on the Fusion of Dual Channel and Dual Path Networks

The two channels selected in this study are the three-primary-color channel and the stroke information channel. The stroke channel uses a gray-level co-occurrence matrix to process the feature information it extracts, and the features extracted from the two channels are fed into a dual path network for processing. A support vector machine then classifies the feature data and completes the task of classifying and recognizing art images of different styles and genres.
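As a rough, self-contained illustration of this dual-channel pipeline, the following Python sketch replaces the two DPN branches with simple hand-crafted stand-ins (per-channel color statistics for the RGB channel and a coarse gray-level co-occurrence histogram for the stroke channel) and feeds the concatenated features to an SVM. The feature functions and the synthetic data are assumptions made purely for illustration and are not the model used in this paper.

    import numpy as np
    from sklearn.svm import SVC

    def rgb_channel_features(image):
        # Stand-in for the RGB-channel branch: per-channel mean and standard deviation.
        return np.concatenate([image.mean(axis=(0, 1)), image.std(axis=(0, 1))])

    def stroke_channel_features(image, levels=8):
        # Stand-in for the stroke channel: a coarse gray-level co-occurrence histogram.
        gray = (image.mean(axis=2) / 256 * levels).astype(int)
        G = np.zeros((levels, levels))
        np.add.at(G, (gray[:, :-1].ravel(), gray[:, 1:].ravel()), 1)
        return (G / G.sum()).ravel()

    def fptd_style_features(image):
        # Dual-channel feature vector: RGB features concatenated with stroke features.
        return np.concatenate([rgb_channel_features(image), stroke_channel_features(image)])

    rng = np.random.default_rng(0)
    images = rng.integers(0, 256, size=(40, 64, 64, 3))   # toy stand-ins for art images
    labels = rng.integers(0, 2, size=40)                   # two synthetic style labels

    X = np.stack([fptd_style_features(img) for img in images])
    clf = SVC(kernel="rbf").fit(X, labels)
    print(clf.predict(X[:5]))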

3.1. Construction of a Dual Path Network Image Classification Model

Image classification for art paintings mainly works by determining the style and genre of the tested object, as well as the characteristics of the author. Common natural-image classification technologies are mainly based on DL networks. This study introduces the Dual Path Network (DPN) model, a simple and efficient model for image classification that can be viewed, through the HORNN framework, as an integrated network composed of DenseNet (DN) and ResNet (RN). The mapping learned by RN is the difference between the actual observations and the fitted values, which effectively transfers the learning objective and preserves the accuracy of the model. Fig. 1 illustrates the relevant schematic diagram.

Fig. 1. Schematic diagram of residual network structure.

../../Resources/ieie/IEIESPC.2025.14.4.495/fig1.png

Fig. 1 shows that in the RN model, the original input information x is propagated forward together with the output of the upper layers. This means that no matter how poor the performance of f(x) is, it is unlikely to degrade the performance of the RN model. The DN network is an architecture of simple connection patterns that improves on short-path models; it has fewer parameters and a narrower network, with fewer than 100 convolutional-layer outputs in its dense blocks. The number of mapped features is therefore also small, making the network relatively easy to train. The DN network model is shown in Fig. 2.

Fig. 2. Schematic diagram of DN network structure.

../../Resources/ieie/IEIESPC.2025.14.4.495/fig2.png

Fig. 2 shows that the DN network model directly connects all layers with matching feature sizes and preserves information between layers in the order of the mapping size. The nth layer obtains additional information from layer n-1 and maps all of the information it has obtained to layer n+1. By mapping and transmitting information step by step, the feature-map size within each structural block remains consistent, which avoids losing relevant information due to mismatched feature-map sizes during information concatenation. The internal connection route of the DPN model's neural network is a novel topology that imitates both the dense DN and the RN, and the overall network structure of DPN is similar to that of RN. Its structure is, in order, a 7 * 7 convolutional layer, a maximum pooling layer, and four stages; each stage contains a number of UB blocks, followed by a global average pooling layer, a fully connected layer, and finally a Softmax layer. DPN can thus be regarded as a network model formed by integrating DN and RN. Fig. 3 demonstrates the details.

Fig. 3. Schematic diagram of the DPN network structure.

../../Resources/ieie/IEIESPC.2025.14.4.495/fig3.png

Fig. 3 shows that the DPN model retains most of the structure of the RN model and integrates part of the DN model, which allows it to generate new features while keeping feature redundancy low. The formula for the RN path used to reduce complex data in the DPN model is shown in Eq. (1).

(1)
$ x^{k} \triangleq \sum _{t=1}^{k-1}f_{t}^{k} (h^{t} ). $

In formula (1), $x^{k}$ represents the information extracted at step $k$ of a single path; $f_{t}^{k}(\cdot)$ is the feature transformation function applied along the path; and $h^{t}$ is the feature output (hidden state) of step $t$. The DN network is used to generate new features, as shown in (2).

(2)
$ y^{k} \triangleq \sum _{t=1}^{k-1}v_{t} (h^{t} ) . $

In formula (2), $v_{t}(\cdot)$ is the feature learning function of the dense path, and $y^{k}$ represents the newly generated feature information; the residual path in Eq. (1), by contrast, supports the reuse of common features. Integrating RN and DN gives the functional expression of DPN, shown in formula (3).

(3)
$ r^{k} \triangleq x^{k} +y^{k} . $

The current feature state of a single path is generated through the transformation function of the DPN network and is used for the next mapping or prediction. The relevant expression is shown in formula (4).

(4)
$ h^{k} =g^{k} (r^{k} ). $

In formula (4), $h^{k}$ represents the current state and $g^{k}(\cdot)$ is the transformation function.
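To make the combination in Eqs. (1)-(4) concrete, the following PyTorch sketch shows a toy dual-path block: element-wise addition realizes the residual accumulation of Eq. (1), channel concatenation carries the densely generated features of Eq. (2), their combination corresponds to Eq. (3), and the convolution-BatchNorm-ReLU stack plays the role of the transformation $g^{k}(\cdot)$ in Eq. (4). This is a simplified block written for this explanation only; the bottleneck structure and layer widths of the actual DPN differ.

    import torch
    import torch.nn as nn

    class ToyDualPathBlock(nn.Module):
        # Simplified dual-path block: a shared transformation whose output is split
        # into a residual part (added, ResNet-like) and a dense part (concatenated, DenseNet-like).
        def __init__(self, in_channels, res_channels, dense_growth):
            super().__init__()
            self.res_channels = res_channels
            self.transform = nn.Sequential(
                nn.Conv2d(in_channels, res_channels + dense_growth, kernel_size=3, padding=1),
                nn.BatchNorm2d(res_channels + dense_growth),
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            out = self.transform(x)
            res_part = out[:, :self.res_channels]          # features reused on the residual path
            dense_part = out[:, self.res_channels:]        # newly generated features on the dense path
            res = x[:, :self.res_channels] + res_part      # element-wise addition, as in Eq. (1)
            # concatenation keeps the dense contributions, as in Eqs. (2)-(3)
            return torch.cat([res, x[:, self.res_channels:], dense_part], dim=1)

    block = ToyDualPathBlock(in_channels=64, res_channels=32, dense_growth=16)
    y = block(torch.randn(1, 64, 56, 56))
    print(y.shape)   # torch.Size([1, 80, 56, 56])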

3.2. Construction of a Dual Channel Dual Path Network Art Image Classification Model

A dual channel enables two streams of information to propagate simultaneously and to flow across each other, achieving two-way communication between different locations. The dual path model explains how information affects processes and outcomes from the perspectives of both a central path and an edge path, providing a detailed and comprehensive framework. This study combines the advantages of both and proposes a fine-art painting classification model based on two-channel dual path networks (FPTD) for the recognition and classification of art images. The structure of the FPTD model constructed in this study is shown in Fig. 4.

Fig. 4. Structure diagram of the FPTD network model.

../../Resources/ieie/IEIESPC.2025.14.4.495/fig4.png

Fig. 4 shows that the dual channel of the FPTD model consists of the three-primary-color channel (Red, Green, Blue; RGB) and the stroke information channel. The RGB channel mainly extracts the visual information of the image, including elements such as graphics, lines, and colors, while the stroke channel mainly extracts information about the author's strength, speed, rhythm, and emotion in using the pen. After the image features and the author's stroke-related information are extracted through the two channels, the FPTD model passes the channel information separately to the path networks and extracts image features through the DPN model. It then combines the features and uses a support vector machine (SVM) to distinguish the extracted data, achieving effective classification of feature images. In the FPTD model constructed in this study, all of the original RGB information of an art image is input into the DPN model for calculation, while the information extracted by the stroke channel consists of the RGB channel information processed by the gray-level co-occurrence matrix; the two parts are integrated and jointly input into the DPN model corresponding to the stroke channel for feature extraction. The mathematical relationships involved in processing the stroke channel with the gray-level co-occurrence matrix (GLCM) are as follows: let $G$ be the matrix of co-occurrences of gray levels $i$ and $j$; $g_{ij}$ is the number of times the gray-level pair $(i, j)$ appears at the specified relative position in the image $I$; and $L$ is the number of gray levels in the image $I$ of size $M*N$. The functional expression of $G$ is formula (5).

(5)
$ G=(g_{ij} )_{L*L} . $

The function counting the number of times gray levels $i$ and $j$ appear together in $I$ is given by Eq. (6).

(6)
$ g_{ij} =|\{ (x,y)|I(x,y)=i,I(x,y+1)=j\} | . $
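Eq. (6) can be read directly as a count over horizontally adjacent pixel pairs. The numpy sketch below builds the $L*L$ matrix $G$ of Eq. (5) for a tiny grayscale image with the 0-degree, distance-1 offset; it is a minimal illustration for that single direction, whereas the experiments in Section 4 also use the 45-, 90-, and 135-degree offsets (scikit-image's graycomatrix offers an equivalent ready-made routine).

    import numpy as np

    def glcm_0deg(image, levels):
        # Gray-level co-occurrence matrix for the 0-degree, distance-1 offset:
        # counts pairs I(x, y) = i and I(x, y + 1) = j, as in Eq. (6).
        G = np.zeros((levels, levels), dtype=np.int64)
        left = image[:, :-1].ravel()    # gray level i at (x, y)
        right = image[:, 1:].ravel()    # gray level j at (x, y + 1)
        np.add.at(G, (left, right), 1)  # accumulate the counts g_ij
        return G

    # 3 x 4 toy image with L = 4 gray levels
    I = np.array([[0, 0, 1, 1],
                  [0, 2, 2, 3],
                  [2, 2, 3, 3]])
    print(glcm_0deg(I, levels=4))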

To further improve the effectiveness and robustness of the classification stage, the study introduces the support vector machine (SVM). As a classical machine learning algorithm, SVM handles high-dimensional data well and resists overfitting. Specifically, SVM finds the best classification hyperplane in a high-dimensional space, which improves classification performance and decision accuracy on complex feature data. By combining SVM with the DPN features, the model not only obtains a more accurate decision function but also improves the discriminative ability of the data and the generalization performance of the model during feature selection. The relevant schematic diagram is shown in Fig. 5.

Fig. 5. Schematic diagram of SVM mathematical algorithm.

../../Resources/ieie/IEIESPC.2025.14.4.495/fig5.png

Fig. 5 shows that the core of the SVM algorithm is to construct the curve or hyperplane with the strongest generalization and robustness to classify a dataset. Both linear and nonlinear datasets can be fully separated in two-dimensional space. The classified data are generally distributed near the optimal classification curve or optimal hyperplane, and the vectors closest to the optimal hyperplane become the support vectors. The hyperplane of two-dimensional linear data is expressed by formula (7).

(7)
$ w*x+b=0 . $

In formula (7), $w$ is the normal vector that determines the orientation of the hyperplane; $b$ is the bias, which determines the distance from the hyperplane to the origin; and $x$ is the sample data. The distance from a sample point to the hyperplane can be expressed by formula (8).

(8)
$ d=\frac{|w*x+b|}{\|w\|}. $

The linear and nonlinear problems are transformed into a constrained minimization problem for optimization, and the quadratic programming formulation is formula (9).

(9)
$ \left\{\begin{aligned} & \min \frac{1}{2} \|w\|^{2}, \\ &\text{s.t.}~y{}_{i} [(wx_{i} )+b]\ge 1. \end{aligned}\right. $

In formula (9), $i = 1, 2, \ldots, l$, where $l$ is the number of samples and $y_{i}$ is the class label of sample $x_{i}$. To solve the constrained minimization problem, Lagrange multipliers $a_{i}$ are introduced. The decision function of SVM is shown in formula (10).

(10)
$ f(x)=sgn[\sum _{i=1}^{l}a_{i} y{}_{i} (x_{i} *x)+b] . $

Introducing a kernel function (KF) maps the data from the input space to a feature space, enabling linear separation of the data in a higher-dimensional space. The basic form of the SVM with a KF is shown in formula (11).

(11)
$ f(x)=sgn[\sum _{i=1}^{l}a_{i} y{}_{i} K(x_{i} *x)+b] . $

In formula (11), $K(x_{i} *x)$ is the KF; commonly used KFs include the linear KF, polynomial KF, and Gaussian KF. Since the Gaussian KF can map limited data into a higher-dimensional space with little computation and suits the requirements of this research, it is selected as the kernel of the SVM for the experiments. The expression of the Gaussian KF is shown in formula (12).

(12)
$ K(x_{i} *x)=\exp \left(-\frac{\|x-x_{i} \|^{2} }{2\sigma ^{2} } \right) . $

In formula (12), $\sigma$ is the parameter that determines the width of the kernel's local range.
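In practice, an SVM with the Gaussian kernel of Eq. (12) can be fitted with scikit-learn, where the gamma parameter corresponds to $1/(2\sigma^{2})$. The snippet below trains such a classifier on synthetic feature vectors standing in for the concatenated dual-channel DPN features; the data are random and only illustrate the call pattern.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.standard_normal((600, 128))          # stand-in for dual-channel DPN feature vectors
    y = rng.integers(0, 8, size=600)             # e.g., 8 style categories

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    sigma = 1.0
    clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma ** 2), C=1.0)   # Gaussian kernel, Eq. (12)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))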

4. Performance Detection of Image Classification Model With Dual Channels and Dual Paths

To verify the performance of the proposed image classification model, the study first builds a suitable experimental environment. Second, the classification error rate, accuracy, time complexity, space complexity, and other indexes of the model are verified. Then, the actual image classification performance of the model is comparatively tested in a real data environment.

4.1. Model Performance Testing

The commonly used WikiArt dataset of art paintings is selected as the experimental object to verify the performance of the dual path network model proposed in this study. The WikiArt dataset includes three sections: style, genre, and artist. The style section has 25 styles, the genre section covers 10 genres, and the artist section includes 19 artists. Because image sizes in the dataset vary, all images in the WikiArt dataset were resized to 256 * 256; 60% of the images in each section were used to train the model, and the remaining 40% were used to test the FPTD model. The specific experimental environment used on the Ubuntu operating system platform is shown in Table 1.

Table 1. Research specific experimental environment.

Experimental environment | Experimental parameters
Operating system | Linux
CPU model | Xeon(R) E5-2690 v4
GPU | NVIDIA Tesla K80*4
CPU memory | 11GB
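The preprocessing described before Table 1 (resizing every image to 256 * 256 and splitting each section 60%/40% into training and test sets) can be sketched as follows. The directory layout and the label encoding in the file names are hypothetical; only the target size and the split ratio come from the text.

    from pathlib import Path
    from PIL import Image
    from sklearn.model_selection import train_test_split

    def load_resized(image_dir):
        # Load every image under image_dir and resize it to 256 x 256.
        paths = sorted(Path(image_dir).glob("*.jpg"))
        images = [Image.open(p).convert("RGB").resize((256, 256)) for p in paths]
        labels = [p.stem.split("_")[0] for p in paths]   # hypothetical label prefix in the file name
        return images, labels

    images, labels = load_resized("WikiArt/style")        # hypothetical local copy of the style section
    train_x, test_x, train_y, test_y = train_test_split(
        images, labels, train_size=0.6, test_size=0.4, stratify=labels, random_state=0)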

An experiment is designed to verify the information-extraction performance of the dual path model, comparing the Top-1 and Top-5 error rates (ER) of the RN, CN, and DPN models on art images of the same style, genre, and artist. The relevant results are shown in Fig. 6.

Fig. 6. The recognition effect of three models on art images of different styles and genres.

../../Resources/ieie/IEIESPC.2025.14.4.495/fig6.png

Fig. 6 shows that, for the Top-1 indicator, the feature recognition accuracy of the three models generally improves as the network depth increases. In style recognition, the RN model improves its accuracy by 2.75%, while the recognition accuracy of the CN model decreases as the network deepens; at 131 layers its Top-1 ER is 52.20%, 0.09% worse than its initial recognition accuracy. The recognition performance of the DPN model increases with network depth, its Top-1 ER decreasing from the initial 47.83% to 44.01%, an accuracy improvement of 3.82%. Similarly, on the Top-5 ER, the feature recognition accuracy of the DPN model improves by 10.42%, and the DPN model gives the best classification results on both the Top-1 and Top-5 indicators. In the genre recognition task in Fig. 6(b), the Top-1 ER of the DPN model decreases from 25.32% to 22.03% and the Top-5 ER from 1.82% to 1.2%, and its classification accuracy is higher than that of DN and CN. For artworks by different artists, the DPN model also performs better than DN and CN, with a Top-1 ER of 11.33% and a Top-5 ER of 0.73% for the 131-layer DPN model. A further experiment tests the feature-extraction speed of the three models on the same art images, specifically comparing the time complexity and space complexity of the CN, RN, and DPN models in a single channel; the test is carried out on the Mininet virtual network simulation platform. The relevant outcomes are shown in Fig. 7.
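The Top-1 and Top-5 error rates reported in Fig. 6 are standard ranking metrics: a prediction counts as correct under Top-k if the true label is among the k highest-scoring classes. A minimal numpy version is shown below; the score matrix is random and only illustrates the computation.

    import numpy as np

    def top_k_error(scores, labels, k):
        # Fraction of samples whose true label is not among the k highest scores.
        top_k = np.argsort(scores, axis=1)[:, -k:]
        hit = (top_k == labels[:, None]).any(axis=1)
        return 1.0 - hit.mean()

    rng = np.random.default_rng(0)
    scores = rng.random((1000, 25))              # e.g., scores over 25 style classes
    labels = rng.integers(0, 25, size=1000)
    print("Top-1 error:", top_k_error(scores, labels, 1))
    print("Top-5 error:", top_k_error(scores, labels, 5))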

Fig. 7. Comparison of time and space complexity of three models in a single channel.

../../Resources/ieie/IEIESPC.2025.14.4.495/fig7.png

Fig. 8. Comparison of GLCM and Gram matrix performance.

../../Resources/ieie/IEIESPC.2025.14.4.495/fig8.png

Fig. 7 shows that as network depth increases, the model sizes of RN, CN, and DPN all increase, and the single-CPU processing time for style, genre, and artist gradually increases. The average processing times for the DPN model to classify art images by style, genre, and artist are 13.5 hours, 12 hours, and 4.5 hours, respectively. When the number of network layers is 14 or 50, the DPN model size is the smallest, indicating that the space complexity of DPN is lowest at these depths, which may be caused by shared convolution parameters. Because the information extracted from the stroke channel contains the RGB channel information processed by the gray-level co-occurrence matrix, it is only necessary to verify the information extracted from the stroke channel to obtain the information jointly extracted by the two channels. The experiment therefore tests the channel information extracted by GLCM in the four directions of 0°, 45°, 90°, and 135°, as well as GLCM without distinguishing direction, and compares it with the channel information extracted by the commonly used Gram matrix, which has good classification performance, under the same experimental conditions. The relevant outcomes are shown in Fig. 8.

Fig. 8 shows that GLCM performs better than the Gram matrix on both the Top-1 and Top-5 ER. In the RN model with a network depth of 50 layers, the Top-1 ERs of the Gram matrix for art image classification in the style, genre, and artist modules are 76.39%, 68.15%, and 58.38%, respectively, while the corresponding results for GLCM are 74.22%, 65.17%, and 57.61%. After distinguishing the four directions, the accuracy of GLCM increases by 2.96%, 1.34%, and 9.59%, respectively. Fig. 8(b) shows that the two matrices exhibit similar performance differences across the neural network models, but the overall classification results are still better in the DPN model: the average Top-1 ER of the Gram matrix in the DPN model is better by more than 0.56% than in the RN model, and the Top-1 ER of GLCM is better by more than 0.34%.

4.2. Model Simulation Testing

After the performance tests of the dual-channel and dual path components, experiments were designed to verify the image recognition and classification performance of the FPTD model. A total of 69,351 art images from the WikiArt dataset were used, with the training and testing sets split in a 7:3 ratio. The accuracy results are shown in Fig. 9.

Fig. 9. Comparison of accuracy between the FPTD model training set and test set.

../../Resources/ieie/IEIESPC.2025.14.4.495/fig9.png

Fig. 9 shows that as the number of iterations grows, the accuracy of the FPTD model roughly rises and then falls. At 1100 iterations the accuracy of the model reaches its maximum, with accuracies of 0.8740 and 0.8742 for the test and training sets, respectively. After 2150 iterations the accuracy no longer changes; at this point the accuracies of the test set and the training set are 0.8735 and 0.8736, respectively. Because the FPTD model is trained on the training set, the accuracy of the test set is at most 0.03% higher than that of the training set. A further experiment verifies the classification of art images within the same category using the FPTD model: 300 images from each of 8 style sections in the WikiArt dataset were randomly selected as experimental objects to test the model's recognition of images in the style subset. The relevant results are shown in Fig. 10.

Fig. 10. FPTD model for recognizing different styles of art images.

../../Resources/ieie/IEIESPC.2025.14.4.495/fig10.png

Fig. 10 shows that the FPTD model recognizes paintings of different styles well. For Empiricism-style art images, the recognition and classification accuracy of the FPTD model is the highest, at 91.0%; the accuracy for the other styles ranges from 81.0% to 85.7%. The FPTD model performs relatively poorly on Art Nouveau-style images. Under the same experimental environment and on the Gallerix dataset, the performance gap between the FPTD model and the better-performing existing classification models AlexNet, VGG, Multi-drop CNN, GoogLeNet, and ResNet50 was compared. The relevant outcomes are shown in Fig. 11.

Fig. 11. Comparison of classification results between different models.

../../Resources/ieie/IEIESPC.2025.14.4.495/fig11.png

Fig. 11 shows that, in art image recognition and classification, all six compared models achieve an accuracy above 85%. The AlexNet model has the lowest classification accuracy, averaging around 86.84%; the VGG and Multi-drop CNN models reach around 87%, and the GoogLeNet model around 88%. The ResNet50 and FPTD models classify art images best, with average accuracies above 90%. Among them, the FPTD model has the highest accuracy, at 91.56%, which is 1.25% higher than the art image classification accuracy of ResNet50. These results show that the FPTD model proposed in this study has good recognition and classification performance on art images from different datasets.

5. Conclusion

To further enhance the accuracy of commonly used DL-based image recognition models, this study proposes an FPTD model for recognizing and classifying art images of different styles and genres. The experiments indicate that the DPN model achieves a Top-1 ER of 11.33% and a Top-5 ER of 0.73% in recognizing images by different artists in the WikiArt dataset. The time required to process paintings of different styles, genres, and artists is 18 h, 15 h, and 6 h, respectively. The Top-1 ERs of the four-direction stroke channel information processed by GLCM are 70.92%, 61.75%, and 42.13%, respectively. The trained FPTD model performs well, with an accuracy of 87.42% at 1100 iterations. In testing, the FPTD model achieved its best image classification accuracy in the artist section, at 91.0%, while its classification accuracy in the other style categories ranged from 81.0% to 83.7%. Gallerix art images were selected as a further test object, and current models with good classification performance were compared: the recognition and classification accuracy of the FPTD model is 91.56%, more than 1.25% higher than that of conventional image recognition models. The experiments indicate that the FPTD model constructed in this study has good performance and relatively stable accuracy in art image recognition. However, because the data used in the study consist mainly of Western oil paintings, other types of works (such as Chinese freehand brushwork) are not considered; the generalization of the model may therefore be limited, and there is room for improvement.

REFERENCES

1. Y. Yang and X. Song, ``Research on face intelligent perception technology integrating deep learning under different illumination intensities,'' Journal of Computational and Cognitive Engineering, vol. 1, no. 1, pp. 32-36, 2022.
2. S. Choudhuri, S. Adeniye, and A. Sen, ``Distribution alignment using complement entropy objective and adaptive consensus-based label refinement for partial domain adaptation,'' Artificial Intelligence and Applications, vol. 1, no. 1, pp. 43-51, 2023.
3. B. Wang, X. Hu, C. Zhang, P. Li, and P. S. Yu, ``Hierarchical GAN-Tree and Bi-Directional Capsules for multi-label image classification,'' Knowledge-Based Systems, vol. 238, 107882, 2022.
4. A. M. Obeso, J. Benois-Pineau, M. S. G. Vazquez, and A. A. R. Acosta, ``Visual vs internal attention mechanisms in deep neural networks for image classification and object detection,'' Pattern Recognition, vol. 122, pp. 123-138, 2022.
5. D. K. Pathak, S. K. Kalita, and D. K. Bhattacharya, ``Spectral spatial joint feature based convolution neural network for hyperspectral image classification,'' Concurrency and Computation: Practice and Experience, vol. 34, no. 3, 6547, 2022.
6. M. T. Vo, A. H. Vo, and T. Le, ``A robust framework for shoulder implant X-ray image classification,'' Data Technologies and Applications, vol. 56, no. 3, pp. 447-460, 2022.
7. Y. Han, P. Cui, Y. Zhang, R. Zhou, S. Yang, and J. Wang, ``Remote sensing sea ice image classification based on multilevel feature fusion and residual network,'' Mathematical Problems in Engineering, vol. 2021, no. 40, 9928351, 2021.
8. Y. Liu, Y. Dou, R. Jin, R. Li, and P. Qiao, ``Hierarchical learning with backtracking algorithm based on the visual confusion label tree for large-scale image classification,'' The Visual Computer, vol. 38, no. 3, pp. 897-917, 2022.
9. H. Jiang, F. Gao, X. Xu, F. Huang, and S. Zhu, ``Attentive and ensemble 3D dual path networks for pulmonary nodules classification,'' Neurocomputing, vol. 398, pp. 422-430, 2020.
10. S. Wang, X. Kuang, Y. Zhu, W. Zhang, and H. Zhang, ``Deep 3D multi-scale dual path network for automatic lung nodule classification,'' International Journal of Biomedical Engineering and Technology, vol. 39, no. 2, pp. 149-169, 2022.
11. Z. Ji, Y. Yang, F. Wang, L. Xu, and X. Hu, ``Feature encoding with hybrid heterogeneous structure model for image classification,'' IET Image Processing, vol. 14, no. 10, pp. 2166-2174, 2020.
12. W. Yang, Y. Dong, Q. Du, Y. Qiang, K. Wu, J. Zhao, X. Yang, and M. B. Zia, ``Integrate domain knowledge in training multi-task cascade deep learning model for benign-malignant thyroid nodule classification on ultrasound images,'' Engineering Applications of Artificial Intelligence, vol. 100, pp. 98-113, 2021.
13. C. Pu, H. Huang, and L. Luo, ``Classification of hyperspectral image with attention mechanism-based dual-path convolutional network,'' IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 191-205, 2022.
14. T. Ma, M. Yang, H. Rong, Y. Qian, Y. Tian, and N. Al-Nabhan, ``Dual-path CNN with max gated block for text-based person re-identification,'' Image and Vision Computing, vol. 111, 104168, 2021.
15. H. Lin, H. Chen, X. Wang, Q. Wang, L. Wang, and P.-A. Heng, ``Dual-path network with synergistic grouping loss and evidence driven risk stratification for whole slide cervical image analysis,'' Medical Image Analysis, vol. 69, pp. 54-62, 2021.
16. B. Kolisnik, I. Hogan, and F. Zulkernine, ``Condition-CNN: A hierarchical multi-label fashion image classification model,'' Expert Systems with Applications, vol. 182, 115195, 2021.
17. J. Wang, S. Guo, R. Huang, L. Li, X. Zhang, and L. Jiao, ``Dual-channel capsule generation adversarial network for hyperspectral image classification,'' IEEE Transactions on Geoscience and Remote Sensing, vol. 60, no. 2, pp. 601-616, 2022.
18. Z. Deng, Y. Wang, L. Li, B. Zhang, Z. Zhang, L. Bian, Z. Ding, and C. Yang, ``An attention involved network stacked by dual-channel residual block for hyperspectral image classification,'' Infrared Physics & Technology, vol. 122, pp. 83-104, 2022.

Author

Jin Ma
../../Resources/ieie/IEIESPC.2025.14.4.495/au1.png

Jin Ma is male, holds a bachelor's degree in fine arts, and is a lecturer and dual-qualified instructor majoring in fine arts. Since 2006, he has worked in frontline education and student management at Jizhong Vocational College. He has taught professional courses such as ``Fundamentals of Fine Arts'', ``Handicraft'', ``Public Art'', and ``Chinese Painting'', and has guided students to awards in relevant professional competitions many times. He participated in textbook writing in August 2020, serving as editor-in-chief of ``Practical Art'' and as deputy editor-in-chief of the professional textbook ``Introduction to Art''. He contributed to the compilation of the ``College Etiquette Course'' for vocational colleges during the 13th Five-Year Plan period. He has participated in the provincial-level project ``Research on the Communication Methods and Channels of Traditional Music Culture in Vocational Education'', presided over the completion of the municipal-level project ``Application Research of Ideological and Political Education in Art Education Teaching in Vocational Colleges'', and published multiple papers.

Wei Sun
../../Resources/ieie/IEIESPC.2025.14.4.495/au2.png

Wei Sun obtained her master's degree in educational management from Hebei Normal University, China, in 2020 and holds the title of Associate Professor. She currently serves as Deputy Director of the Education Department at Jizhong Vocational College in Hebei Province. She has worked at the forefront of teaching, constantly improving her teaching methods, and is well regarded by her students. In addition to teaching, she is actively engaged in professional teaching and research work and has published articles in multiple domestic journals. She has led the compilation of several textbooks, among which ``Handmade Production'', for which she was editor-in-chief, was approved as a national 14th Five-Year Plan textbook, and she has hosted and completed multiple teaching and research projects at or above the provincial level. Her areas of interest include toy design, handicrafts, and labor education, and she continues to advance in her professional field.

Zhang Yu
../../Resources/ieie/IEIESPC.2025.14.4.495/au3.png

Zhang Yu is an intermediate dual-qualified instructor of Hebei Province. Since September 2012, she has taught at Jizhong Vocational College, where she currently leads the animation production technology major. She has served as a student management officer, teaching management officer, and student counselor. To date she has participated in the compilation of three textbooks: in June 2015 she was a co-author of ``Color Composition''; in June 2020 she was appointed deputy editor-in-chief of the textbook ``Practical Art''; and in July 2023 the monograph ``Research on the Construction of Practical Teaching System for Art and Design Majors'' was published with her as a co-author. She also has multiple research achievements: in August 2022 she published the work ``Framing'' as the second designer of a design patent; in October 2023 she participated in the provincial-level project ``Research on the Dual Growth Model of Teachers and Students in Digital Intelligence and Technology Majors'', which has been completed; and in April 2023 her paper ``Practice of `Teacher-Student Mutual Growth Classroom' in the Digital Intelligence Era'' was published in the journal ``New Curriculum Teaching''.