
2024



  1. (School of Art and Design, Pingdingshan University, Pingdingshan, 467000, China)



Keywords: Regional guidance, Enhanced network, YOLOv3 detection, Logo design, Convolutional neural classification network

1. Introduction

With the rapid growth of the internet economy, e-commerce is becoming increasingly prosperous. Consumers mainly choose products by viewing product images and advertisements online, which makes the product logo a key element affecting the development of the e-commerce economy. Trademarks give consumers a more intuitive understanding of a company's products, culture, and business philosophy; they help customers choose the products they need and promote those products [1]. Therefore, it is necessary to classify and detect trademark logos so that they can be better applied to fields such as logo design, piracy monitoring, advertising analysis, and product image search. Traditional logo classification and detection methods only focus on a single area of the image in a small dataset, resulting in poor performance. With the advancement of deep learning technology, deep convolutional neural networks (CNNs) have been broadly applied to object detection and image classification. Such networks can learn and extract effective logo image features to achieve more precise recognition and description of images. Mao K. J. et al. found that recognizing logos from images poses certain difficulties. They therefore proposed a strong baseline method called Trinity-YOLO, which combines an attention mechanism, stripe pooling, and weighted box fusion within the state-of-the-art YOLOv4 framework for large-scale logo detection. The results show that Trinity-YOLO performs on average 3% better than YOLOv4 and can alleviate problems such as lack of training data, multi-scale objects, and inconsistent bounding box regression. However, this method introduces complex techniques such as the attention mechanism, stripe pooling, and weighted box fusion, which increases the complexity of the algorithm and makes it unsuitable for ordinary users or scenarios that require rapid deployment [2]. Zhou L. et al. proposed a motion-blur vehicle logo detection method that combines filtering-based deblurring with vehicle logo detection to address the difficulty of detecting vehicle logos in complex traffic environments. They further improved detection accuracy by combining VL-YOLO with an outlier-removal clustering algorithm. The results show that the method achieves good detection accuracy in motion-blur environments, but its processing time when handling motion blur is long [3]. On the basis of existing feature extraction and classification algorithms, this study transforms the classification of logo datasets into fine-grained image classification and achieves multi-class classification of logo images through the judgment of major categories and subcategories. At the same time, existing object detection methods are applied to the detection of trademark logos to achieve goals such as intelligent logo design and product authentication, with the aim of achieving good application results in this field.

2. Related Works

The detection and classification of logos play an important role in their design and application, and researchers in related fields have studied them extensively using deep learning technology. Sahel S. et al. believed that logo detection could be applied to vehicle traffic monitoring and product copyright detection, so they introduced deep learning technology and constructed a CNN model. The findings showed that it could effectively improve logo detection accuracy and could be applied well in practical brand recognition and traffic monitoring [4]. Yousaf W. et al. proposed the Patch-CNN algorithm to address intra-class variation in logo detection. This method divided the logo image into small patches for classification and eliminated areas without logos through thresholding. The detection accuracy of this method was as high as 99.01%, and it performed well in practical applications [5]. Jain R. K. designed a detection method based on a dual-attention dilated residual network to address the complex backgrounds and varying scales in logo detection. This method used image-level labels instead of bounding-box annotations and introduced channel feature maps to assist logo classification. The experimental results showed that the detection accuracy of this method improved by about 4% compared to conventional residual networks [6]. Yu Y.'s team proposed a cascaded deep CNN to remove the license-plate dependency in vehicle logo recognition. This network could directly recognize vehicle logos without relying on license plates. The recognition rate, detection rate, and overall performance of this method were as high as 99.4%, 98.7%, and 98.1%, respectively, and it showed good robustness [7]. Ranjith K. C. believed that a company's trademark or logo could convey the culture and nature of the product and the company. To ensure trademark authenticity, a feature fusion method based on SURF and SIFT was proposed to identify the authenticity of logos. The results indicated that this method could effectively distinguish the authenticity of enterprise trademarks with good detection accuracy [8].

For large-scale logo detection, scholars such as Wang J. proposed a strong baseline method called Logo-YOLO, which incorporates focal loss and CIoU loss into the basic YOLOv3 framework. The results showed that the average performance of this method improved by about 4% compared with YOLOv3, and it improved even more compared with several deep detection models reported on LogoDet-3K [9]. To improve the accuracy of logo detection and classification, the authors of [10] extended the original dataset and applied test-time augmentation to address dataset constraints: enhanced variants of the test images were created and their predictions merged. The results show that the detection accuracy of the improved method is as high as 98%, the recall rate is 99%, and the F1 value is 98% [10]. Yue J. et al. proposed a lightweight CNN-transformer model for vehicle logo localization to address the low positioning accuracy and inaccurate detection in vehicle logo recognition. The results show that this method withstands environmental changes and achieves significant results in both positioning speed and accuracy [11]. Sun Y.'s team found it very difficult to design image classification architectures based on customer preferences, so they introduced a genetic algorithm for CNN architecture design to achieve image classification tasks. The results showed that this algorithm outperformed other classification algorithms in classification accuracy and resource consumption [12]. Zhang J. et al. proposed a CNN model based on attention residual learning to address insufficient training data for deep CNNs in image classification. This model improved its discrimination and classification ability through attention learning and residual learning mechanisms, and the results showed that it could greatly improve the classification accuracy and stability of images [13].

In summary, there has been a lot of research on the detection of logo images and certain results have been achieved. At the same time, many effective solutions have been achieved in image classification. However, existing classification and detection technologies have not taken into account issues such as the complexity and diversity of logo images and the imbalance of samples. Therefore, the study proposes to transform the classification of logo datasets into the classification of fine-grained images, and apply object detection methods to logo detection to achieve more intelligent logo design.

3. LOGO Design Based on Regional Guidance and Enhanced Network

3.1. LOGO Image Classification Based on Region Guided and Enhanced Networks

To achieve intelligent design of trademark logos, features must first be extracted from image information and classified. The traditional neural network classification model is the CNN, which extracts features at various levels of the image through convolution operations. A CNN optimizes network parameters and weights through forward propagation and backpropagation so that the output value approaches the target value as closely as possible [14,15]. The specific implementation of CNN is shown in Fig. 1.

Fig. 1. The specific implementation of CNN.

../../Resources/ieie/IEIESPC.2025.14.4.431/fig1.png
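As a concrete illustration of the convolution, activation, and pooling operations that make up one CNN stage, the following minimal NumPy sketch runs a toy single-channel image through one conv-ReLU-maxpool step. The kernel and image values are illustrative only, not taken from the paper.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    h, w = x.shape
    h2, w2 = h // size, w // size
    return x[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

# One conv -> ReLU -> pool stage on a toy 8x8 "image"
img = np.arange(64, dtype=float).reshape(8, 8)
edge_kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])  # horizontal-gradient filter
feat = max_pool(relu(conv2d(img, edge_kernel)))
print(feat.shape)  # (3, 3)
```

In a real CNN these hand-written loops are replaced by optimized batched operators, and the kernel weights are learned by backpropagation rather than fixed.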

With the increasing complexity and diversity of logo images, traditional CNN classification algorithms can no longer effectively extract robust image features. Therefore, fine-grained image classification technology is introduced in this study. It is an important branch of image classification, characterized by subtle differences between subcategories but significant variation within each subcategory. Because logo images share these characteristics, their classification can be treated as a fine-grained image classification problem [16]. The dataset selected for the study is the Logo-2K+ classification dataset, whose logo images feature high inter-class shape similarity and high background complexity. The study proposes a CNN-based DRGE-Net, which uses a self-supervised training mechanism. This method first locates LOGO regions carrying relatively large amounts of information, then strengthens the data under the guidance of regional features, and finally uses data augmentation strategies to further strengthen the informative regions, thereby achieving more effective feature learning [17]. DRGE-Net consists of a region enhancer sub-network, a teacher sub-network, a guidance sub-network, and an inspection sub-network. Its model structure is shown in Fig. 2.

Fig. 2. DRGE-Net classification network model.

../../Resources/ieie/IEIESPC.2025.14.4.431/fig2.png

The role of the guidance sub-network in DRGE-Net is to calculate the information content of all predicted regions in the image to find the region with the highest information content. Specifically, given an input image X, the guidance sub-network first generates candidate regions A at different scales using convolutional layers, max pooling, and ReLU activation. It then performs non-maximum suppression (NMS) on these regions to reduce redundancy and obtain the top M informative regions. Finally, the obtained regions are fed together into the teacher sub-network to obtain the most informative regions. The teacher sub-network calculates the confidence of the regions provided by the guidance sub-network. In general, regions with high confidence are those with a high probability of belonging to the true category, as expressed in Eq. (1).

(1)
$ \text{if } C(R_{1})>C(R_{2}),\ \text{then } I(R_{1})>I(R_{2}),\quad R_{1},R_{2}\in A. $

In Eq. (1), $R$ represents the candidate region; $C$ refers to the confidence level of the region; $I$ represents the amount of information corresponding to each region. The teacher sub network mainly optimizes the detected regions and unifies the confidence and information order of the regions through ranking loss. The ranking loss function is calculated as shown in Eq. (2).

(2)
$ Loss_{I} (I,C)=\sum_{(a,b):C_{a} <C_{b} } f(I_{b} -I_{a} ) . $

In Eq. (2), $a$ and $b$ represent the index of the region. The calculation of the $f$ function is shown in Eq. (3).

(3)
$ f(x)=\max \left\{0,~1-x\right\} . $
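Eqs. (2) and (3) together form a pairwise hinge ranking loss: whenever one region's confidence exceeds another's, its information score must also be larger by a margin of 1, or a penalty accrues. A minimal sketch follows; the region scores are made-up illustrative values.

```python
def hinge(x):
    # Eq. (3): f(x) = max(0, 1 - x)
    return max(0.0, 1.0 - x)

def ranking_loss(info, conf):
    """Eq. (2): for every pair with C_a < C_b, penalize I_b not exceeding I_a by the margin."""
    loss = 0.0
    n = len(info)
    for a in range(n):
        for b in range(n):
            if conf[a] < conf[b]:
                loss += hinge(info[b] - info[a])
    return loss

# Three candidate regions: confidence scores and informativeness scores
conf = [0.2, 0.9, 0.5]
info = [0.1, 2.0, 0.3]  # ordering mostly consistent with conf
# Only the (R_1, R_3) pair violates the margin, so the loss is small but nonzero
print(ranking_loss(info, conf))
```

Minimizing this loss drives the informativeness ranking to agree with the confidence ranking, which is how the teacher sub-network "unifies the confidence and information order" of the regions.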

Then the teacher sub network performs optimization processing through the loss function to minimize the loss difference between the probability of the complete image and the classification probability of the region. Its calculation is shown in Eq. (4).

(4)
$ Loss_{C} =-\sum_{a=1}^{M} \log C(R_{a}) -\log C(X) . $

Under the interaction of the guidance sub-network and the teacher sub-network, the $J$ regions most closely related to the image information are obtained. The enhancer sub-network then enhances these $J$ regions, using region clipping and discarding operations to obtain finer-grained LOGO regions. The mapping calculation of the enhanced feature map is shown in Eq. (5).

(5)
$ R_{J}^{*} =\frac{R_{J} -\min (R_{J} )}{\max (R_{J} )-\min (R_{J} )} . $
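Eq. (5) is a standard min-max normalization that rescales each region's feature map into $[0, 1]$ before thresholding. A small sketch, with illustrative feature values:

```python
import numpy as np

def normalize_region(R):
    """Eq. (5): min-max normalize a region's feature map to the range [0, 1]."""
    return (R - R.min()) / (R.max() - R.min())

# Illustrative 2x2 region feature map
R = np.array([[2.0, 4.0], [6.0, 10.0]])
print(normalize_region(R))
```

After this step, the thresholds used by the subsequent clipping and discarding operations can be applied on a common scale regardless of each region's raw activation range.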

Region clipping extracts local features with a larger amount of information by amplifying the logo region, and then it determines whether it belongs to the foreground or background. If it is detected as the background, it is cropped. Region discarding mainly involves hiding the background region, as shown in Eq. (6).

(6)
$ C_{J} (a,b)=\left\{\begin{aligned} &1,&&\text{if }\frac{\theta _{c} }{\theta _{d} } <R_{J}^{*} (a,b), \\ &0,&&\text{otherwise}. \end{aligned}\right. $

In Eq. (6), $\theta _{c} $ and $\theta _{d} $ are both set thresholds. The calculation of cross entropy loss is shown in Eq. (7).

(7)
$ Loss_{t} =-\sum_{a=1}^{J} \log E(R_{a}^{*}) -\log E(X) . $

In Eq. (7), $E$ is the probability that the enhanced region maps to the true category label. Through the synergy of the guidance sub-network, teacher sub-network, and region enhancer sub-network, the most relevant feature regions with the largest amount of information are obtained, which helps the inspection sub-network make its decision. Specifically, the inspection sub-network first performs global and regional feature extraction and fusion on the $J$ enhanced information regions. Next, it concatenates the input image feature vector with the enhanced feature vectors and feeds them into the convolutional and classification layers. Finally, the inspection sub-network produces the final prediction result, which is expressed in Eq. (8).

(8)
$ P=F(X,~R_{1}^{*},~R_{2}^{*},~\dots,~R_{J}^{*}). $

In Eq. (8), $F$ represents a function transformation. After computing the losses of the sub-networks, DRGE-Net uses stochastic gradient descent to optimize the joint loss, as shown in Eq. (9).

(9)
$ Loss_{total} =Loss_{I} +\alpha \times Loss_{C} +\beta \times Loss_{t} . $

In Eq. (9), $\alpha $ and $\beta $ are hyperparameters. Through the synergistic effect of the four sub-networks, the regions with the most relevant information can be obtained and enhanced, thereby achieving more accurate and effective logo image classification. The detailed network structure of DRGE-Net is displayed in Fig. 3.

Fig. 3. DRGE-Net detailed network structure.

../../Resources/ieie/IEIESPC.2025.14.4.431/fig3.png
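The background-mask rule of Eq. (6) and the joint objective of Eq. (9) can be sketched as follows. The threshold and loss values are illustrative; in the paper the component losses come from Eqs. (2), (4), and (7).

```python
import numpy as np

def background_mask(R_star, theta_c, theta_d):
    """Eq. (6): keep pixel (a, b) when theta_c / theta_d < R*_J(a, b); else mark as background."""
    return (R_star > theta_c / theta_d).astype(int)

def total_loss(loss_rank, loss_teacher, loss_enhance, alpha=1.0, beta=1.0):
    """Eq. (9): joint loss with hyperparameter weights alpha and beta."""
    return loss_rank + alpha * loss_teacher + beta * loss_enhance

# Normalized region feature map (Eq. (5) output) and a 0.5 effective threshold
R_star = np.array([[0.1, 0.6], [0.8, 0.3]])
mask = background_mask(R_star, theta_c=1.0, theta_d=2.0)
print(mask)  # [[0 1] [1 0]]: low-activation pixels are treated as background

# Combine illustrative component losses with alpha = beta = 1
print(total_loss(0.8, 0.4, 0.2))
```

Pixels masked as 0 correspond to background and are hidden or cropped away, so the enhancer sub-network concentrates the augmentation on the informative foreground of the LOGO region.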

3.2. LOGO Detection Based on YOLOv3 Improved Loss Function

To better locate and classify target logos, it is necessary to detect key information in the target area, determine the bounding box of the target logo, and provide its category. Current detection models have improved in accuracy, but their detection speed has decreased significantly. Moreover, because logo images contain many distorted and occluded targets and complex backgrounds, it is difficult to accurately recognize and detect the target logo [18,19]. Therefore, the YOLOv3 detection algorithm was introduced in the study. The basic YOLO network consists of 2 fully connected layers and 24 convolutional layers. It predicts bounding boxes from the top-level feature map and estimates the probability of each category. YOLOv3 uses the DarkNet-53 network for feature extraction and performs multi-scale feature extraction through three branches, while also introducing a dual-dimensional attention mechanism. Using residual structures to extract features at different scales ensures the convergence of the deep network and avoids overfitting [20]. YOLOv3 not only ensures the distinguishability of features but also effectively achieves real-time detection from image input to target classification and regression. The specific structure diagram is shown in Fig. 4. The feature pyramid matches the feature information of 9 anchor boxes to 3 feature maps of different sizes: the $38\times 38$ map corresponds to large target boxes, the $76\times 76$ map to medium target boxes, and the $152\times 152$ map to small target boxes.

The logo images in the LogoDet-3K dataset have high background complexity, logo regions of varying shapes and sizes, and distortion and occlusion. A logo detection method called Logo YOLO was developed to detect target logos faster and more accurately [21]. The specific steps of this method are as follows: first, the anchor sizes for the LogoDet-3K dataset are recalculated using the K-means clustering algorithm to improve the output scales of the network. The effective features in the logo image are then extracted through the DarkNet network, with the residual modules filtering out redundant information. Next, the feature pyramid fuses multi-scale features and performs detection at three scales. Finally, a focal classification loss is applied to reduce the negative impact of the imbalance between hard and easy samples [22]. The detection framework of Logo YOLO is shown in Fig. 5.

Fig. 4. Specific structural diagram of YOLOv3.

../../Resources/ieie/IEIESPC.2025.14.4.431/fig4.png

Fig. 5. Logo YOLO's detection framework.

../../Resources/ieie/IEIESPC.2025.14.4.431/fig5.png
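The anchor recalculation step above, K-means over ground-truth box sizes with an IoU-based distance, evaluated by the average overlap of Eq. (10), can be sketched as follows. The box sizes, `k`, and the random seed are illustrative; real use would load the LogoDet-3K annotations.

```python
import numpy as np

def wh_iou(box, centroids):
    """IoU between a (w, h) box and each (w, h) centroid, both anchored at the origin."""
    inter = np.minimum(box[0], centroids[:, 0]) * np.minimum(box[1], centroids[:, 1])
    union = box[0] * box[1] + centroids[:, 0] * centroids[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) boxes using 1 - IoU as distance; return anchors and mean IoU (Eq. (10))."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # Assign each box to the centroid with the highest IoU (smallest 1 - IoU distance)
        assign = np.array([np.argmax(wh_iou(b, anchors)) for b in boxes])
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    # Eq. (10): average of each box's best IoU with its cluster centroid
    avg_iou = np.mean([wh_iou(b, anchors).max() for b in boxes])
    return anchors, avg_iou

# Toy ground-truth box sizes (w, h) forming three rough clusters
boxes = np.array([[10, 12], [11, 13], [50, 60], [48, 55], [100, 90], [95, 85]], dtype=float)
anchors, avg_iou = kmeans_anchors(boxes, k=3)
print(anchors.shape, avg_iou)
```

Using 1 - IoU instead of Euclidean distance keeps large and small boxes on an equal footing, which is why anchor clustering conventionally uses this metric rather than raw width-height distances.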

Anchor boxes are candidate boxes with fixed height and width. The original YOLOv3 clusters the COCO dataset to generate 9 anchor boxes and outputs at 3 scales. The $13\times 13$ scale output has the largest receptive field, so it is used to detect large targets; the $26\times 26$ scale output detects medium-sized targets; and the $52\times 52$ scale output detects small targets. However, because the scale settings of the original YOLOv3 network are no longer suitable for the LogoDet-3K dataset, the K-means clustering algorithm is used for cluster analysis of the target bounding boxes in LogoDet-3K, with the average overlap used as the evaluation indicator. The objective function of the average overlap is shown in Eq. (10).

(10)
$ f=\arg \max \frac{\sum _{i=1}^{k}\sum _{j=1}^{N_{i} }I_{IoU} (B_{j} ,C_{i} ) }{N} . $

In Eq. (10), $N$ is the total number of samples; $k$ is the number of clusters; $B$ represents the ground-truth rectangular boxes; $C$ represents the K-means clustering result. YOLOv3 extracts image features through the Darknet-53 network, which introduces residual modules that effectively control gradient propagation and reduce the difficulty of training deep networks. YOLOv3 uses the 53 convolutional layers of Darknet-53 to achieve feature extraction, multi-scale feature fusion, and detection, and the number of convolutional kernels in the last convolutional layer equals the number of classification categories. In addition, the loss function of the target detection task includes regression loss and classification loss. The original loss function of YOLOv3 includes box coordinate, category, confidence, and width-height losses, as expressed in Eq. (11).

(11)
$ Loss(object)\!=\!Loss_{xy} \!+\!Loss_{class} \!+\!Loss_{conf} \!+\!Loss_{wh} . $

In Eq. (11), $Loss_{xy} $, $Loss_{class} $, $Loss_{conf} $, and $Loss_{wh} $ respectively represent the bbox coordinate loss, category loss, confidence loss, and width-height loss. Due to sample imbalance, easy samples dominate the loss during training, which reduces detection accuracy. To solve this problem, focal loss is introduced on top of the original loss function, and its calculation is shown in Eq. (12).

(12)
$ Focal\ loss=\left\{\begin{aligned} & -(1-\omega )(y')^{\lambda } \log (1-y'),&&y=0,\\ & -\omega (1-y')^{\lambda } \log y', && y=1. \end{aligned}\right. $

In Eq. (12), $y$ represents the true category; $y'$ refers to the model probability estimated by the activation function; $\omega $ represents the hyperparameter of balanced positive and negative samples; $\lambda $ represents the hyperparameter of the adjustment difficulty sample, which represents the attenuation degree of the sample loss. In addition, the Intersection over Union (IoU) is an indicator for detecting the accuracy of real and predicted boxes, which can characterize the degree of loss in target localization. Its calculation is shown in Eq. (13).

(13)
$ IoU=\frac{B^{pd} \cap B^{gt} }{B^{pd} \cup B^{gt} } . $

In Eq. (13), $B^{gt} $ denotes the target box and $B^{pd} $ the prediction box. Although IoU is scale-invariant, it acts only on the intersection of bounding boxes and provides no gradient when they are disjoint. Building on IoU, the minimum enclosing rectangle of the real box and prediction box can be used to enable gradient optimization for disjoint boxes, but the convergence speed is still limited. Therefore, CIoU loss is introduced into the YOLOv3 loss function to achieve effective regression of the prediction box; its calculation is shown in Eq. (14).

(14)
$ L_{CIoU} =1+R_{CIoU} (B^{pd} ,B^{gt} )-IoU . $

In Eq. (14), $R_{CIoU} $ represents the penalty term of the target box and the prediction box, and its calculation is shown in Eq. (15).

(15)
$ R_{CIoU} =\mu \upsilon +\frac{\varphi ^{2} (e,e^{gt} )}{h^{2} } . $

In Eq. (15), $e$ and $e^{gt} $ represent the center points of the prediction box and the target box, respectively; $\varphi (\cdot )$ represents the Euclidean distance function between two points; $\mu \upsilon $ is the aspect-ratio penalty term, where $\upsilon $ measures aspect-ratio consistency and $\mu $ is its trade-off weight. The modeling diagram of the CIoU loss is shown in Fig. 6. In Fig. 6, the blue box represents the real box, the purple box represents the predicted box, and the red box represents the smallest box containing both. $h$ represents the diagonal distance of the enclosing box, and $d$ represents the distance between the center points of the two boxes.

Fig. 6. CIoU loss modeling diagram.

../../Resources/ieie/IEIESPC.2025.14.4.431/fig6.png
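The loss components above can be sketched together: IoU (Eq. (13)), the CIoU regression loss (Eqs. (14)-(15)) with `v` as the aspect-ratio consistency term and `mu` as its trade-off weight, and the focal classification loss (Eq. (12)). This is a simplified illustration using the standard formulations; the box coordinates and the omega and lambda values are illustrative defaults.

```python
import math

def iou(box1, box2):
    """Eq. (13): IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    xa, ya = max(box1[0], box2[0]), max(box1[1], box2[1])
    xb, yb = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)

def ciou_loss(pred, gt):
    """Eqs. (14)-(15): 1 - IoU plus center-distance and aspect-ratio penalties."""
    # Squared center distance (the phi^2 term) over squared enclosing-box diagonal (h^2)
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_g, cy_g = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    diag2 = cw ** 2 + ch ** 2
    i = iou(pred, gt)
    # Aspect-ratio consistency v and its trade-off weight mu (the mu*v term)
    v = (4 / math.pi ** 2) * (math.atan((gt[2] - gt[0]) / (gt[3] - gt[1]))
                              - math.atan((pred[2] - pred[0]) / (pred[3] - pred[1]))) ** 2
    mu = v / (1 - i + v) if (1 - i + v) > 0 else 0.0
    return 1 - i + rho2 / diag2 + mu * v

def focal_loss(y_true, p, omega=0.25, lam=2.0):
    """Eq. (12): down-weights easy samples via the modulating exponent lambda."""
    if y_true == 1:
        return -omega * (1 - p) ** lam * math.log(p)
    return -(1 - omega) * p ** lam * math.log(1 - p)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))          # 1/7
print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))    # 0.0 for a perfect prediction
print(focal_loss(1, 0.9) < focal_loss(1, 0.1))  # True: easy positives contribute less
```

Unlike plain IoU, the CIoU loss stays informative even when the boxes barely overlap, since the center-distance and aspect-ratio penalties still provide a gradient toward the target box.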

4. LOGO Design and Application Analysis Based on Regional Guidance and Enhanced Network

4.1. Analysis of DRGE-Net Classification Results

To prove the effectiveness of the DRGE-Net method, the study selected the AlexNet, GoogLeNet, and VGGNet-16 classification methods for comparison. The feature extractor used in the experiment was ResNet-152, the region clipping and discarding threshold was set to 0.5, and the hyperparameter weights were set to 1. The evaluation indicator for classification effectiveness was accuracy, compared at the Top-1 and Top-6 confidence levels. The classification results of the different methods on the Logo-2K+ and WebLogo-2M datasets are shown in Fig. 7. The Top-1 and Top-6 classification accuracies of DRGE-Net were the highest on both datasets. On the Logo-2K+ dataset, the Top-1 classification accuracy of DRGE-Net was as high as 72.12% and the Top-6 classification accuracy as high as 94.86%, which were 23.31% and 16.61% higher than AlexNet, respectively. Meanwhile, compared to VGGNet-16, the Top-1 and Top-6 classification accuracies of DRGE-Net improved by 9.98% and 5.81%, respectively. In addition, on the WebLogo-2M dataset, the Top-1 classification accuracy of DRGE-Net was as high as 64.89% and the Top-6 classification accuracy as high as 87.53%, which were 2.79% and 7.37% higher than GoogLeNet, respectively. DRGE-Net, which adopts regional guidance and an enhancement network, therefore achieved higher classification accuracy.

Because the numbers of test image samples were imbalanced, the stability of the DRGE-Net method was further tested. Receiver operating characteristic (ROC) curves were drawn for the four algorithms DRGE-Net, AlexNet, GoogLeNet, and VGGNet-16. The area under the ROC curve represents the stability of the method: the larger the area, the stronger the stability. The ROC classification curves of the different algorithms are shown in Fig. 8. Compared to the other methods, DRGE-Net had the largest area under the curve, followed by GoogLeNet. As the number of image categories increased, the true positive rate of DRGE-Net remained at a high level, showing strong stability.

Fig. 7. Classification accuracy on different datasets.

../../Resources/ieie/IEIESPC.2025.14.4.431/fig7.png

Fig. 8. ROC classification curves of different algorithms.

../../Resources/ieie/IEIESPC.2025.14.4.431/fig8.png

The study further validated the classification effectiveness of the DRGE-Net method by selecting the four best-performing and four worst-performing logo categories for comparison. The Top-1 classification accuracy results of the four algorithms are shown in Fig. 9. From Fig. 9, the DRGE-Net method achieved the best classification accuracy in all 8 LOGO categories. Among the four best-performing categories, the classification accuracy of DRGE-Net was above 80%; in the Cristalp category, DRGE-Net achieved its highest classification accuracy of 92.68%, which was 6.54% higher than VGGNet-16. Among the four worst-performing categories, the classification accuracy of DRGE-Net was still above 50%, with a maximum of 60.8%, an improvement of 8.82% over VGGNet-16. LOGO images with poor classification results often contained complex backgrounds and small logo regions, yet the DRGE-Net method still achieved good results, indicating good classification stability.

Fig. 9. Top-1 classification accuracy results for different categories.

../../Resources/ieie/IEIESPC.2025.14.4.431/fig9.png

4.2. Analysis of Logo YOLO Test Results

The study analyzed the performance of the Logo YOLO detection method, selecting the LogoDet-3K dataset and a large-category dataset as the experimental datasets. The LogoDet-3K dataset comprised three sub-datasets, and the large-category dataset was divided into three common categories: clothing, food, and daily necessities. The statistical information is shown in Table 1.

Table 1. Statistical information of the LogoDet-3K and large-category datasets.

Data set           Category   Image    Target   Training set   Test set
LogoDet-3K-1000    1000       85641    101246   75784          11364
LogoDet-3K-2000    2000       116547   136462   10354          13570
LogoDet-3K-3000    3000       158369   194543   14244          16549
Clothes            608        31425    37614    27346          3549
Food               964        53248    64324    47154          6085
Necessities        431        24834    30246    22064          2837

The study selected SSD, Faster R-CNN, and YOLOv3 for comparison. The backbone network of SSD was VGGNet-16, that of Faster R-CNN was ResNet101, and that of YOLOv3 and Logo YOLO was Darknet-53. The detection evaluation index used in the experiment was mAP, with a total of 9 anchor boxes. The detection results of the different methods on the LogoDet-3K dataset and the large-category dataset are shown in Fig. 10. As shown in Fig. 10(a), the mAP values of Logo YOLO on the three sub-datasets of LogoDet-3K were higher than those of the other three methods. On the LogoDet-3K-1000 dataset, the mAP value of Logo YOLO reached the highest value of 58.89%, an increase of 15.68% and 13.71% over SSD and Faster R-CNN, respectively. Meanwhile, on the LogoDet-3K-2000 dataset, the mAP value of Logo YOLO reached 56.77%, an increase of 18.49% and 4.45% over SSD and YOLOv3, respectively. In addition, on the LogoDet-3K-3000 dataset, the mAP value of Logo YOLO reached 52.28%, an increase of 18.26%, 13.71%, and 3.37% over the other three algorithms, respectively. From Fig. 10(b), the Logo YOLO detection method had the best mAP values in the three major categories of clothing, food, and daily necessities, at 61.58%, 56.42%, and 61.97%, respectively. On the clothing dataset, the mAP value of Logo YOLO increased by 12.5%, 5.16%, and 4.57% over the other three detection methods, respectively. Meanwhile, on the food dataset, the mAP values of Logo YOLO increased by 9.37% and 3.28% over Faster R-CNN and YOLOv3, respectively. In addition, on the daily necessities dataset, the mAP value of Logo YOLO increased by 12.32% over SSD and 4.28% over YOLOv3.
This indicated that the detection performance of the Logo YOLO method had significant advantages and could be well applied in practical logo detection and design.

The detection accuracy and missed detection rate of the Logo YOLO method were tested further. The experiment used the precision-recall (PR) curve to evaluate detection performance; a larger area under the PR curve denotes better performance. The PR curves of the Logo YOLO and YOLOv3 detection methods on the LogoDet-3K dataset are shown in Fig. 11. In all three sub-datasets of LogoDet-3K, the area enclosed by the PR curve of Logo YOLO was greater than that of YOLOv3. The Logo YOLO detection method thus had higher recall and detection accuracy and was not easily affected by issues such as occlusion and distortion.

Fig. 10. Detection results on different datasets.

../../Resources/ieie/IEIESPC.2025.14.4.431/fig10.png

Fig. 11. PR curves of different algorithms on the LogoDet-3K dataset.

../../Resources/ieie/IEIESPC.2025.14.4.431/fig11.png

At the end of the study, the Logo-YOLO method was compared with several advanced LOGO detection methods, including You Only Look Once version 2 + Context-aware Loss (YOLOv2+CAL), Scaling YOLOv4, and an improved YOLOv4 [23]. The mAP values of each method are shown in Table 2. According to Table 2, the detection mAP of the proposed Logo-YOLO method is as high as 53.26%, which is 3.78%, 2.87%, and 1.35% higher than YOLOv2+CAL, Scaling YOLOv4, and the improved YOLOv4, respectively. Compared with the other detection methods, Logo-YOLO has higher detection accuracy. The method uses a feature pyramid structure to effectively fuse multi-scale features, improving the robustness and accuracy of detection at different scales and effectively overcoming the limitations of traditional logo detection methods in complex scenes.

Table 2. The mAP values of different logo detection methods.

Method             mAP (%)
YOLOv2+CAL         49.48
Scaling YOLOv4     50.39
Improved YOLOv4    51.91
Logo-YOLO          53.26

5. Conclusion

With the speed growth of the Internet era, the application of logos in modern life is becoming increasingly widespread. To achieve more intelligent logo design, the DRGE-Net method was introduced for feature extraction and classification of images. At the same time, a Logo YOLO detection method was proposed to address the complex background and imbalanced samples of logo images. By calculating the CIoU loss, more accurate regression outcomes were obtained. The results showed that in the performance experiment of validating the logo classification method, DRGE-Net achieved a Top-1 classification accuracy of 72.12% and a Top-6 classification accuracy of 94.86% on the Logo 2K+dataset, which were improved by 23.31% and 16.61% compared to AlexNet, respectively. On the WebLogo 2M dataset, the Top 1 classification accuracy of DRGE-Net was as high as 64.89%, and the Top 6 classification accuracy was as high as 87.53%, which was 2.79% and 7.37% higher than GoogLeNet, respectively. Meanwhile, the area below the RC curve of DRGE-Net was the largest, indicating its highest stability. In the experiment to verify the effectiveness of the logo detection method, the mAP value of Logo YOLO in the LogoDet-3K-1000 dataset reached the highest value of 58.89%, which was 15.68% and 13.71% higher than SSD and faster R-CNN, respectively. Meanwhile, the mAP value of Logo YOLO in the LogoDet-3K-2000 dataset was as high as 56.77%, an increase of 18.49% and 4.45% compared to SSD and YOLOv3. In addition, in the LogoDet-3K-3000 dataset, the mAP value of Logo YOLO reached 52.28%, which increased by 18.26%, 13.71%, and 3.37% compared to the other three algorithms, respectively. The DRGE-Net and Logo YOLO methods have significant performance advantages in the classification and detection of logo images, providing effective technical support for the intelligent design of logos. 
However, this research adopts a supervised classification approach; unsupervised methods could be explored in future work to achieve more comprehensive classification tasks.
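The Top-1 and Top-6 accuracies reported above follow the standard Top-k metric: a prediction counts as correct if the true class appears among the k highest-scoring classes. A minimal sketch of this metric (not from the paper; function and variable names are illustrative) is:

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true class index appears among the
    k highest-scoring predictions (Top-1, Top-6, ...)."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)
```

With k=1 this reduces to ordinary classification accuracy; increasing k (e.g. to 6, as in the Top-6 results) relaxes the criterion, which is why Top-6 accuracy is always at least as high as Top-1.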
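The CIoU loss used for bounding-box regression augments the IoU with a center-distance penalty and an aspect-ratio consistency term. A minimal sketch under the standard CIoU formulation (this is an illustrative re-implementation, not the paper's code; boxes are assumed to be in (x1, y1, x2, y2) format) is:

```python
import math

def ciou_loss(box_a, box_b):
    """Complete-IoU loss between two boxes (x1, y1, x2, y2).
    Returns 1 - CIoU, so identical boxes give a loss of 0."""
    # Intersection area
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union area and IoU
    wa, ha = box_a[2] - box_a[0], box_a[3] - box_a[1]
    wb, hb = box_b[2] - box_b[0], box_b[3] - box_b[1]
    union = wa * ha + wb * hb - inter
    iou = inter / union

    # Squared distance between box centers
    cax, cay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cbx, cby = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    rho2 = (cax - cbx) ** 2 + (cay - cby) ** 2

    # Squared diagonal of the smallest enclosing box
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(wb / hb) - math.atan(wa / ha)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)

    return 1 - (iou - rho2 / c2 - alpha * v)
```

Unlike a plain IoU loss, this remains informative even when the predicted and ground-truth boxes do not overlap, since the center-distance term still provides a gradient toward the target.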

REFERENCES

[1] Z. N. Abdullah, Z. A. Abutiheen, A. A. Abdulmunem, and Z. A. Harjan, ``Official logo recognition based on multilayer convolutional neural network model,'' TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 20, no. 5, pp. 1083-1090, 2022.
[2] K. J. Mao, R. H. Jin, K. Y. Chen, J. Mao, and G. Dai, ``Trinity-YOLO: High-precision logo detection in the real world,'' IET Image Processing, vol. 17, no. 7, pp. 2272-2283, 2023.
[3] L. Zhou, W. Min, D. Lin, Q. Han, and R. Liu, ``Detecting motion blurred vehicle logo in IoV using filter-DeblurGAN and VL-YOLO,'' IEEE Transactions on Vehicular Technology, vol. 69, no. 4, pp. 3604-3614, 2020.
[4] S. Sahel, M. Alsahafi, M. Alghamdi, and T. Alsubait, ``Logo detection using deep learning with pretrained CNN models,'' Engineering, Technology, and Applied Science Research, vol. 11, no. 1, pp. 6724-6729, 2021.
[5] W. Yousaf, A. Umar, S. H. Shirazi, Z. Khan, I. Razzak, and M. Zaka, ``Patch-CNN: Deep learning for logo detection and brand recognition,'' Journal of Intelligent and Fuzzy Systems, vol. 40, no. 3, pp. 3849-3862, 2021.
[6] R. K. Jain, Y. Iwamoto, T. Watasue, T. Nakagawa, T. Sato, X. Ruan, and Y. W. Chen, ``Weakly supervised logo detection using a dual-attention dilated residual network,'' IIEEJ Transactions on Image Electronics and Visual Computing, vol. 9, no. 1, pp. 12-19, 2021.
[7] Y. Yu, H. Guan, D. Li, and C. Yu, ``A cascaded deep convolutional network for vehicle logo recognition from frontal and rear images of vehicles,'' IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 2, pp. 758-771, 2019.
[8] K. C. Ranjith, ``Identification of fake vs original logos using deep learning,'' Turkish Journal of Computer and Mathematics Education (TURCOMAT), vol. 12, no. 12, pp. 3770-3780, 2021.
[9] J. Wang, W. Min, S. Hou, S. Ma, Y. Zheng, and S. Jiang, ``LogoDet-3K: A large-scale image dataset for logo detection,'' ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 18, no. 1, pp. 1-19, 2022.
[10] C. N. Network, ``Data augmentation using test-time augmentation on convolutional neural network-based brand logo trademark detection,'' Indonesian Journal of Artificial Intelligence and Data Mining (IJAIDM), vol. 7, no. 2, pp. 266-274, 2024.
[11] J. Yue, J. Fu, and C. Yang, ``Expressway vehicle logo detection: A lightweight CNN and logo localization method,'' Journal of Electronic Imaging, vol. 33, no. 2, p. 023035, 2024.
[12] Y. Sun, B. Xue, M. Zhang, G. G. Yen, and J. Lv, ``Automatically designing CNN architectures using the genetic algorithm for image classification,'' IEEE Transactions on Cybernetics, vol. 50, no. 9, pp. 3840-3854, 2020.
[13] J. Zhang, Y. Xie, Y. Xia, and C. Shen, ``Attention residual learning for skin lesion classification,'' IEEE Transactions on Medical Imaging, vol. 38, no. 9, pp. 2092-2103, 2019.
[14] M. Murinto and M. Rosyda, ``Logarithm decreasing inertia weight particle swarm optimization algorithms for convolutional neural network,'' JUITA: Jurnal Informatika, vol. 10, no. 1, pp. 99-105, 2022.
[15] V. Monga, Y. Li, and Y. C. Eldar, ``Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing,'' IEEE Signal Processing Magazine, vol. 38, no. 2, pp. 18-44, 2021.
[16] Y. Sun, B. Xue, M. Zhang, and G. G. Yen, ``Evolving deep convolutional neural networks for image classification,'' IEEE Transactions on Evolutionary Computation, vol. 24, no. 2, pp. 394-407, 2019.
[17] J. Yu, M. Tan, H. Zhang, Y. Rui, and D. Tao, ``Hierarchical deep click feature prediction for fine-grained image recognition,'' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 2, pp. 563-578, 2019.
[18] D. Hong, L. Gao, N. Yokoya, J. Yao, J. Chanussot, Q. Du, and B. Zhang, ``More diverse means better: Multimodal deep learning meets remote-sensing imagery classification,'' IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 5, pp. 4340-4354, 2020.
[19] A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin, J. Pont-Tuset, and V. Ferrari, ``The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale,'' International Journal of Computer Vision, vol. 128, no. 7, pp. 1956-1981, 2020.
[20] S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, and D. Terzopoulos, ``Image segmentation using deep learning: A survey,'' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 7, pp. 3523-3542, 2021.
[21] B. Charbuty and A. Abdulazeez, ``Classification based on decision tree algorithm for machine learning,'' Journal of Applied Science and Technology Trends, vol. 2, no. 1, pp. 20-28, 2021.
[22] S. Berg, D. Kutra, T. Kroeger, C. N. Straehle, B. X. Kausler, C. Haubold, and A. Kreshuk, ``Ilastik: Interactive machine learning for (bio) image analysis,'' Nature Methods, vol. 16, no. 12, pp. 1226-1232, 2019.
[23] S. Hou, J. Li, W. Min, Q. Hou, Y. Zhao, Y. Zheng, and S. Jiang, ``Deep learning for logo detection: A survey,'' ACM Transactions on Multimedia Computing, Communications and Applications, vol. 20, no. 3, pp. 1-23, 2023.

Author

Zemei Liu
../../Resources/ieie/IEIESPC.2025.14.4.431/au1.png

Zemei Liu received a master's degree in design from Hubei University of Technology in 2008. She is currently an associate professor at the School of Art and Design, Pingdingshan University, and has published her research in several journals. Her research areas include visual communication design, brand image design, and logo design.