


  1. (Arts & Design College, Putian University, Putian, 351100, China)



Fast robust features, Mazu, Pattern elements, Feature extraction, Perceptual hashing

1. Introduction

In contemporary design, and especially in the design of bed-and-breakfast (B&B, or homestay) spaces, the extraction and matching of pattern element features play a key role in conveying cultural elements accurately. As a culturally rich design element, the Mazu pattern is particularly prominent in homestay space design [1,2]. However, accurately extracting and matching the features of these pattern elements remains challenging with existing techniques. Under complex conditions such as image rotation and scale transformation, traditional hand-crafted feature extraction and matching methods often struggle to produce efficient and accurate results. Hand-crafted local descriptors are effective to a certain extent, but they have limitations when dealing with complex deformations and environmental changes [3,4,5]. Similarly, deep learning methods can process large-scale data but still adapt poorly to specific conditions such as rotation and scale changes [6]. Therefore, a method based on the Speeded-Up Robust Features (SURF) algorithm is proposed to address this problem. First, a perceptual image hashing model is constructed and integrated with the SURF algorithm to improve the model's adaptability to image rotation and scale changes. Then, a Siamese network model is combined with the improved SURF algorithm to further enhance stability and accuracy under complex image transformations.
The study consists of four parts. The first part reviews existing research on SURF and feature extraction and identifies its shortcomings. The second part details the construction of the pattern element feature extraction and matching method and the perceptual image hashing model, together with the integration of the SURF algorithm and the Siamese network model. The third part evaluates the proposed algorithm through comparative experiments. The fourth part summarizes the experimental results, points out the limitations of the study, and proposes future research directions.

2. Related Works

In image processing, the SURF algorithm plays an important role in feature extraction and matching tasks. SURF is widely used because of its efficient feature detection, while deep learning, especially convolutional neural networks, has shown great potential for automatically learning complex image features [7]. Relevant studies are summarized below. Cho E et al. proposed a dynamic improvement algorithm for high-definition image pyramids that completely removed the redundancy in key point detection without losing the original functionality of SURF. Compared with key point detection in the original SURF algorithm, the method reduced memory usage by 45%-51%, execution time by 37%-50%, and power consumption by 18%-42% [8]. Yasin A S M et al. proposed an approach that allowed autonomous robots to localize themselves in a specific area based solely on a previously supplied set of images, without external assistance. The method built an image database and stored it in the robot's memory to achieve fast image matching. In the performance evaluation, the SURF algorithm outperformed the other two algorithms in accuracy and runtime [9]. Nawaz S A et al. proposed an image zero-watermarking algorithm based on SURF-DCT perceptual hashing, which first used the SURF features of the image for watermark embedding and extraction. A hash sequence was then generated through perceptual hashing and quantization to capture semi-global geometric features. Experimental results showed that the algorithm was robust [10]. An Q et al. developed a power-function weighted image stitching method that integrated SURF optimization and acceleration units. The approach screened feature points preliminarily using cosine similarity, performed fine matching with the MSAC algorithm, and calculated the weights of center points with a power-function weighted fusion algorithm. The results indicated an improvement of approximately 11% in matching accuracy over the traditional SURF algorithm, along with a reduction in matching time of around 1.6 seconds [11].

Radha R et al. argued that the key to a content-based image retrieval system is to automatically index, search, retrieve, and browse images in a database. Their study discussed the performance of well-known feature extraction techniques, namely SIFT, SURF, and Oriented FAST and Rotated BRIEF (ORB), in sketch-based and painting-based image retrieval. The experimental results identified the most suitable extraction techniques and highlighted the importance of SIFT, SURF, and ORB features [12]. Xu H et al. introduced a static gesture recognition method that combined an enhanced SURF algorithm with a Bayesian regularized BP neural network. Gesture features were extracted with the improved SURF algorithm and classified by the Bayesian regularized BP neural network. A classification method based on finger angle was also introduced, and the final recognition result was obtained by combining the two outputs. The results showed that the method performed well in terms of robustness and accuracy of gesture recognition [13]. Yohannes Y et al. described SURF feature formation in two stages: interest point detection, which included integral images and detection based on the Hessian matrix, and interest point description, which involved orientation assignment and descriptor generation. These features were applied to random forest classification and tested on 345 images of Palembang Songket patterns. The test results showed that the overall accuracy was 68.89%, the accuracy rate of each class was 79.26%, and the precision and recall rates were 69.27% and 68.89%, respectively [14]. Bansal M et al. compared and evaluated the performance of the Shi-Tomasi corner detector with the SIFT and SURF feature descriptors. To speed up processing, the local projection method was used to reduce the amount of feature calculation. The extracted features were classified by K-NN, decision tree, and random forest classifiers, and experiments were carried out on the Caltech-101 image dataset. The results showed that the accuracy of the random forest, decision tree, and K-NN classifiers combined with Shi-Tomasi, SIFT, and SURF features was 85.9%, 80.8%, and 74.8%, respectively [15].

In summary, although previous studies have made progress on the SURF algorithm and neural networks, they remain limited in revealing the high-level semantic features of images, and their time complexity in image feature extraction and matching retrieval is high. In this study, the improved SURF algorithm and the Siamese network model are fused to improve resistance to rotation and scale transformation and to raise the accuracy of feature extraction and matching retrieval.

3. Methods

This study first constructs a method for pattern element feature extraction and matching together with a perceptual image hashing model. On this basis, a perceptual hash model combined with the SURF algorithm is introduced. To enhance rotation and translation invariance, a Siamese network model with the improved SURF algorithm is then proposed.

3.1 Pattern element feature extraction and matching and image perception hash model construction

In studying the feature extraction of Mazu patterns for homestay space design, the SURF algorithm is used to extract Mazu features. Compared with other feature extraction algorithms, SURF reduces computational complexity while maintaining the stable performance of the SIFT algorithm. SURF is also robust and achieves a higher feature point recognition rate than SIFT, and it is generally superior to SIFT under changes in viewpoint, lighting, and scale. The research first focuses on identifying and classifying key areas in the image, especially when the object is moved, distorted, or occluded. Compared with global features, local features adapt better to these changes and effectively reduce mismatches. Local feature extraction involves key point detection, feature description, matching, and classification. Key point detection relies on the definition of salient points, and the descriptions are based on the characteristics of neighboring pixels. In addition, the difference-of-Gaussians (DoG) operator is used, which is more efficient than the Hessian-Laplace operator in processing the local structure of the image, as shown in Eq. (1).

(1)
$ \frac{\partial G}{\partial \sigma } =\sigma \nabla ^{2} G . $

In Eq. (1), the expression on the left can be approximated as shown in Eq. (2).

(2)
$\begin{align} \left\{\begin{aligned} & \frac{\partial G}{\partial \sigma } \approx \frac{G(x,k\sigma )-G(x,\sigma )}{(k-1)\sigma },\\ & (k-1)\sigma ^{2} \nabla ^{2} G\approx G(x,k\sigma )-G(x,\sigma ). \end{aligned}\right.\end{align} $
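The approximation in Eq. (2) can be checked numerically. The following is a minimal sketch, assuming an isotropic 2-D Gaussian and its analytic Laplacian; the function names and the sample radii are illustrative choices and do not come from the study.

```python
import math

def gaussian(r2, sigma):
    """2-D isotropic Gaussian G evaluated at squared radius r2."""
    return math.exp(-r2 / (2 * sigma**2)) / (2 * math.pi * sigma**2)

def laplacian_of_gaussian(r2, sigma):
    """Analytic Laplacian of the 2-D Gaussian at squared radius r2."""
    return (r2 - 2 * sigma**2) / sigma**4 * gaussian(r2, sigma)

def dog_error(sigma=1.5, k=1.05, radii=(0.0, 0.5, 1.0, 2.0, 3.0)):
    """Largest gap between G(k*sigma) - G(sigma) and (k-1)*sigma^2*LoG,
    relative to the magnitude of the scaled LoG at the origin."""
    peak = abs((k - 1) * sigma**2 * laplacian_of_gaussian(0.0, sigma))
    worst = 0.0
    for r in radii:
        r2 = r * r
        dog = gaussian(r2, k * sigma) - gaussian(r2, sigma)
        log = (k - 1) * sigma**2 * laplacian_of_gaussian(r2, sigma)
        worst = max(worst, abs(dog - log))
    return worst / peak
```

The relative gap shrinks as $k$ approaches 1, which is consistent with Eq. (2) being a first-order finite-difference approximation of Eq. (1).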

In feature point detection, both the Harris-Affine and Hessian-Affine operators are derived from the Harris-Laplace algorithm. These two operators overcome the isotropy limitation of the original algorithm and introduce invariance to affine transformation. They enhance image recognition and retrieval by iteratively adjusting the position and scale of feature points. In contrast, the MSER operator achieves affine invariance through binarization [16]. The stable extremal regions identified by MSER are defined in Eq. (3).

(3)
$ q(i)=\frac{|Q_{i+\nabla } |-\left|Q_{i-\nabla } \right|}{|Q_{i} |}. $

In the face of huge image datasets, traditional data processing methods are no longer suitable. In this study, a Perceptual Hashing (PH) model is proposed, which can convert key image content into unique binary sequences, thereby simplifying the storage and management of images, especially in the field of image search, which significantly improves query efficiency [17,18]. The perceptual hash model is shown in Fig. 1.

Fig. 1 shows that the perceptual hash model is similar to a digital digest of multimedia content: it creates a one-way mapping that serves as a unique signature of the content and ensures the security and robustness of the technology. The feature volume of datasets A1, A2, and A3 is significantly reduced after perceptual hashing, while the processed features still carry the key information of the original data. The perceptual hashing algorithm effectively converts large-scale data objects into smaller binary codes, and this conversion preserves a certain consistency among similar data objects. The core of perceptual hashing, i.e., its mapping mechanism, is defined in Eq. (4).

(4)
$ h=PH(I) . $

In Eq. (4), $I$ represents the input data, $h$ represents the result of the mapping, and $PH$ stands for the perceptual hash model. In image retrieval, the hash code of each image is stored in the database, and similar images are found by comparing the hash code of the query image with the codes in the database. Hash codes are usually compared with the Hamming distance, a standard measure of the similarity of two binary sequences.
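The retrieval step described above can be sketched as follows. This is a minimal illustration, assuming hash codes are stored as equal-length lists of bits; the names `hamming_distance` and `retrieve` and the dictionary layout of the database are hypothetical and not taken from the study.

```python
def hamming_distance(h1, h2):
    """Number of positions at which two equal-length bit sequences differ."""
    if len(h1) != len(h2):
        raise ValueError("hash codes must have the same length")
    return sum(a != b for a, b in zip(h1, h2))

def retrieve(query_hash, database, top_k=3):
    """Return the names of the top_k stored images closest to the query.

    database maps an image name to its stored hash code.
    """
    ranked = sorted(database.items(),
                    key=lambda item: hamming_distance(query_hash, item[1]))
    return [name for name, _ in ranked[:top_k]]
```

A smaller Hamming distance means that more bits agree, so the first entries of the returned list are the most similar stored images.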

Fig. 1. PH mapping model structure diagram.

../../Resources/ieie/IEIESPC.2025.14.2.268/image1.png

3.2 Feature extraction of pattern elements based on SURF algorithm

In the design of B&B spaces, the key to feature extraction of Mazu patterns lies in effective image analysis. Traditionally, image retrieval has relied on perceptual hashing techniques, including grayscale-based thresholds, frequency thresholds based on the discrete cosine transform, and multi-dimensional global feature methods. These techniques focus mainly on the global nature of the image and are not very sensitive to local details. In contrast, the SURF algorithm uses the Hessian matrix to identify local extrema, which improves the accuracy of feature extraction. Although SURF lacks real-time performance, its accuracy is remarkable. Building on the pattern element feature extraction and matching method and the perceptual image hashing model, this study proposes a perceptual hash model combined with the SURF algorithm to improve retrieval efficiency and accuracy. Its core principle is shown in Eq. (5).

(5)
$ H(\varepsilon ,\partial )=\left[\!\!\begin{array}{cc} {L_{xx} (\varepsilon ,\partial )} & {L_{xy} (\varepsilon ,\partial )} \\ {L_{xy} (\varepsilon ,\partial )} & {L_{yy} (\varepsilon ,\partial )} \end{array}\!\!\right] . $

In Eq. (5), $L_{xx} (\varepsilon ,\partial )$ denotes the convolution of the second-order Gaussian derivative with the image at point $\varepsilon $ and scale $\partial $; $L_{xy} $ and $L_{yy} $ are defined similarly. In practice, this expression is approximated because the Gaussian function has to be discretized, as shown in Eq. (6).

(6)
$ Det(H_{approx} )=D_{xx} D_{yy} -(0.9D_{xy} )^{2} . $

In Eq. (6), $D_{xx} $ and $D_{yy} $ are the box-filter approximations of $L_{xx} $ and $L_{yy} $, respectively. This approximation makes identifying point $\varepsilon $ as a local extremum more efficient, especially when the computation is accelerated with integral images. The SURF algorithm builds the scale space by adjusting the size of the filter and selects the extreme points as feature points. In addition, SURF refines the accuracy of feature points through interpolation, using the Hessian matrix and an interpolation algorithm to locate feature points precisely. The final selection of feature points depends on their stability, and unstable points are discarded, as shown in Eq. (7) [19].

(7)
$ x=-\left(\frac{\partial ^{2} H}{\partial x^{2} }\right)^{-1} \frac{\partial H}{\partial x} . $
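The role of the integral image and of the approximated determinant in Eq. (6) can be sketched as follows. This is a minimal illustration on a nested-list image; the helper names are hypothetical, and the construction of the actual SURF box filters for $D_{xx}$, $D_{yy}$, and $D_{xy}$ is omitted.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1,
    obtained from four table lookups regardless of the box size."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def det_hessian_approx(dxx, dyy, dxy, weight=0.9):
    """Approximated Hessian determinant of Eq. (6)."""
    return dxx * dyy - (weight * dxy) ** 2
```

Because every box sum costs four lookups, enlarging the filter to build the scale space does not increase the cost per pixel, which is why the filter size rather than the image size is varied.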

The circular area around the feature point is analyzed, and the Haar wavelet responses in the $x$ and $y$ directions are calculated. These responses are summed within a Gaussian-weighted sliding sector window to determine the dominant orientation of the feature point, as shown in Fig. 2.

Fig. 2. Dominant orientation of feature points.


Next, for each feature point, a square area aligned with its dominant orientation is selected and divided into 16 sub-areas. The Haar wavelet responses are calculated for each sub-area and Gaussian-weighted. The resulting 4-dimensional vectors are assembled into a 64-dimensional feature vector. These vectors are normalized to ensure rotation, illumination, and scale invariance, as shown in Fig. 3.
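The assembly of the 64-dimensional vector can be sketched as follows. This is a simplified illustration that assumes the Haar responses of each sub-area are already available as `(dx, dy)` pairs and that uses the four sums of standard SURF (sum of dx, sum of dy, sum of |dx|, sum of |dy|); Gaussian weighting and orientation alignment are omitted, and the function name is hypothetical.

```python
import math

def surf_descriptor(subregions):
    """Build a unit-length 64-D vector from 16 lists of (dx, dy) responses."""
    if len(subregions) != 16:
        raise ValueError("expected 4 x 4 = 16 sub-areas")
    vec = []
    for responses in subregions:
        sum_dx = sum(dx for dx, _ in responses)
        sum_dy = sum(dy for _, dy in responses)
        sum_abs_dx = sum(abs(dx) for dx, _ in responses)
        sum_abs_dy = sum(abs(dy) for _, dy in responses)
        vec.extend([sum_dx, sum_dy, sum_abs_dx, sum_abs_dy])
    norm = math.sqrt(sum(v * v for v in vec))
    # Normalizing to unit length removes the dependence on overall contrast.
    return [v / norm for v in vec] if norm > 0 else vec
```

Each of the 16 sub-areas contributes four values, which gives the 16 x 4 = 64 dimensions mentioned in the text.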

Fig. 3. Structure of the SURF feature point descriptor.


The direct application of the SURF algorithm in image retrieval is limited by its high time complexity. The algorithm consists of two steps, feature detection and descriptor definition, which ensure scale and rotation invariance, respectively. Constructing the scale pyramid is time-consuming, and although it improves scale adaptability, the neighborhood differences of feature points remain significant. To reduce this effect, this study fuses scale transformation and perceptual hash encoding to create a hashing algorithm that is resistant to rotation and scale transformation. First, the SURF algorithm locates the image feature points, as shown in Eq. (8).

(8)
$ P=\{ (x_{1} ,y_{1} ),~(x_{2} ,y_{2} ),~(x_{3} ,y_{3} ),~...,~(x_{k} ,y_{k} )\} . $

In Eq. (8), $k$ is the total number of identified feature points. Subsequently, the K-means algorithm determines the center point of the set $P$, which is defined in Eq. (9).

(9)
$ (x_{z} ,y_{z} )=\frac{1}{k} \sum _{i=1}^{k}P_{i}. $

Next, the Euclidean distances from all elements of the set $P$ to the center point $(x_{z} ,y_{z} )$ are calculated and arranged in ascending order. The sorted result is shown in Eq. (10).

(10)
$ D=\{ d_{1} ,~d_{2} ,~d_{3} ,~...,~d_{k} \}. $

Next, $R$ is set to $10/k$, and a series of concentric circles is drawn with $(x_{z} ,y_{z} )$ as the center and radii of $R/64$, $R/32$, ..., $R$ in turn. The number of feature points within each ring is counted, as shown in Eq. (11).

(11)
$ N=\{ n_{1} ,~n_{2} ,~n_{3} ,~...,~n_{64} \} . $

To enhance the resistance of the encoding to rotation and scale variation, the image is rescaled by factors ranging from 0.5 to 4. Each transformed image is processed according to Eqs. (8) to (11) to calculate the $N_{i} $ values, which form the set $K=\{ N_{1} ,~N_{2} ,~N_{3} ,~...,~N_{8} \} $. The set $K$ is then hashed to obtain the encoding $h_{2} $, and the final encoding is synthesized as $h=[\!\!\begin{array}{cc} {h_{1} } & {h_{2} } \end{array}\!\!]$, with the binarization rule shown in Eq. (12).

(12)
$\begin{align} h(i)=\left\{\begin{aligned} &{1},&&\text{if } n_{i} \ge \bar{N},\\ &{0},&&\text{if } n_{i} <\bar{N}, \end{aligned}\right.\quad i=1,~2,~...,~64,\quad N=\{ n_{1} ,~n_{2} ,~...,~n_{64} \}. \end{align} $
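Eqs. (8) to (12) can be sketched for a single scale as follows. This is a simplified illustration, not the authors' implementation: the feature points are assumed to be given as `(x, y)` pairs, the outer radius defaults to the largest distance from the center (an assumption made here so that every point falls in some ring), and the multi-scale set $K$ is not built. The name `ring_hash` is hypothetical.

```python
import math

def ring_hash(points, n_rings=64, radius=None):
    """Count feature points per concentric ring and threshold by the mean."""
    k = len(points)
    cx = sum(x for x, _ in points) / k          # center point, Eq. (9)
    cy = sum(y for _, y in points) / k
    dists = sorted(math.hypot(x - cx, y - cy) for x, y in points)  # Eq. (10)
    outer = radius if radius is not None else dists[-1]
    counts = [0] * n_rings                      # ring counts, Eq. (11)
    for d in dists:
        if d > outer:
            continue                            # point lies outside all rings
        idx = min(int(d / outer * n_rings), n_rings - 1) if outer > 0 else 0
        counts[idx] += 1
    mean = sum(counts) / n_rings
    bits = [1 if n >= mean else 0 for n in counts]  # thresholding, Eq. (12)
    return counts, bits
```

Because only distances to the center are used, rotating the point set about any point leaves the ring counts unchanged, which is the property the encoding relies on for rotation resistance.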

3.3 Grayscale histogram and Siamese network feature extraction based on improved SURF algorithm

To improve the accuracy of the retrieval algorithm, the global features of the image are included in the coding results. A vector representation of the image is generated from its grayscale histogram, and the cosine similarity measures the similarity between vectors. The grayscale histogram counts the frequency of each gray level in the image and reflects the global intensity distribution. Combined with the SURF-based perceptual hashing algorithm described above, this method effectively improves sensitivity to global features [20]. First, the number of pixels at each gray level is counted; the histogram is then divided into 64 bins to generate a vector; finally, the global grayscale features are compared by cosine similarity, as shown in Eq. (13).

(13)
$\begin{align} \left\{\begin{aligned} & \cos (\theta )=\frac{\sum _{i=1}^{n}(x_{i} \times y_{i} ) }{\sqrt{\sum _{i=1}^{n}(x_{i} )^{2} } \times \sqrt{\sum _{i=1}^{n}(y_{i} )^{2} } }\\ &\hskip 2.4pc =\frac{a\cdot b}{\|a\|\times \|b\|},\\ & \theta =\arccos (\cos (\theta )). \end{aligned}\right.\end{align} $
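The histogram comparison of Eq. (13) can be sketched as follows, assuming 8-bit gray levels supplied as a flat list of integers; the function names are illustrative.

```python
import math

def gray_histogram(pixels, bins=64, levels=256):
    """Count pixels per bin; each bin covers levels // bins gray levels."""
    hist = [0] * bins
    width = levels // bins
    for p in pixels:
        hist[min(p // width, bins - 1)] += 1
    return hist

def cosine_similarity(a, b):
    """cos(theta) between two histogram vectors, as in Eq. (13)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Since histogram entries are non-negative, the similarity lies between 0 and 1, and identical gray-level distributions give a value of 1.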

In Eq. (13), $\theta $ represents the angle between the vectors, and $a$ and $b$ represent the histogram vectors of the two images. Because the traditional SURF algorithm mainly captures low-level features such as edges and brightness changes, it cannot reveal high-level semantics, which limits how well the search results reflect user intent. To make up for this shortcoming, a Siamese network model with the improved SURF algorithm is proposed. The model maps the input into a target space by a mapping function, where the Euclidean distance is used for similarity comparison. The training phase aims to minimize the distance between samples of the same class and maximize the distance between samples of different classes. Convolutional neural networks process images through local feature abstraction, but the resulting features differ significantly when the image is translated or rotated by a large angle. To enhance rotation and translation invariance, a module is placed in front of the Siamese network; it first extracts SURF features, then matches them with the nearest neighbor algorithm, and finally calculates the parameters of the corresponding affine transformation model, as shown in Fig. 4.
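The training objective described above can be illustrated with the widely used contrastive loss. The exact loss of the study is not given in the text, so the form below, including the `margin` parameter, is an assumption chosen only to show how same-class pairs are pulled together and different-class pairs are pushed apart.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors in the target space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, same_class, margin=1.0):
    """Standard contrastive loss for one pair of embeddings (assumed form)."""
    d = euclidean(u, v)
    if same_class:
        return 0.5 * d * d                    # penalize any separation
    return 0.5 * max(margin - d, 0.0) ** 2    # penalize pairs closer than margin
```

Under this form, a matching pair contributes zero loss only when the two embeddings coincide, while a non-matching pair contributes zero loss once its distance reaches the margin.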

Fig. 4. Anti-rotation and anti-translation conversion module.


Because the fully connected layer requires an input of fixed size, a spatial pyramid pooling (SPP) layer is added before the fully connected layer of the network. This enables the network to process input images of any size and enhances its scale invariance. SPP extracts fixed-dimensional feature vectors from feature maps. For example, in a simple double-layer network, a 21-dimensional feature vector can be extracted regardless of the size of the input image, as shown in Fig. 5.
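The fixed output length of SPP can be sketched as follows. This is a minimal illustration on a single-channel feature map stored as nested lists; the pyramid levels 1 x 1, 2 x 2, and 4 x 4 are an assumption chosen because they give 1 + 4 + 16 = 21 values, and the source does not state the grid sizes it uses.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D map over n x n grids and concatenate the results."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            top = (i * h) // n                    # floor of the cell start
            bottom = ((i + 1) * h + n - 1) // n   # ceiling of the cell end
            for j in range(n):
                left = (j * w) // n
                right = ((j + 1) * w + n - 1) // n
                pooled.append(max(feature_map[y][x]
                                  for y in range(top, bottom)
                                  for x in range(left, right)))
    return pooled
```

The output length depends only on the chosen levels, not on the height or width of the feature map, which is what allows the following fully connected layer to keep a fixed input size.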

Fig. 5. SPP diagram.


Each image in the group formed by the initial image and its pyramid is used as a match. Unlike the earlier Siamese network, this network adds a regularization term to the objective function to enhance scale invariance. In this way, images of the same scale group produce more consistent features. The parameters of the scale-invariant layer are denoted as $(W_{a} ,B_{a} )$, and its output is shown in Eq. (14).

(14)
$ O_{a} =\kappa (W_{a} O_{m} (x_{i} )+B_{a} ) . $

In Eq. (14), $\kappa $ denotes the activation function $\max (x,0)$; $O_{m} $ represents the input of the scale-invariant layer; $B_{a} $ represents the regularization term. In this study, the Siamese network model is improved with the SURF algorithm, and its invariance to rotation, scale, and translation is enhanced. The network can process images of any size and consists of two branches that share parameters. The image is corrected by the rotation and translation module and then processed by a convolutional neural network. The SPP layer extracts fixed-length features from the feature map and passes them to the fully connected layer. Finally, the network outputs feature vectors, and training aims to minimize the loss for similar images and keep their feature vectors consistent, as shown in Fig. 6.

Fig. 6. Siamese network model structure diagram.

4. Results and Discussion

In the experiments, a self-built dataset was created and used to analyze the nearest neighbor ratio and robustness and to evaluate the algorithm's resistance to rotation and scale changes. The MNIST dataset was then introduced and used together with the self-built dataset for image feature extraction and matching analysis.

4.1 Nearest neighbor ratio and robustness results based on SURF feature extraction matching

To explore SURF-based feature extraction of Mazu patterns in homestay space design, this study built a database containing various pattern categories, such as architectural images and Mazu ceramic patterns. All images in the self-built dataset were captured with a Sony A7C II in Quanzhou City, Fujian Province. The images are of similar size, the object occupies the main part of each image, and the 1000 images are divided into 10 categories such as architecture, ceramics, packaging, and sculpture. Fig. 7 shows some of the images in the database. From each of the 10 categories, 10 images were randomly selected. These images were rotated, noise-added, resized, reshaped, and logo-added, resulting in 25 test images per original image. The experiments were conducted in MATLAB R2020a. To verify the performance of the proposed algorithm, it was compared with a perceptual image hashing algorithm combining SIFT and PCA, with the algorithm in reference [10], and with the baseline discrete cosine transform (DCT) algorithm in reference [10].

Fig. 7. Partial image database.


This study focused on feature extraction and matching of Mazu patterns based on SURF algorithm. Experiments compared the feature extraction and retrieval effects using different nearest neighbor ratios in SURF feature matching, and compared the robustness of the improved SURF algorithm based on the perceptual hash coding model and the traditional SURF algorithm, as shown in Fig. 8.

Fig. 8. Different nearest neighbor ratios and SURF accuracy before and after improvement.


As can be observed in Fig. 8(a), the robustness of the algorithm is highest when the nearest neighbor matching ratio is set between 0.4 and 0.6. Setting above this range preserves too many false matches, while values below this preclude many correct matches, all of which affect robustness. Fig. 8(b) shows that the original SURF algorithm outperforms the improved algorithm in terms of robustness. However, the average encoding time using the improved SURF algorithm is only 0.10 seconds, which is much lower than the 0.50 seconds of traditional algorithms. This is mainly due to the fact that the improved algorithm omits the scale pyramid construction step, which significantly reduces the time complexity. Next, the study compared the robustness and real-time performance of the algorithm with other existing algorithms, and the specific results are shown in Fig. 9.
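The nearest neighbor ratio discussed above is commonly applied as a ratio test between the closest and second-closest candidate. The sketch below assumes descriptors are given as tuples of numbers and uses a default ratio of 0.5, a value inside the 0.4 to 0.6 range reported above; the function name is hypothetical and the study's own matching code is not shown in the text.

```python
import math

def ratio_test_matches(desc_a, desc_b, ratio=0.5):
    """Keep a match only if the best distance is clearly smaller than the
    second-best distance (nearest neighbor ratio test)."""
    matches = []
    for i, a in enumerate(desc_a):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the test needs at least two candidates
        (best, j_best), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j_best))
    return matches
```

A larger ratio accepts more ambiguous matches, while a smaller ratio rejects more of the correct ones, which is the trade-off behind the intermediate range observed in Fig. 8(a).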

Fig. 9. Comparison of four algorithms.


As shown in Fig. 9(a), the proposed image feature matching algorithm has higher precision than the other algorithms. For the Top 100 results, the precision of the proposed method is 0.98, which is higher than that of the other methods. As shown in Fig. 9(b), the proposed algorithm consistently has a higher recall than the other algorithms. For the Top 100 results, its recall is 0.96, which is again higher than that of the other algorithms. As shown in Fig. 9(c), the precision-recall curve of the proposed algorithm completely envelops the precision-recall curves of the other algorithms, indicating that its performance is superior. To verify the algorithm's resistance to rotation and scale changes, the transformed images were tested for feature extraction and matching retrieval on the self-built dataset to measure accuracy. Finally, a single image was selected as the output of the detection, and Fig. 10 shows the results of the related experiments.
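The Top N measures in Fig. 9 can be computed as in the following sketch, which assumes that the rate reported in Fig. 9(b) is the usual recall measure and that results are identified by hashable labels; the function name is illustrative.

```python
def precision_recall_at_n(ranked_ids, relevant_ids, n):
    """Precision and recall over the first n entries of a ranked result list."""
    top = ranked_ids[:n]
    hits = sum(1 for item in top if item in relevant_ids)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall
```

Evaluating the function for increasing n yields the points of a precision-recall curve such as the one in Fig. 9(c).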

Fig. 10. Recognition accuracy for various image transformations.

In Fig. 10, the average retrieval accuracies of the proposed algorithm, Method A, Method B, and Method C are 93.27%, 65.52%, 84.78%, and 67.72%, respectively. The proposed algorithm was significantly better than the other algorithms at resisting rotation and also performed better under image deformation. This was made possible by incorporating a grayscale histogram comparison between images into the final evaluation. The proposed algorithm mainly strengthened the anti-rotation characteristics of the hash coding and was comparable to the other algorithms in terms of scale invariance.

4.2 Siamese network based on SURF algorithm for image feature extraction and matching analysis

The datasets used in the study were the self-built image set and the MNIST dataset, which originates from the National Institute of Standards and Technology in the United States. The MNIST dataset contains digits handwritten by 245${\sim}$250 individuals. During the training phase, the anti-rotation module, which involves no parameter adjustment and is used mainly for image correction, was temporarily excluded for the sake of efficiency. When training on the self-built dataset, two images were randomly selected at a time; pairs from the same category were marked as a match and pairs from different categories as a mismatch. The model was updated by stochastic gradient descent, and different learning rates were tested. As the number of iterations increased, the loss function gradually decreased to close to 0, as shown in Fig. 11, which indicated that the model effectively reduced the distance between similar images and expanded the distance between different images.

Fig. 11. The learning rate training process under different iterations.


In this study, the self-built image library contained 10 categories, with a total of 1000 images in different categories, which were used for image retrieval experiments. 10 images were randomly selected from each category, and the images were subjected to a variety of treatments, including keeping them in their original shape, applying rotation, increasing noise, adjusting size and brightness, changing shape, and adding a logo, resulting in a different test image for each original image. These processed image samples are shown in Fig. 12.

Fig. 12. Image feature transformation.


For both datasets, 1,000 images were randomly selected. In addition, an additional 100 images were selected as the query set, and the images were rotated, translated, and scaled to facilitate feature extraction and matching, as shown in Fig. 13.

Fig. 13. Experimental results on the self-built dataset and the MNIST dataset.

As can be observed in Fig. 13, the proposed algorithm outperforms the others. The other algorithms attempted to improve adaptability to rotation and scale through data augmentation, but this approach did not fundamentally improve the model's performance in these respects. The core operations of CNNs limit their ability to handle rotational and scale transformations, so the effect was limited. According to Table 1, the average accuracy of the proposed algorithm was 93.53% on the self-built dataset and 93.91% on the MNIST dataset, which was better than the other algorithms. This was due to the advantages of SURF in processing low-level image features.

Next, the average time required for image feature extraction and matching retrieval was calculated, and the real-time performance of the different algorithms was compared. Table 2 shows the results on the self-built dataset and the MNIST dataset. In Table 2, the average processing time of the proposed algorithm was 8.12 s and 5.25 s, respectively, which was relatively long. This was mainly because the algorithm needed to perform SURF feature point extraction and matching on the input image. In contrast, the other algorithms extracted features faster because of their simpler model structure. The DCT algorithm was the fastest of all, with an average time of only 0.013 s and 0.011 s, respectively, because it simply calculates the similarity between images through the Hamming distance. To verify the impact of SPP on the scale robustness of the proposed algorithm, ablation experiments were conducted, and the results are shown in Table 3.

According to Table 3, the recognition accuracy of the algorithm with SPP module is much higher than that of the algorithm without SPP module. Taking the rotation operation as an example, the recognition accuracy with and without SPP is 95.43% and 78.53%, respectively. The above results indicate that SPP can effectively enhance the scale robustness of the algorithm.

Table 1. Recognition accuracy under various image transformations on the two data sets.

| Data set | Type | Ours | Method A | Method B | Method C |
|---|---|---|---|---|---|
| Self-built data set | Rotate | 94.23% | 68.22% | 67.13% | 60.12% |
| | Noise | 98.17% | 98.12% | 98.15% | 62.02% |
| | Scale Variation | 93.12% | 83.18% | 87.10% | 62.40% |
| | Shape | 83.47% | 81.59% | 80.12% | 70.09% |
| | Shape change | 97.12% | 97.10% | 98.02% | 60.11% |
| | Add LOGO | 95.04% | 96.12% | 97.14% | 80.45% |
| MNIST data set | Rotate | 94.23% | 68.22% | 68.30% | 62.11% |
| | Noise | 98.22% | 98.32% | 98.43% | 63.42% |
| | Scale Variation | 94.22% | 84.28% | 88.21% | 64.42% |
| | Shape change | 83.78% | 82.46% | 84.32% | 73.13% |
| | Luminance | 97.55% | 97.62% | 98.42% | 67.23% |
| | Add LOGO | 95.43% | 96.72% | 97.82% | 82.66% |
| Average | | 93.72% | 87.66% | 88.60% | 67.35% |
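As a quick consistency check, the averages quoted for the proposed algorithm can be recomputed from the "Ours" column of Table 1. The snippet below is only a worked arithmetic example; the variable names are illustrative.

```python
# Accuracy values (%) of the proposed algorithm, copied from the "Ours"
# column of Table 1, in row order.
ours = {
    "self-built": [94.23, 98.17, 93.12, 83.47, 97.12, 95.04],
    "MNIST": [94.23, 98.22, 94.22, 83.78, 97.55, 95.43],
}

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

per_dataset = {name: mean(vals) for name, vals in ours.items()}
overall = mean([v for vals in ours.values() for v in vals])

for name, avg in per_dataset.items():
    print(f"{name}: {avg:.3f}%")
print(f"overall: {overall:.3f}%")
```

The per-dataset means come to about 93.525% and 93.905%, and the overall mean to about 93.715%, which agree with the reported 93.53%, 93.91%, and 93.72% to within rounding.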

Table 2. Comparison of feature extraction and matching retrieval time of the four algorithms on the two data sets.

| Data set | Ours | Method A | Method B | Method C |
|---|---|---|---|---|
| Self-built data set | 8.12 s | 3.254 s | 5.213 s | 0.013 s |
| MNIST data set | 5.25 s | 1.363 s | 3.417 s | 0.011 s |

Table 3. Results of the ablation experiments.

| Type | With SPP | Without SPP |
|---|---|---|
| Rotate | 95.43% | 78.53% |
| Noise | 98.19% | 82.12% |
| Scale Variation | 93.22% | 76.45% |
| Shape change | 89.47% | 72.81% |
| Luminance | 98.12% | 83.34% |
| Add LOGO | 96.21% | 86.71% |

5. Conclusion

The main challenge addressed in this research is how to effectively extract and match the features of Mazu pattern elements in homestay space design. To address it, an image-aware hash model was first constructed and the SURF algorithm was integrated into it. A Siamese network model based on the improved SURF algorithm was then developed, with a focus on improving the model's rotation and translation invariance. The results showed that the improved SURF algorithm was most robust with a nearest neighbor matching ratio of 0.4 to 0.6, surpassing the traditional SURF algorithm. On the self-built dataset and the MNIST dataset, the improved algorithm achieved image retrieval accuracies of 93.53% and 93.91%, respectively, higher than those of the other algorithms. However, its encoding time was relatively long, averaging 8.12 s and 5.25 s, respectively. Despite this, its recognition accuracy on distorted images was significantly improved, especially in terms of resistance to rotation and scale changes.

In summary, although the algorithm markedly enhances invariance to rotation and scale transformations, its long encoding time remains the main limitation on its application. Future research will focus on improving coding efficiency, with the aim of shortening the feature extraction and matching process and making the algorithm more suitable for rapid-response scenarios. Future work will also explore the algorithm's ability to parse the high-level semantic features of images, so as to improve its overall performance and scope of application. Given the diversity of image data and application scenarios, the work will be extended to a wider range of datasets and application fields to enhance the model's generality and practicability.
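The nearest neighbor matching ratio mentioned above can be made concrete with a short sketch. It assumes the ratio refers to the usual ratio test, in which a match is accepted only when the nearest descriptor is clearly closer than the second nearest. The function name `ratio_test_matches` and the toy two-dimensional descriptors are hypothetical; real SURF descriptors have 64 or 128 dimensions.

```python
import math

def ratio_test_matches(query_desc, train_desc, ratio=0.5):
    """Return (query_index, train_index) pairs that pass the ratio test.

    A query descriptor is matched to its nearest train descriptor only if
    that distance is below `ratio` times the distance to the second nearest.
    """
    matches = []
    for qi, q in enumerate(query_desc):
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train_desc))
        if len(dists) < 2:
            continue  # the test needs at least two candidates
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches

# Toy descriptors (assumed data): the first query has one clear neighbour,
# the second is equally close to two candidates and is therefore ambiguous.
query = [(0.0, 0.0), (5.0, 5.0)]
train = [(0.1, 0.0), (4.0, 4.0), (6.0, 6.0)]

print(ratio_test_matches(query, train, ratio=0.5))  # [(0, 0)]
```

A smaller ratio rejects more ambiguous matches at the cost of fewer correspondences, which is the trade-off behind choosing a value in the 0.4 to 0.6 range.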

Funding

This research was supported by the 2023 Putian City Science and Technology Plan Project "Application Strategy of Mazu Pattern Elements in Homestay Space Design - Taking Meizhou Island as an Example" (Project number: 2023SZ3001PTXY06).

REFERENCES

[1] R. Singh, A. Acharya, and S. Tiwari, "Pose and illumination invariant hybrid feature extraction for newborn," Recent Advances in Computer Science and Communications, vol. 14, no. 2, pp. 368-375, 2021.
[2] F. M. El-Ghamry, W. El-Shafai, and M. I. Abdalla, "Gauss gradient and SURF features for landmine detection from GPR images," Computers, Materials & Continua, vol. 71, no. 3, pp. 4457-4487, 2022.
[3] A. Sukumaran and T. Brindha, "Nature-inspired hybrid deep learning for race detection by face shape features," International Journal of Intelligent Computing and Cybernetics, vol. 13, no. 3, pp. 365-388, 2020.
[4] G. Chetwynd, "Shell pores over SURF bids for Gato do Mato," Upstream: The International Oil & Gas Newspaper, vol. 27, no. 40, pp. 18-19, 2022.
[5] B. Jindal and S. Garg, "FIFE: Fast and indented feature extractor for medical imaging based on shape features," Multimedia Tools and Applications, vol. 82, no. 4, pp. 6053-6069, 2023.
[6] S. A. Suandi and S. Setumin, "Characterising local feature descriptors for face sketch to photo matching," International Journal of Computational Vision and Robotics, vol. 10, no. 6, pp. 522-544, 2020.
[7] P. Preethi and H. R. Mamatha, "Region-based convolutional neural network for segmenting text in epigraphical images," Artificial Intelligence and Applications, vol. 1, no. 2, pp. 119-127, 2023.
[8] E. Cho and Y. Kim, "Dynamic optimization of hessian determinant image pyramid for memory-efficient and high performance keypoint detection in SURF," IET Image Processing, vol. 15, no. 13, pp. 3392-3399, 2021.
[9] A. S. M. Yasin, M. M. Haque, M. N. Adnan, S. Rahnuma, and A. Hossain, "Localization of autonomous robot in an urban area based on SURF feature extraction of images," International Journal of Technology Diffusion, vol. 11, no. 4, pp. 84-111, 2020.
[10] S. A. Nawaz, J. Li, J. Liu, U. A. Bhatti, J. Zhou, and R. M. Ahmad, "A feature-based hybrid medical image watermarking algorithm based on SURF-DCT," Proc. of International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, vol. 978, no. 3, pp. 1080-1090, 2020.
[11] Q. An, X. Chen, and S. Wu, "A novel fast image stitching method based on the combination of SURF and cell," Complexity, vol. 21, no. 2, pp. 1-14, 2021.
[12] R. Radha and M. Pushpa, "A comparative analysis of SIFT, SURF and ORB on sketch and paint based images," International Journal of Forensic Engineering, vol. 5, no. 2, pp. 102-110, 2021.
[13] H. Xu and H. Cao, "A static gesture recognition method based on improved SURF algorithm and Bayesian regularization BP neural network," Taiwan Academic Network Management Committee, vol. 22, no. 3, pp. 707-714, 2021.
[14] Y. Yohannes, S. Devella, and A. H. Pandrean, "Penerapan speeded-up robust feature pada random forest untuk klasifikasi motif songket palembang," Jurnal Teknik Informatika dan Sistem Informasi, vol. 5, no. 3, pp. 360-369, 2020.
[15] M. Bansal, M. Kumar, M. Kumar, and K. Kumar, "An efficient technique for object recognition using Shi-Tomasi corner detection algorithm," Soft Computing: A Fusion of Foundations, Methodologies and Applications, vol. 25, no. 6, pp. 4423-4432, 2021.
[16] L. Wang, B. Hao, J. Huang, Z. Liu, and C. Liu, "The 3D reconstruction of ROI based on the improved feature fusion and matching strategy," Journal of Nonlinear and Convex Analysis, vol. 22, no. 10, pp. 2041-2051, 2021.
[17] D. Rondao, N. Aouf, M. A. Richardson, and O. Dubois-Matra, "Benchmarking of local feature detectors and descriptors for multispectral relative navigation in space," Acta Astronautica, vol. 172, no. 7, pp. 100-122, 2020.
[18] L. Zhang, K. Li, Y. Qi, and F. Wang, "Local feature extracted by the improved bag of features method for person re-identification," Neurocomputing, vol. 458, no. 11, pp. 690-700, 2021.
[19] X. Liu, "Research on intelligent visual image feature region acquisition algorithm in Internet of Things framework," Computer Communications, vol. 151, no. 2, pp. 299-305, 2020.
[20] Y. Song, L. Su, and X. Wang, "Abnormal noise recognition of door closing for passenger car based on image processing," Recent Patents on Mechanical Engineering, vol. 14, no. 4, pp. 505-514, 2021.

Author

Mingfeng Yan

Mingfeng Yan obtained a master's degree in Visual Communication Design from Xiamen University in 2011. Currently, he is engaged in teaching and research work in the Department of Environmental Design, School of Arts and Crafts, Putian University. He has been invited to serve as a consultant and has delivered various technical speeches on issues such as the principles of environmental design, the fundamentals of landscape design, and interior design. He has published articles in more than 10 well-known peer-reviewed journals and conference proceedings both at home and abroad. His research areas include environmental design, landscape design, interior design, and more.