

Title Cross-modal Graphic Retrieval Optimization Method Based on Deep Learning and Hash Learning
Authors Lu Tan
DOI https://doi.org/10.5573/IEIESPC.2025.14.4.471
Pages 471-482
ISSN 2287-5255
Keywords Graph neural networks; Deep learning; Hash algorithms; Graph retrieval; Multimodality
Abstract This work proposes a novel approach to cross-modal graphic retrieval that leverages deep learning and hash learning. It addresses the limitations of current multimodal information retrieval methods in capturing fine-grained information within individual modalities. First, a deep-learning-based model is developed to extract features from the text and image modalities. To further refine modality-specific information, a cross-modal hashing retrieval model incorporating graphic features is proposed; it uses attention mechanisms and adversarial networks to optimize performance. Experimental results demonstrate the model's effectiveness: it achieves an average recall of 77.8% in graphic feature extraction, with top classification precisions of 0.637 and 0.712 on two separate datasets. Furthermore, the cross-modal hash retrieval model achieves a mean average precision (mAP) of 0.833 on the image-to-text retrieval task with 64-bit hash codes. These findings indicate that the proposed model surpasses comparable models on the precision-recall curve. The attention, inter-modal adversarial, and intra-modal adversarial modules each contribute significantly to the model's performance on image and text retrieval, with the attention module playing the largest role, followed by the inter-modal adversarial module. Consequently, this study's model is well suited to cross-modal graphic retrieval tasks.
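To make the abstract's pipeline concrete, the sketch below shows one plausible shape for a cross-modal hashing network of the kind described: per-modality projections, a cross-attention step, tanh-relaxed hash heads, and a discriminator for adversarial alignment. All dimensions, the shared attention module, and the discriminator head are illustrative assumptions, not the authors' implementation. The second function shows how mean average precision over Hamming ranking, the metric the abstract reports for 64-bit codes, is conventionally computed.

```python
# Minimal sketch of a cross-modal hashing network with attention and an
# adversarial discriminator. Dimensions and module choices are assumptions.
import torch
import torch.nn as nn
import numpy as np

class HashHead(nn.Module):
    """Maps a modality feature to a K-bit code via tanh relaxation."""
    def __init__(self, in_dim: int, n_bits: int = 64):
        super().__init__()
        self.fc = nn.Linear(in_dim, n_bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # tanh gives continuous codes for training; sign() binarizes at test time
        return torch.tanh(self.fc(x))

class CrossModalHasher(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, hid=512, n_bits=64):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hid)
        self.txt_proj = nn.Linear(txt_dim, hid)
        # attention lets each modality attend to the other before hashing
        self.attn = nn.MultiheadAttention(hid, num_heads=4, batch_first=True)
        self.img_hash = HashHead(hid, n_bits)
        self.txt_hash = HashHead(hid, n_bits)
        # discriminator for adversarial alignment of the two code spaces
        self.disc = nn.Sequential(nn.Linear(n_bits, 128), nn.ReLU(),
                                  nn.Linear(128, 1))

    def forward(self, img_feat, txt_feat):
        v = self.img_proj(img_feat).unsqueeze(1)   # (B, 1, hid)
        t = self.txt_proj(txt_feat).unsqueeze(1)
        v_att, _ = self.attn(v, t, t)              # image attends to text
        t_att, _ = self.attn(t, v, v)              # text attends to image
        h_img = self.img_hash(v_att.squeeze(1))
        h_txt = self.txt_hash(t_att.squeeze(1))
        return h_img, h_txt

def hamming_map(query_codes, db_codes, query_labels, db_labels):
    """mAP for Hamming-ranking retrieval.
    Codes are {-1, +1} arrays of shape (n, n_bits); labels are ints."""
    aps = []
    for q, ql in zip(query_codes, query_labels):
        # Hamming distance for +/-1 codes: (n_bits - <q, db>) / 2
        dist = (db_codes.shape[1] - db_codes @ q) / 2
        order = np.argsort(dist)                    # nearest first
        rel = (db_labels[order] == ql).astype(float)
        if rel.sum() == 0:
            continue
        prec_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
        aps.append((prec_at_k * rel).sum() / rel.sum())
    return float(np.mean(aps))
```

At retrieval time, codes would be binarized with `torch.sign` and the database ranked by Hamming distance, as in `hamming_map`; the adversarial and attention pieces matter only during training, which is consistent with the abstract's ablation ordering.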