Title Efficient Prompt Fusion for RGB-D Semantic Segmentation
Authors Editorial Department (Editor)
DOI https://doi.org/10.5573/ieie.2025.62.7.56
Page pp.56-62
ISSN 2287-5026
Keywords Multimodality; RGB-D; Segmentation; Prompt learning
Abstract RGB-D semantic segmentation incorporates depth data to address scene-understanding problems that are difficult to solve with RGB information alone. This study applies prompt learning to RGB-D semantic segmentation, improving performance by adding only a small number of parameters while keeping the original model structure intact. In particular, the post-fusion prompt method is a simple yet effective approach that minimizes information loss and maximizes interaction between the two modalities. Experiments on the NYUv2 and SUN RGB-D datasets show that the post-fusion approach outperforms the pre-fusion method. On NYUv2, our method outperforms MultiMAE (Multimodal Multitask Masked Autoencoders), a representative multimodal learning approach, by approximately 2.2% in mIoU. These findings suggest new possibilities for prompt learning in the fusion of RGB and depth information.
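
The abstract reports its results in mIoU (mean intersection over union). As a minimal sketch of how this metric is commonly computed, and not the authors' evaluation code, the following pure-Python function averages per-class IoU over flattened label maps. The function name `mean_iou`, the `ignore_index` value of 255, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = 255,  # assumed "unlabeled" marker, not taken from the paper
) -> float:
    """Average per-class IoU over classes present in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to the score
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes (flattened label maps).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Benchmark evaluations usually accumulate the intersection and union counts over the whole test set before dividing, rather than averaging per-image scores.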