Title |
Efficient Prompt Fusion for RGB-D Semantic Segmentation |
DOI |
https://doi.org/10.5573/ieie.2025.62.7.56 |
Keywords |
Multimodality; RGB-D; Segmentation; Prompt learning |
Abstract |
RGB-D semantic segmentation incorporates depth data to address scene understanding problems that are difficult to solve with RGB information alone. This study applies prompt learning to RGB-D semantic segmentation, improving performance by adding only a small number of parameters while leaving the original model structure unchanged. In particular, the post-fusion prompt method is a simple yet effective approach that minimizes information loss and maximizes interaction between the two modalities. Experiments on the NYUv2 and SUN RGB-D datasets showed that the post-fusion approach outperforms the pre-fusion approach. On the NYUv2 dataset, our method outperformed MultiMAE (Multimodal Multitask Masked Autoencoders), a representative multimodal learning approach, by approximately 2.2% in mIoU. These findings suggest new possibilities for prompt learning in the fusion of RGB and depth information. |