Title STCM: A Novel Operation Strategy for Minimizing Loss from Single Image Depth Estimation
Authors 이창엽(Chang Yeop Lee) ; 권현섭(Hyun Seob Kwon) ; 김동주(Dong Ju Kim) ; 한순신(Sun Sin Han) ; 서영주(Young Joo Suh) ; 황도경(Do Kyung Hwang)
DOI https://doi.org/10.5573/ieie.2024.61.4.49
Page pp.49-59
ISSN 2287-5026
Keywords Monocular depth estimation; Dense prediction; Group convolution; Deep learning; Computer vision
Abstract This paper presents a method for estimating depth from a single image captured by a monocular camera. The proposed method is a variation of the Split-Transform-Merge (STM) strategy, widely used in deep learning image classification, adapted for general use in the Monocular Depth Estimation (MDE) domain. Its primary aim is to improve depth estimation performance. Traditionally, MDE has been treated as one of two tasks: regression, in which the network takes a single image as input and directly outputs a depth map, and classification, in which discretized depth value classes (Bins) are predefined and the network outputs, for each pixel, a probability for each of the Bins. More recently, the paradigm has shifted from predefined Bins to adaptive Bins, which are learned by adding a Bins estimation module to the final network output. Recent research has explored various types of Bins estimation modules to estimate adaptive Bins more accurately. A limitation arises when high-dimensional feature map representations are fed to the Bins estimation module: they are first projected to low-dimensional feature map representations, which inevitably causes information loss. To address this limitation, we propose a novel Split-Transform-Conversion-Merge (STCM) strategy. The strategy examines the relationship between feature map representations and Bins estimation modules in depth and thereby reduces this information loss. To validate the proposed strategy, we applied it to various MDE techniques and observed significant improvements across all evaluation metrics. We also observed improved visual quality of the results in our experiments.
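
The classification formulation described in the abstract can be illustrated with a minimal sketch. It assumes, as is common in adaptive Bins methods but not stated in the abstract, that the Bins estimation module produces positive relative bin widths and that the depth at a pixel is the probability-weighted sum of the bin centers. The function names `bin_centers` and `expected_depth` and the depth range are hypothetical and chosen only for illustration; this is not the STCM strategy itself.

```python
def bin_centers(widths, d_min, d_max):
    """Convert positive relative bin widths into bin centers covering [d_min, d_max].

    Assumption: the widths stand in for the output of a Bins estimation module.
    """
    total = sum(widths)
    centers = []
    edge = d_min
    for w in widths:
        step = (d_max - d_min) * w / total  # absolute width of this bin
        centers.append(edge + step / 2.0)   # midpoint of the bin
        edge += step
    return centers


def expected_depth(probs, centers):
    """Depth at one pixel as the probability-weighted sum of bin centers."""
    return sum(p * c for p, c in zip(probs, centers))


# Three adaptive bins over a hypothetical depth range of 0 to 8 meters.
centers = bin_centers([1.0, 1.0, 2.0], d_min=0.0, d_max=8.0)
# Per-pixel probabilities over the three bins (they sum to 1).
depth = expected_depth([0.5, 0.25, 0.25], centers)
```

With predefined Bins the widths would be fixed in advance; with adaptive Bins they are estimated for each input image, so the centers shift with the scene.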