Title Robust Self-supervised Multi-frame Depth Estimation: In Search of Cost Volume Alternatives
Authors 김진현(Jinhyeon Kim) ; 김규동(Gyudong Kim) ; 나혁주(Hyukju Na) ; 장현성(Hyunsung Jang) ; 박재민(Jaemin Park) ; 황재기(Jaegi Hwang) ; 하남구(Namkoo Ha) ; 김영근(Young Geun Kim) ; 김승룡(Seungryong Kim)
DOI https://doi.org/10.5573/ieie.2024.61.12.97
Page pp.97-100
ISSN 2287-5026
Keywords Deep learning; Self-supervised learning; Transformer; Cost volume; Depth estimation
Abstract Self-supervised multi-frame depth estimation predicts depth by exploiting geometric cues from multiple input frames. Traditional methods rely on epipolar geometry to construct cost volumes, which has two major drawbacks: (1) it assumes a static environment, and (2) it requires pose information during inference. Consequently, these methods struggle in real-world scenes that contain dynamic objects. In this paper, we propose using the cross-attention map as a full cost volume to address these limitations. We show that training the cross-attention layers for image reconstruction enables them to implicitly learn a warping function, similar to the explicit epipolar warping in conventional methods. We introduce the CRoss-Attention map and Feature aggregaTor (CRAFT), designed to effectively aggregate and refine the full cost volume. We also apply CRAFT hierarchically, refining depth predictions in a coarse-to-fine manner. Evaluations on the Cityscapes dataset demonstrate that our method outperforms traditional techniques and remains robust in challenging conditions with dynamic objects.
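
The idea of treating a cross-attention map as a full cost volume can be illustrated with a minimal NumPy sketch. This is not the CRAFT implementation: the function names, the scaled dot-product form, the absence of learned projections, and the toy shifted-frame data are all assumptions made for illustration. Each row of the attention map scores one target pixel against every source pixel, and multiplying the map by the source pixels gives an attention-based warp that could be compared with the target image as a reconstruction signal.

```python
import numpy as np


def cross_attention_cost_volume(target_feat, source_feat):
    """Softmax-normalized similarity between every target and source pixel.

    target_feat: (N_t, C) flattened target-frame features (queries).
    source_feat: (N_s, C) flattened source-frame features (keys).
    Returns an (N_t, N_s) map whose rows sum to 1. No epipolar constraint
    or camera pose is used; every source pixel is a candidate match.
    """
    channels = target_feat.shape[-1]
    logits = target_feat @ source_feat.T / np.sqrt(channels)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)


def attention_warp(attn, source_values):
    """Warp source values into the target view as an attention-weighted sum."""
    return attn @ source_values


# Toy data (hypothetical): the target features are the source features
# shifted by one pixel along the width axis.
rng = np.random.default_rng(0)
height, width, channels = 4, 6, 8
source_feat = rng.standard_normal((height * width, channels))
target_feat = np.roll(
    source_feat.reshape(height, width, channels), shift=1, axis=1
).reshape(height * width, channels)

attn = cross_attention_cost_volume(target_feat, source_feat)  # (24, 24)
source_pixels = rng.random((height * width, 3))               # RGB in [0, 1)
reconstructed = attention_warp(attn, source_pixels)           # (24, 3)
```

In a trained model the queries and keys would come from learned projections, and the reconstruction error between `reconstructed` and the real target image would supply the self-supervised training signal described in the abstract.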