IEIE - Journal of the Institute of Electronics and Information Engineers

Title	Robust Self-supervised Multi-frame Depth Estimation: In Search of Cost Volume Alternatives
Authors	김진현(Jinhyeon Kim) ; 김규동(Gyudong Kim) ; 나혁주(Hyukju Na) ; 장현성(Hyunsung Jang) ; 박재민(Jaemin Park) ; 황재기(Jaegi Hwang) ; 하남구(Namkoo Ha) ; 김영근(Young Geun Kim) ; 김승룡(Seungryong Kim)
DOI	https://doi.org/10.5573/ieie.2024.61.12.97
Page	pp.97-100
ISSN	2287-5026
Keywords	Deep learning; Self-supervised learning; Transformer; Cost volume; Depth estimation
Abstract	Self-supervised multi-frame depth estimation predicts depth by exploiting geometric cues from multiple input frames. Traditional methods rely on epipolar geometry to construct cost volumes, but they have two major drawbacks: (1) they assume a static environment, and (2) they require pose information during inference. Consequently, these methods struggle in real-world scenarios with dynamic objects. In this paper, we propose using the cross-attention map as a comprehensive cost volume to address these limitations. We show that training the cross-attention layers for image reconstruction enables implicit learning of a warping function, similar to the explicit epipolar warping in conventional methods. We introduce the CRoss-Attention map and Feature aggregaTor (CRAFT), designed to effectively aggregate and refine the full cost volume. We also implement CRAFT hierarchically, enhancing depth predictions through a coarse-to-fine approach. Evaluations on the Cityscapes dataset demonstrate that our method outperforms traditional techniques and is robust in challenging conditions with dynamic objects.
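The idea of treating a cross-attention map as a full cost volume can be illustrated with a minimal NumPy sketch. This is not the authors' CRAFT implementation: the function names `cross_attention_map` and `attention_warp`, the feature sizes, and the random inputs are all assumptions made for illustration, and the learned query/key projections of a real transformer layer are omitted. Each row of the map scores one target-frame location against every source-frame location, and multiplying the map by source pixels gives an attention-based reconstruction, loosely analogous to the implicit warping described in the abstract.

```python
import numpy as np


def cross_attention_map(target_feat, source_feat):
    """Return an (N_t, N_s) attention map whose rows sum to 1.

    target_feat: (N_t, C) flattened target-frame features (queries).
    source_feat: (N_s, C) flattened source-frame features (keys).
    Hypothetical helper; no learned projections are applied.
    """
    scale = 1.0 / np.sqrt(target_feat.shape[1])
    logits = (target_feat @ source_feat.T) * scale
    # Subtract the row-wise maximum for a numerically stable softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def attention_warp(attn, source_values):
    """Reconstruct target values as attention-weighted sums of source values."""
    return attn @ source_values


# Toy inputs (assumed sizes): a 4x6 feature grid with 8 channels per frame.
rng = np.random.default_rng(0)
h, w, c = 4, 6, 8
source_feat = rng.standard_normal((h * w, c))
target_feat = rng.standard_normal((h * w, c))
source_pixels = rng.random((h * w, 3))  # RGB values in [0, 1)

attn = cross_attention_map(target_feat, source_feat)  # all-pairs matching scores
recon = attention_warp(attn, source_pixels)           # attention-based reconstruction
```

Unlike an epipolar cost volume, this all-pairs map needs no camera pose and places no static-scene constraint on where a match may lie, which is the property the abstract relies on; the cost is memory that grows with the product of the two frames' spatial sizes.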

Copyright © IEIE. All rights reserved.

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.