Title: DOMVS: Unsupervised Multi-View Stereo for Dealing With Occlusion Scenes

Conference: CGI 2025

Tags: Depth Estimation, Multi-view Stereo, Perceptual Consistency, Structured Occlusion, Unsupervised Learning

Abstract: Deep learning-based multi-view stereo (MVS) methods have made significant progress in recent years. Because access to large-scale annotated datasets is limited, researchers have explored unsupervised MVS methods that do not require ground-truth depth. However, unsupervised MVS methods struggle in occluded or texture-less regions, as they rely on the assumption of photometric consistency. To address these issues, we propose an unsupervised MVS method named Unsupervised Multi-View Stereo for Dealing With Occlusion Scenes (DOMVS). We first propose a feature-level perceptual consistency module that minimizes reconstruction error by comparing high-level semantic features between images. We also propose a structured occlusion generation module, which improves the accuracy and completeness of depth estimation by generating augmented samples for contrastive learning. Finally, we propose the DLA-Net module for normalization, which addresses the limitations of the receptive field and enhances depth-map accuracy by aggregating global information. We evaluate DOMVS on the DTU and Tanks & Temples datasets. DOMVS achieves an overall score of 0.339 on the DTU dataset, outperforming the state-of-the-art method RC-MVSNet. On the Tanks & Temples dataset, DOMVS outperforms ADR-MVSNet and JDACS-MS by 4.31% and 20.34%, respectively.
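The abstract's central contrast is between the standard pixel-level photometric consistency loss and the proposed feature-level perceptual consistency. As a rough illustration only (the paper's actual loss, weighting, and feature extractor are not given here; all function names, the mask convention, and the lambda weight are placeholder assumptions, not the authors' implementation):

```python
import numpy as np

def photometric_loss(ref_img, warped_img, mask):
    # Pixel-level L1 consistency between the reference image and a source
    # image warped via the estimated depth. This is the usual unsupervised
    # MVS supervision signal; it breaks down in occluded or texture-less
    # regions, where matching pixels do not have matching intensities.
    diff = np.abs(ref_img - warped_img) * mask[..., None]
    return diff.sum() / (mask.sum() * ref_img.shape[-1] + 1e-8)

def perceptual_loss(ref_feat, warped_feat, mask):
    # Feature-level L1 consistency: compare high-level semantic feature
    # maps (e.g. from a pretrained CNN) instead of raw pixels, which is
    # more tolerant of local appearance changes caused by occlusion.
    diff = np.abs(ref_feat - warped_feat) * mask[..., None]
    return diff.sum() / (mask.sum() * ref_feat.shape[-1] + 1e-8)

def total_loss(ref_img, warped_img, ref_feat, warped_feat, mask, lam=0.1):
    # Hypothetical combination: photometric term plus a weighted
    # feature-level perceptual term (lam is an illustrative weight).
    return (photometric_loss(ref_img, warped_img, mask)
            + lam * perceptual_loss(ref_feat, warped_feat, mask))
```

Here `mask` marks pixels assumed valid for matching; in an unsupervised pipeline it would typically come from a visibility or occlusion estimate, which is exactly where the structured occlusion generation module intervenes.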