A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

Last update: Nov 17, 2022

Related tags

Deep Learning idn-solver

Overview

idn-solver

Paper | Project Page

This repository contains the code release of our ICCV 2021 paper:

A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

Wang Zhao*, Shaohui Liu*, Yi Wei, Hengkai Guo, Yong-Jin Liu

Installation

We recommend to use conda to setup a specified environment. Run

conda env create -f environment.yml

Test on a sequence

First download the pretrained model from here and put it under ./pretrain/ folder.

Prepare the sequence data with color images, camera poses (4x4 cam2world transformation) and intrinsics. The sequence data structure should be like:

sequence_name
  | color
      | 00000.jpg
  | pose
      | 00000.txt
  | K.txt

Run the following command to get the outputs:

python infer_folder.py --seq_dir /path/to/the/sequence/data --output_dir /path/to/save/outputs --config ./configs/test_folder.yaml

Tune the "reference gap" parameter to make sure there are sufficient overlaps and camera translations within an image pair. For ScanNet-like sequence, we recommend to use reference_gap of 20.

Test on ScanNet

Prepare ScanNet test split data

Download the ScanNet test split data from the official site and pre-process the data using:

python ./data/preprocess.py --data_dir /path/to/scannet/test/split/ --output_dir /path/to/save/pre-processed/scannet/test/data

This includes 1. resize the color images to 480x640 resolution 2. sample the data with interval of 20

Run evaluation

python eval_scannet.py --data_dir /path/to/processed/scannet/test/split/ --config ./configs/test_scannet.yaml

Train

Prepare ScanNet training data

We use the pre-processed ScanNet data from NAS, you could download the data using this link. The data structure is like:

scannet
  | scannet_nas
    | train
      | scene0000_00
          | color
            | 0000.jpg
          | pose
            | 0000.txt
          | depth
            | 0000.npy
          | intrinsic
          | normal
            | 0000_normal.npy
    | val
  | scans_test_sample (preprocessed ScanNet test split)

Run training

Modify the "dataset_path" variable with yours in the config yaml.

The network is trained with a two-stage strategy. The whole training process takes ~6 days with 4 Nvidia V100 GPUs.

python train.py ./configs/scannet_stage1.yaml
python train.py ./configs/scannet_stage2.yaml

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Zhao_2021_ICCV,
    author    = {Zhao, Wang and Liu, Shaohui and Wei, Yi and Guo, Hengkai and Liu, Yong-Jin},
    title     = {A Confidence-Based Iterative Solver of Depths and Surface Normals for Deep Multi-View Stereo},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {6168-6177}
}

Acknowledgement

This project heavily relies codes from NAS and we thank the authors for releasing their code.

We also thank Xiaoxiao Long for kindly helping with ScanNet evaluations.

A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

Related tags

Overview

idn-solver

Installation

Test on a sequence

Test on ScanNet

Prepare ScanNet test split data

Run evaluation

Train

Prepare ScanNet training data

Run training

Citation

Acknowledgement

Owner

zhaowang

Contrastive Loss Gradient Attack (CLGA)

This is the official source code of "BiCAT: Bi-Chronological Augmentation of Transformer for Sequential Recommendation".

Tensorflow 2 implementations of the C-SimCLR and C-BYOL self-supervised visual representation methods from "Compressive Visual Representations" (NeurIPS 2021)

Fewshot-face-translation-GAN - Generative adversarial networks integrating modules from FUNIT and SPADE for face-swapping.

VOGUE: Try-On by StyleGAN Interpolation Optimization

Fastquant - Backtest and optimize your trading strategies with only 3 lines of code!

Rethinking Portrait Matting with Privacy Preserving

Pytorch implementation of XRD spectral identification from COD database

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

Good Semi-Supervised Learning That Requires a Bad GAN

End-to-End Referring Video Object Segmentation with Multimodal Transformers

The Official PyTorch Implementation of "LSGM: Score-based Generative Modeling in Latent Space" (NeurIPS 2021)

Annealed Flow Transport Monte Carlo

A PyTorch Implementation of "Neural Arithmetic Logic Units"

Parallel Latent Tree-Induction for Faster Sequence Encoding

[ICCV 2021] Target Adaptive Context Aggregation for Video Scene Graph Generation

This repository contains Prior-RObust Bayesian Optimization (PROBO) as introduced in our paper "Accounting for Gaussian Process Imprecision in Bayesian Optimization"

Implementation for the paper: Invertible Denoising Network: A Light Solution for Real Noise Removal (CVPR2021).

Unsupervised Image Generation with Infinite Generative Adversarial Networks

This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT).