MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions

Last update: Jan 04, 2023

Project Page | Paper

If you find our work useful for your research, please consider citing our paper:

@article{DBLP:journals/corr/abs-2104-13325,
  author    = {Zhenpei Yang and
               Zhile Ren and
               Qi Shan and
               Qixing Huang},
  title     = {{MVS2D:} Efficient Multi-view Stereo via Attention-Driven 2D Convolutions},
  journal   = {CoRR},
  volume    = {abs/2104.13325},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.13325},
  eprinttype = {arXiv},
  eprint    = {2104.13325},
  timestamp = {Tue, 04 May 2021 15:12:43 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-13325.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

✏️ Changelog

Nov 27 2021

Initial release. Note that our released code achieves better results than those reported in the initial arXiv pre-print. In addition, we include an evaluation on the DTU dataset. We will update our paper soon.

⚙️ Installation

The code is tested with CUDA 10.1. Please use the following commands to install the dependencies:

conda create --name mvs2d python=3.7
conda activate mvs2d

pip install -r requirements.txt

The folder structure should look like the following once you have downloaded all data and pretrained models. Download links are inside each dataset tab at the end of this README.

.
├── configs
├── datasets
├── demo
├── networks
├── scripts
├── pretrained_model
│   ├── demon
│   ├── dtu
│   └── scannet
├── data
│   ├── DeMoN
│   ├── DTU_hr
│   ├── SampleSet
│   ├── ScanNet
│   └── ScanNet_3_frame_jitter_pose.npy
├── splits
│   ├── DeMoN_samples_test_2_frame.npy
│   ├── DeMoN_samples_train_2_frame.npy
│   ├── ScanNet_3_frame_test.npy
│   ├── ScanNet_3_frame_train.npy
│   └── ScanNet_3_frame_val.npy
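
Before training or testing, it can help to confirm that the downloads landed where the tree above expects them. The following is a minimal sketch, not part of the released code: the helper name `find_missing` is hypothetical, and the path list covers only the ScanNet entries from the tree, so extend it for DeMoN or DTU as needed.

```python
from pathlib import Path

# Paths copied from the folder tree above (ScanNet subset only).
EXPECTED_PATHS = [
    "configs",
    "datasets",
    "networks",
    "scripts",
    "pretrained_model/scannet",
    "data/ScanNet",
    "data/ScanNet_3_frame_jitter_pose.npy",
    "splits/ScanNet_3_frame_test.npy",
    "splits/ScanNet_3_frame_train.npy",
    "splits/ScanNet_3_frame_val.npy",
]


def find_missing(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing paths:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected paths are present.")
```

Run it from the repository root; an empty result means every listed file and folder was found.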

🎬 Demo

After downloading the pretrained models for ScanNet, run the following command to make a prediction on the sample data.

python demo.py --cfg configs/scannet/release.conf

The results are saved as demo.png.

⏳ Training & Testing

We use four NVIDIA V100 GPUs for training. You may need to modify 'CUDA_VISIBLE_DEVICES' and the batch size to accommodate your GPU resources.
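
As a rough illustration of restricting the visible GPUs, the sketch below launches a training script with `CUDA_VISIBLE_DEVICES` set from Python. The helpers `build_env` and `launch` are hypothetical names, not part of this repository. It also assumes the shell script does not set `CUDA_VISIBLE_DEVICES` itself; if it does, the value inside the script takes precedence and has to be edited there, together with the batch size.

```python
import os
import subprocess


def build_env(gpu_ids):
    """Copy the current environment and restrict the visible GPUs."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return env


def launch(script, gpu_ids):
    """Run a shell script with only the given GPUs visible to CUDA."""
    return subprocess.run(["bash", script], env=build_env(gpu_ids), check=True)


# Example (uncomment to start training on GPUs 0 and 1):
# launch("scripts/scannet/train.sh", gpu_ids=[0, 1])
```

With fewer GPUs than the four used here, the batch size usually has to shrink as well to fit in memory.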

ScanNet

Download

data 🔗 split 🔗 pretrained models 🔗 noisy pose 🔗

Training

First download and extract the ScanNet training data and split. Then run the following command to train our model.

bash scripts/scannet/train.sh

To train the multi-scale attention model, add --robust 1 to the training command in scripts/scannet/train.sh.

To train our model with noisy input pose, add --perturb_pose 1 to the training command in scripts/scannet/train.sh.

Testing

First download and extract data, split and pretrained models.

Then run:

bash scripts/scannet/test.sh

You should see results similar to the following:

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.059	0.016	0.026	0.157	0.084	0.964	0.995	0.999	0.108	0.079	0.856	0.974	0.996
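
For readers unfamiliar with the column names, the sketch below computes several of them using the definitions that are standard in the depth-estimation literature. It is an assumption that the test script follows these same definitions; `depth_metrics` is a hypothetical helper, and `abs_diff`, `abs_diff_median`, and the `thre*` columns are left out because this README does not define them.

```python
import numpy as np


def depth_metrics(pred, gt):
    """Common depth-error metrics over pixels with positive ground truth.

    Assumes predictions are strictly positive wherever ground truth is valid.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = gt > 0  # ignore pixels without a valid ground-truth depth
    pred, gt = pred[mask], gt[mask]
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "abs_rel": np.mean(np.abs(pred - gt) / gt),
        "sq_rel": np.mean((pred - gt) ** 2 / gt),
        "log10": np.mean(np.abs(np.log10(pred) - np.log10(gt))),
        "rmse": np.sqrt(np.mean((pred - gt) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)),
        "a1": np.mean(ratio < 1.25),       # fraction of pixels within 25%
        "a2": np.mean(ratio < 1.25 ** 2),
        "a3": np.mean(ratio < 1.25 ** 3),
    }
```

Lower is better for the error columns (`abs_rel` through `rmse_log`), while `a1`, `a2`, and `a3` are accuracies where higher is better.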

SUN3D/RGBD/Scenes11

Download

data 🔗 split 🔗 pretrained models 🔗

Training

First download and extract the DeMoN training data and split. Then run the following command to train our model.

bash scripts/demon/train.sh

Testing

First download and extract data, split and pretrained models.

Then run:

bash scripts/demon/test.sh

You should see results similar to the following:

dataset rgbd: 160

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.082	0.165	0.047	0.440	0.147	0.921	0.939	0.948	0.325	0.284	0.753	0.894	0.933

dataset scenes11: 256

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.046	0.080	0.018	0.439	0.107	0.976	0.989	0.993	0.155	0.058	0.822	0.945	0.979

dataset sun3d: 160

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.099	0.055	0.044	0.304	0.137	0.893	0.970	0.993	0.224	0.171	0.649	0.890	0.969

-> Done!

depth

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.071	0.096	0.033	0.402	0.127	0.938	0.970	0.981	0.222	0.152	0.755	0.915	0.963
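
The final `depth` row appears consistent with a sample-weighted average of the three per-dataset rows. The sketch below checks this for `abs_rel`, under the assumption that the number printed after each dataset name (160, 256, 160) is its test-sample count; `weighted_mean` is a hypothetical helper, not a function from the repository.

```python
# (sample count, abs_rel) pairs copied from the per-dataset tables above.
# Treating the printed number as the sample count is an assumption.
per_dataset = {
    "rgbd": (160, 0.082),
    "scenes11": (256, 0.046),
    "sun3d": (160, 0.099),
}


def weighted_mean(rows):
    """Average a metric over datasets, weighting each by its sample count."""
    rows = list(rows)
    total = sum(count for count, _ in rows)
    return sum(count * value for count, value in rows) / total


overall_abs_rel = weighted_mean(per_dataset.values())
print(round(overall_abs_rel, 3))  # prints 0.071
```

Because the inputs are already rounded to three decimals, other columns may differ from the reported row in the last digit.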

DTU

Download

data 🔗 eval data 🔗 pretrained models 🔗

Training

First download and extract the DTU training data. Then run the following command to train our model.

bash scripts/dtu/train.sh

Testing

First download and extract DTU eval data and pretrained models.

The following command performs three steps in sequence: (1) generate depth predictions on the DTU test set, (2) fuse the depth predictions into a final point cloud, and (3) evaluate the predicted point cloud. Note that we re-implemented the original MATLAB evaluation for the DTU dataset in Python.

bash scripts/dtu/test.sh

You should see results similar to the following:

Acc 0.4051747996189477
Comp 0.2776021161518006
F-score 0.34138845788537414
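
When reading these numbers, note that the value labelled F-score in the sample output coincides with the arithmetic mean of Acc and Comp, the quantity DTU evaluations commonly report as the overall score (lower is better). This is an observation drawn from the three numbers above, not a description of the evaluation script. A quick check:

```python
# Values copied from the sample output above.
acc = 0.4051747996189477
comp = 0.2776021161518006
reported = 0.34138845788537414

# Arithmetic mean of accuracy and completeness.
overall = (acc + comp) / 2

print(abs(overall - reported) < 1e-12)  # prints True
```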

Acknowledgement

The fusion code for the DTU dataset builds heavily on PatchMatchNet.
