DirectVoxGO

DirectVoxGO (Direct Voxel Grid Optimization, see our paper) reconstructs a scene representation from a set of calibrated images capturing the scene.

  • NeRF-comparable quality for synthesizing novel views from our scene representation.
  • Super-fast convergence: Our 15 mins/scene vs. NeRF's 10~20+ hrs/scene.
  • No cross-scene pre-training required: We optimize each scene from scratch.
  • Better rendering speed: Our <1 sec vs. NeRF's 29 secs to synthesize an 800x800 image.

The run-times (mm:ss) of our optimization progress shown below were measured on a machine with a single RTX 2080 Ti GPU.

github_teaser.mp4

Update

  • 2021.11.23: Support CO3D dataset.
  • 2021.11.23: Initial release. Issue page is disabled for now. Feel free to contact [email protected] if you have any questions.

Installation

git clone git@github.com:sunset1995/DirectVoxGO.git
cd DirectVoxGO
pip install -r requirements.txt

PyTorch installation is machine dependent; please install the correct version for your machine. The tested version is PyTorch 1.8.1 with Python 3.7.4.

Dependencies
  • PyTorch, numpy: main computation.
  • scipy, lpips: SSIM and LPIPS evaluation.
  • tqdm: progress bar.
  • mmcv: config system.
  • opencv-python: image processing.
  • imageio, imageio-ffmpeg: images and videos I/O.
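
A quick way to confirm that the dependencies listed above are importable is sketched below. It assumes the usual import names for these packages (for example `cv2` for opencv-python and `imageio_ffmpeg` for imageio-ffmpeg); the helper `find_missing` is a hypothetical name and not part of this repository.

```python
import importlib.util

# Import names for the packages listed above. Some pip package names differ
# from their import names (opencv-python -> cv2, imageio-ffmpeg -> imageio_ffmpeg).
REQUIRED_MODULES = [
    "torch", "numpy", "scipy", "lpips", "tqdm",
    "mmcv", "cv2", "imageio", "imageio_ffmpeg",
]


def find_missing(modules):
    """Return the subset of `modules` that the import system cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

This only checks that each module can be found; it does not verify versions or that PyTorch was built for your GPU.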

Download: datasets, trained models, and rendered test views

Directory structure for the datasets (only the files used are listed)
data
├── nerf_synthetic     # Link: https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1
│   └── [chair|drums|ficus|hotdog|lego|materials|mic|ship]
│       ├── [train|val|test]
│       │   └── r_*.png
│       └── transforms_[train|val|test].json
│
├── Synthetic_NSVF     # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/Synthetic_NSVF.zip
│   └── [Bike|Lifestyle|Palace|Robot|Spaceship|Steamtrain|Toad|Wineholder]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0_train|1_val|2_test]_*.png
│       └── pose
│           └── [0_train|1_val|2_test]_*.txt
│
├── BlendedMVS         # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/BlendedMVS.zip
│   └── [Character|Fountain|Jade|Statues]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0|1|2]_*.png
│       └── pose
│           └── [0|1|2]_*.txt
│
├── TanksAndTemple     # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/TanksAndTemple.zip
│   └── [Barn|Caterpillar|Family|Ignatius|Truck]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0|1|2]_*.png
│       └── pose
│           └── [0|1|2]_*.txt
│
├── deepvoxels     # Link: https://drive.google.com/drive/folders/1ScsRlnzy9Bd_n-xw83SP-0t548v63mPH
│   └── [train|validation|test]
│       └── [armchair|cube|greek|vase]
│           ├── intrinsics.txt
│           ├── rgb/*.png
│           └── pose/*.txt
│
└── co3d               # Link: https://github.com/facebookresearch/co3d
    └── [donut|teddybear|umbrella|...]
        ├── frame_annotations.jgz
        ├── set_lists.json
        └── [129_14950_29917|189_20376_35616|...]
            ├── images
            │   └── frame*.jpg
            └── masks
                └── frame*.png
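
Before training, it can help to check that a downloaded scene matches the layout above. The following is a minimal sketch for a single nerf_synthetic scene, based only on the file names shown in the tree; the function name `check_nerf_synthetic_scene` is hypothetical and the check is not part of this repository.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")


def check_nerf_synthetic_scene(scene_dir):
    """Return a list of problems found in one nerf_synthetic scene directory."""
    scene = Path(scene_dir)
    problems = []
    for split in SPLITS:
        # Each split needs its camera file next to the image folder.
        if not (scene / f"transforms_{split}.json").is_file():
            problems.append(f"missing transforms_{split}.json")
        # Each split folder needs at least one r_*.png image.
        if not any((scene / split).glob("r_*.png")):
            problems.append(f"no r_*.png images in {split}/")
    return problems
```

For example, `check_nerf_synthetic_scene("data/nerf_synthetic/lego")` returns an empty list when every split has its transforms file and at least one image.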

Synthetic-NeRF, Synthetic-NSVF, BlendedMVS, Tanks&Temples, DeepVoxels datasets

We use the datasets organized by NeRF, NSVF, and DeepVoxels. The download links are listed in the directory structure above.

Download all our trained models and rendered test views at this link to our logs.

CO3D dataset

We also support the recent Common Objects In 3D dataset. Our method performs only per-scene reconstruction, with no cross-scene generalization.

GO

Train

To train on the lego scene and evaluate the test-set PSNR at the end of training, run:

$ python run.py --config configs/nerf/lego.py --render_test

Use --i_print and --i_weights to change the log interval.

Evaluation

To evaluate only the test-set PSNR, SSIM, and LPIPS of the trained lego model without re-training, run:

$ python run.py --config configs/nerf/lego.py --render_only --render_test \
                                              --eval_ssim --eval_lpips_vgg

Use --eval_lpips_alex to evaluate LPIPS with a pre-trained AlexNet instead of VGG.
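
For reference, PSNR for images with pixel values in [0, 1] is commonly computed as -10 * log10(MSE). The sketch below illustrates that formula in plain Python. It is an illustration of the metric only, not the evaluation code in run.py, and the function name `psnr` is hypothetical.

```python
import math


def psnr(pred, target):
    """PSNR in dB between two equal-length sequences of pixel values in [0, 1]."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical inputs
    return -10.0 * math.log10(mse)


# An error of 0.1 on every pixel gives an MSE of about 0.01, i.e. about 20 dB.
print(round(psnr([0.5, 0.5], [0.6, 0.4]), 2))
```

Because the scale is logarithmic, reducing the MSE by a factor of 10 raises the PSNR by 10 dB.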

Reproduction

All config files to reproduce our results:

$ ls configs/*
configs/blendedmvs:
Character.py  Fountain.py  Jade.py  Statues.py

configs/nerf:
chair.py  drums.py  ficus.py  hotdog.py  lego.py  materials.py  mic.py  ship.py

configs/nsvf:
Bike.py  Lifestyle.py  Palace.py  Robot.py  Spaceship.py  Steamtrain.py  Toad.py  Wineholder.py

configs/tankstemple:
Barn.py  Caterpillar.py  Family.py  Ignatius.py  Truck.py

configs/deepvoxels:
armchair.py  cube.py  greek.py  vase.py

Your own config files

Check the comments in configs/default.py for the configurable settings. The default values reproduce the main setup reported in our paper. We use mmcv's config system. To create a new config, inherit from configs/default.py first and then update the fields you want to change. Below is an example from configs/blendedmvs/Character.py:

_base_ = '../default.py'

expname = 'dvgo_Character'
basedir = './logs/blended_mvs'

data = dict(
    datadir='./data/BlendedMVS/Character/',
    dataset_type='blendedmvs',
    inverse_y=True,
    white_bkgd=True,
)
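
With mmcv's `_base_` mechanism, fields set in the child config override the base, and nested dicts such as `data` are merged key by key rather than replaced wholesale. The plain-Python sketch below mimics that merge behavior for illustration only. It is not mmcv's implementation, and the values in `base` (including the key `untouched_field`) are made-up placeholders rather than the real contents of configs/default.py.

```python
import copy


def merge_config(base, child):
    """Recursively merge `child` into a copy of `base`; child values win."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder base values (hypothetical, not copied from configs/default.py).
base = {
    "expname": None,
    "basedir": "./logs/",
    "data": {"datadir": None, "dataset_type": None,
             "inverse_y": False, "white_bkgd": False,
             "untouched_field": "kept"},
}

# Fields from the configs/blendedmvs/Character.py example above.
child = {
    "expname": "dvgo_Character",
    "basedir": "./logs/blended_mvs",
    "data": {"datadir": "./data/BlendedMVS/Character/",
             "dataset_type": "blendedmvs",
             "inverse_y": True, "white_bkgd": True},
}

cfg = merge_config(base, child)
print(cfg["data"]["dataset_type"], cfg["data"]["untouched_field"])
```

The practical consequence is that a new config only needs to list the fields that differ from the defaults.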

Development and tuning guide

Extension to a new dataset

Before implementing a new data loader, we recommend adjusting the data-related config fields to fit your camera coordinate system. We provide two visualization tools for debugging.

  1. Inspect the camera and the allocated BBox.
    • Export via --export_bbox_and_cams_only {filename}.npz:
      python run.py --config configs/nerf/mic.py --export_bbox_and_cams_only cam_mic.npz
    • Visualize the result:
      python tools/vis_train.py cam_mic.npz
  2. Inspect the learned geometry after coarse optimization.
    • Export via --export_coarse_only {filename}.npz (assumes coarse_last.tar is available in the training log):
      python run.py --config configs/nerf/mic.py --export_coarse_only coarse_mic.npz
    • Visualize the result:
      python tools/vis_volume.py coarse_mic.npz 0.001 --cam cam_mic.npz
Figures: Inspecting the cameras & BBox; Inspecting the learned coarse volume.
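
The exported .npz files are NumPy archives, which are zip files containing one .npy entry per array. If a visualization looks wrong, listing the stored array names is a cheap first check. The sketch below uses only the standard library; the helper name `list_npz_arrays` is hypothetical, and which array names appear depends on the export code.

```python
import zipfile


def list_npz_arrays(path):
    """Return the array names stored in a .npz archive without loading them."""
    with zipfile.ZipFile(path) as archive:
        return [name[:-len(".npy")] for name in archive.namelist()
                if name.endswith(".npy")]
```

For example, `list_npz_arrays("cam_mic.npz")` lists the arrays written by the --export_bbox_and_cams_only step above. To read the actual values, `numpy.load` can open the same file.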

Speed and quality tradeoff

We report some ablation experiments in the supplementary material of our paper. Setting N_iters, N_rand, num_voxels, rgbnet_depth, or rgbnet_width to larger values, or setting stepsize to a smaller value, typically leads to better quality but needs more computation. Only stepsize is tunable in the testing phase; all the other fields should remain the same as in training.
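
To see why these settings affect cost, consider a rough back-of-the-envelope model. It assumes (this is an assumption, not something stated in this README) that voxels are cubes whose side is the cube root of the bounding-box volume divided by num_voxels, and that stepsize is measured in units of that voxel side. Under those assumptions the number of samples along a ray grows with num_voxels^(1/3) / stepsize. The function name and all numbers below are hypothetical.

```python
def samples_per_ray(bbox_size, num_voxels, stepsize, ray_length):
    """Approximate number of sample points along a ray segment of `ray_length`."""
    volume = bbox_size[0] * bbox_size[1] * bbox_size[2]
    voxel_side = (volume / num_voxels) ** (1.0 / 3.0)
    step_distance = stepsize * voxel_side  # distance between consecutive samples
    return round(ray_length / step_distance)


bbox = (2.0, 2.0, 2.0)  # hypothetical bounding-box extents
for stepsize in (1.0, 0.5, 0.25):
    n = samples_per_ray(bbox, num_voxels=160 ** 3, stepsize=stepsize, ray_length=2.0)
    print(f"stepsize={stepsize}: about {n} samples per ray")
```

In this model, halving stepsize doubles the samples per ray, while multiplying num_voxels by 8 has the same effect and also increases the grid memory.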

Acknowledgement

The code base originated from an awesome nerf-pytorch implementation, but it has since become very different from that code base.

Owner
sunset
A Ph.D. candidate working on computer vision tasks. Recently focusing on 3D modeling.