PyTorch implementation of paper "IBRNet: Learning Multi-View Image-Based Rendering", CVPR 2021.

Related tags

Deep LearningIBRNet
Overview

IBRNet: Learning Multi-View Image-Based Rendering

PyTorch implementation of paper "IBRNet: Learning Multi-View Image-Based Rendering", CVPR 2021.

IBRNet: Learning Multi-View Image-Based Rendering
Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser
CVPR 2021

project page | paper | data & model

Demo

Installation

Clone this repo with submodules:

git clone --recurse-submodules https://github.com/googleinterns/IBRNet
cd IBRNet/

The code is tested with Python3.7, PyTorch == 1.5 and CUDA == 10.2. We recommend you to use anaconda to make sure that all dependencies are in place. To create an anaconda environment:

conda env create -f environment.yml
conda activate ibrnet

Datasets

1. Training datasets

├──data/
    ├──ibrnet_collected_1/
    ├──ibrnet_collected_2/
    ├──real_iconic_noface/
    ├──spaces_dataset/
    ├──RealEstate10K-subset/
    ├──google_scanned_objects/

Please first cd data/, and then download datasets into data/ following the instructions below. The organization of the datasets should be the same as above.

(a) Our captures

We captured 67 forward-facing scenes (each scene contains 20-60 images). To download our data ibrnet_collected.zip (4.1G) for training, run:

gdown https://drive.google.com/uc?id=1rkzl3ecL3H0Xxf5WTyc2Swv30RIyr1R_
unzip ibrnet_collected.zip

P.S. We've captured some more scenes in ibrnet_collected_more.zip, but we didn't include them for training. Feel free to download them if you would like more scenes for your task, but you wouldn't need them to reproduce our results.

(b) LLFF released scenes

Download and process real_iconic_noface.zip (6.6G) using the following commands:

# download 
gdown https://drive.google.com/uc?id=1ThgjloNt58ZdnEuiCeRf9tATJ-HI0b01
unzip real_iconic_noface.zip

# [IMPORTANT] remove scenes that appear in the test set
cd real_iconic_noface/
rm -rf data2_fernvlsb data2_hugetrike data2_trexsanta data3_orchid data5_leafscene data5_lotr data5_redflower
cd ../

(c) Spaces Dataset

Download spaces dataset by:

git clone https://github.com/augmentedperception/spaces_dataset

(d) RealEstate10K

The full RealEstate10K dataset is very large and can be difficult to download. Hence, we provide a subset of RealEstate10K training scenes containing only 200 scenes. In our experiment, we found using more scenes from RealEstate10K only provides marginal improvement. To download our camera files (2MB):

gdown https://drive.google.com/uc?id=1IgJIeCPPZ8UZ529rN8dw9ihNi1E9K0hL
unzip RealEstate10K_train_cameras_200.zip -d RealEstate10K-subset

Besides the camera files, you also need to download the corresponding video frames from YouTube. You can download the frames (29G) by running the following commands. The script uses ffmpeg to extract frames, so please make sure you have ffmpeg installed.

git clone https://github.com/qianqianwang68/RealEstate10K_Downloader
cd RealEstate10K_Downloader
python generate_dataset.py train
cd ../

(e) Google Scanned Objects

Google Scanned Objects contain 1032 diffuse objects with various shapes and appearances. We use gaps to render these objects for training. Each object is rendered at 512 × 512 pixels from viewpoints on a quarter of the sphere. We render 250 views for each object. To download our renderings (7.5GB), run:

gdown https://drive.google.com/uc?id=1w1Cs0yztH6kE3JIz7mdggvPGCwIKkVi2
unzip google_scanned_objects_renderings.zip

2. Evaluation datasets

├──data/
    ├──deepvoxels/
    ├──nerf_synthetic/
    ├──nerf_llff_data/

The evaluation datasets include DeepVoxel synthetic dataset, NeRF realistic 360 dataset and the real forward-facing dataset. To download all three datasets (6.7G), run the following command under data/ directory:

bash download_eval_data.sh

Evaluation

First download our pretrained model under the project root directory:

gdown https://drive.google.com/uc?id=165Et85R8YnL-5NcehG0fzqsnAUN8uxUJ
unzip pretrained_model.zip

You can use eval/eval.py to evaluate the pretrained model. For example, to obtain the PSNR, SSIM and LPIPS on the fern scene in the real forward-facing dataset, you can first specify your paths in configs/eval_llff.txt and then run:

cd eval/
python eval.py --config ../configs/eval_llff.txt

Rendering videos of smooth camera paths

You can use render_llff_video.py to render videos of smooth camera paths for the real forward-facing scenes. For example, you can first specify your paths in configs/eval_llff.txt and then run:

cd eval/
python render_llff_video.py --config ../configs/eval_llff.txt

You can also capture your own data of forward-facing scenes and synthesize novel views using our method. Please follow the instructions from LLFF on how to capture and process the images.

Training

We strongly recommend you to train the model with multiple GPUs:

# this example uses 8 GPUs (nproc_per_node=8) 
python -m torch.distributed.launch --nproc_per_node=8 train.py --config configs/pretrain.txt

Alternatively, you can train with a single GPU by setting distributed=False in configs/pretrain.txt and running:

python train.py --config configs/pretrain.txt

Finetuning

To finetune on a specific scene, for example, fern, using the pretrained model, run:

# this example uses 2 GPUs (nproc_per_node=2) 
python -m torch.distributed.launch --nproc_per_node=2 train.py --config configs/finetune_llff.txt

Additional information

  • Our current implementation is not well-optimized in terms of the time efficiency at inference. Rendering a 1000x800 image can take from 30s to over a minute depending on specific GPU models. Please make sure to maximize the GPU memory utilization by increasing the size of the chunk to reduce inference time. You can also try to decrease the number of input source views (but subject to performance loss).
  • If you want to create and train on your own datasets, you can implement your own Dataset class following our examples in ibrnet/data_loaders/. You can verify the camera poses using data_verifier.py in ibrnet/data_loaders/.
  • Since the evaluation datasets are either object-centric or forward-facing scenes, our provided view selection methods are very simple (based on either viewpoints or camera locations). If you want to evaluate our method on new scenes with other kinds of camera distributions, you might need to implement your own view selection methods to identify the most effective source views.
  • If you have any questions, you can contact [email protected].

Citation

@inproceedings{wang2021ibrnet,
  author    = {Wang, Qianqian and Wang, Zhicheng and Genova, Kyle and Srinivasan, Pratul and Zhou, Howard  and Barron, Jonathan T. and Martin-Brualla, Ricardo and Snavely, Noah and Funkhouser, Thomas},
  title     = {IBRNet: Learning Multi-View Image-Based Rendering},
  booktitle = {CVPR},
  year      = {2021}
}

Owner
Google Interns
Google Interns
A universal memory dumper using Frida

Fridump Fridump (v0.1) is an open source memory dumping tool, primarily aimed to penetration testers and developers. Fridump is using the Frida framew

551 Jan 07, 2023
The implementation of the CVPR2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes"

STAR-FC This code is the implementation for the CVPR 2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes" 🌟 🌟 . 🎓 Re

Shuai Shen 87 Dec 28, 2022
Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Diverse Image Captioning with Context-Object Split Latent Spaces This repository is the PyTorch implementation of the paper: Diverse Image Captioning

Visual Inference Lab @TU Darmstadt 34 Nov 21, 2022
Self-Supervised Learning

Self-Supervised Learning Features self_supervised offers features like modular framework support for multi-gpu training using PyTorch Lightning easy t

Robin 1 Dec 14, 2021
Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it

Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.

mani 1.2k Jan 07, 2023
Stream images from a connected camera over MQTT, view using Streamlit, record to file and sqlite

mqtt-camera-streamer Summary: Publish frames from a connected camera or MJPEG/RTSP stream to an MQTT topic, and view the feed in a browser on another

Robin Cole 183 Dec 16, 2022
DeepVoxels is an object-specific, persistent 3D feature embedding.

DeepVoxels is an object-specific, persistent 3D feature embedding. It is found by globally optimizing over all available 2D observations of

Vincent Sitzmann 196 Dec 25, 2022
Code repo for "Transformer on a Diet" paper

Transformer on a Diet Reference: C Wang, Z Ye, A Zhang, Z Zhang, A Smola. "Transformer on a Diet". arXiv preprint arXiv (2020). Installation pip insta

cgraywang 31 Sep 26, 2021
TensorFlow implementation of the algorithm in the paper "Decoupled Low-light Image Enhancement"

Decoupled Low-light Image Enhancement Shijie Hao1,2*, Xu Han1,2, Yanrong Guo1,2 & Meng Wang1,2 1Key Laboratory of Knowledge Engineering with Big Data

17 Apr 25, 2022
Human Pose estimation with TensorFlow framework

Human Pose Estimation with TensorFlow Here you can find the implementation of the Human Body Pose Estimation algorithm, presented in the DeeperCut and

Eldar Insafutdinov 1.1k Dec 29, 2022
The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

Website | ArXiv | Get Start | Video PIRenderer The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic

Ren Yurui 261 Jan 09, 2023
Python Multi-Agent Reinforcement Learning framework

- Please pay attention to the version of SC2 you are using for your experiments. - Performance is *not* always comparable between versions. - The re

whirl 1.3k Jan 05, 2023
TyXe: Pyro-based BNNs for Pytorch users

TyXe: Pyro-based BNNs for Pytorch users TyXe aims to simplify the process of turning Pytorch neural networks into Bayesian neural networks by leveragi

87 Jan 03, 2023
Development of IP code based on VIPs and AADM

Sparse Implicit Processes In this repository we include the two different versions of the SIP code developed for the article Sparse Implicit Processes

1 Aug 22, 2022
Introduction to CPM

CPM CPM is an open-source program on large-scale pre-trained models, which is conducted by Beijing Academy of Artificial Intelligence and Tsinghua Uni

Tsinghua AI 136 Dec 23, 2022
Deep Learning for Time Series Classification

Deep Learning for Time Series Classification This is the companion repository for our paper titled "Deep learning for time series classification: a re

Hassan ISMAIL FAWAZ 1.2k Jan 02, 2023
Pytorch implementation of MalConv

MalConv-Pytorch A Pytorch implementation of MalConv Desciprtion This is the implementation of MalConv proposed in Malware Detection by Eating a Whole

Alexander H. Liu 58 Oct 26, 2022
Official codebase used to develop Vision Transformer, MLP-Mixer, LiT and more.

Big Vision This codebase is designed for training large-scale vision models on Cloud TPU VMs. It is based on Jax/Flax libraries, and uses tf.data and

Google Research 701 Jan 03, 2023
4th place solution for the SIGIR 2021 challenge.

SIGIR-2021 (Tinkoff.AI) How to start Download train and test data: https://sigir-ecom.github.io/data-task.html Place it under sigir-2021/data/. Run py

Tinkoff.AI 4 Jul 01, 2022
Fine-Tune EleutherAI GPT-Neo to Generate Netflix Movie Descriptions in Only 47 Lines of Code Using Hugginface And DeepSpeed

GPT-Neo-2.7B Fine-Tuning Example Using HuggingFace & DeepSpeed Installation cd venv/bin ./pip install -r ../../requirements.txt ./pip install deepspe

Nikita 180 Jan 05, 2023