Code release for the paper “Worldsheet Wrapping the World in a 3D Sheet for View Synthesis from a Single Image”, ICCV 2021.

Overview

Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image

This repository contains the code for the following paper:

  • R. Hu, N. Ravi, A. Berg, D. Pathak, Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image. in ICCV, 2021. (PDF)
@inproceedings{hu2021worldsheet,
  title={Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image},
  author={Hu, Ronghang and Ravi, Nikhila and Berg, Alex and Pathak, Deepak},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year={2021}
}

Project Page: https://worldsheet.github.io/

Installation

Our Worldsheet implementation is based on MMF and PyTorch3D. This repository is adapted from the MMF repository (https://github.com/facebookresearch/mmf).

This code is designed to be run on GPU, CPU training/inference is not supported.

  1. Create a new conda environment:
conda create -n worldsheet python=3.8
conda activate worldsheet
  1. Download this repository or clone with Git, and then enter the root directory of the repository git clone https://github.com/facebookresearch/worldsheet.git && cd worldsheet

  2. Install the MMF dependencies: pip install -r requirements.txt

  3. Install PyTorch3D as follows (we used v0.2.5):

# Install using conda
conda install -c pytorch3d pytorch3d=0.2.5

# Or install from GitHub directly
git clone https://github.com/facebookresearch/pytorch3d.git && cd pytorch3d
git checkout v0.2.5
rm -rf build/ **/*.so
FORCE_CUDA=1 pip install -e .
cd ..

# or pip install from github 
pip install "git+https://github.com/facebookresearch/[email protected]"
  1. Extra dependencies
pip install scikit-image

Train and evaluate on the Matterport3D and Replica datasets

In this work, we use the same Matterport3D and Replica datasets as in SynSin, based on the Habitat environment. In our codebase and config files, these two datasets are referred to as synsin_habitat (synsin_mp3d and synsin_replica) (note that here the synsin_ prefix only refers to the datasets used in SynSin; the underlying model being trained and evaluated is our Worldsheet model, not SynSin).

Extract the image frames

In our project, we extract those training, validation, and test image frames and camera matrices using the SynSin codebase for direct comparisons with SynSin and other previous work.

Please install our modified SynSin codebase from synsin_for_data_and_eval branch of this repository to extract the Matterport3D and Replica image frames:

git clone https://github.com/facebookresearch/worldsheet.git -b synsin_for_data_and_eval synsin && cd synsin

and install habitat-sim and habitat-api as additional SynSin dependencies following the official SynSin installation instructions. For convenience, we provide the corresponding versions of habitat-sim and habitat-api for SynSin in habitat-sim-for-synsin and habitat-sim-for-synsin branches of this repository.

After installing the SynSin codebase from synsin_for_data_and_eval branch, set up Matterport3D and Replica datasets following the instructions in the SynSin codebase, and run the following to save the image frames to disk (you can change MP3D_SAVE_IMAGE_DIR to a location on your machine).

# this is where Matterport3D and Replica image frames will be extracted
export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

# clone the SynSin repo from `synsin_for_data_and_eval` branch
git clone https://github.com/facebookresearch/worldsheet.git -b synsin_for_data_and_eval ../synsin
cd ../synsin

# Matterport3D train
DEBUG="" python evaluation/dump_train_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/train \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10  --images_before_reset 1000
# Matterport3D val
DEBUG="" python evaluation/dump_val_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/val \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10  --images_before_reset 200
# Matterport3D test
DEBUG="" python evaluation/dump_test_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/test \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10  --images_before_reset 200
# Replica test
DEBUG="" python evaluation/dump_test_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_replica/test \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10  --images_before_reset 200 \
     --dataset replica
# Matterport3D val with 20-degree angle change
DEBUG="" python evaluation/dump_val_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/val_jitter_angle20 \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10 --images_before_reset 200 \
     --render_ids 0 --jitter_quaternions_angle 20
# Matterport3D test with 20-degree angle change
DEBUG="" python evaluation/dump_test_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/test_jitter_angle20 \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10 --images_before_reset 200 \
     --render_ids 0 --jitter_quaternions_angle 20

cd ../worldsheet  # assuming `synsin` repo and `worldsheet` repo are under the same parent directory

Training

Run the following to perform training and evaluation. In our experiments, we use a single machine with 4 NVIDIA V100-32GB GPUs.

# set to the same path as in image frame extraction above
export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

# train the scene mesh prediction in Worldsheet
./run_mp3d_and_replica/train_mp3d.sh mp3d_nodepth_perceptual_l1laplacian

# train the inpainter with frozen scene mesh prediction
./run_mp3d_and_replica/train_mp3d.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh

Pretrained models

Instead of performing the training above, one can also directly download the pretrained models via

./run_mp3d_and_replica/download_pretrained_models.sh

and run the evaluation below.

Evaluation

The evaluation scripts below will print the performance (PSNR, SSIM, Perc-Sim) on different test data.

Evaluate on the default test sets with the same camera changes as the training data (Table 1):

# set to the same path as in image frame extraction above
export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

# Matterport3D, without inpainter (Table 1 line 6)
./run_mp3d_and_replica/eval_mp3d_test_iter.sh mp3d_nodepth_perceptual_l1laplacian 40000

# Matterport3D, full model (Table 1 line 7)
./run_mp3d_and_replica/eval_mp3d_test_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

# Replica, full model (Table 1 line 7)
./run_mp3d_and_replica/eval_replica_test_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

Evaluate the full model on 2X camera changes (Table 2):

# set to the same path as in image frame extraction above
export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

# Matterport3D, without inpainter (Table 2 line 4)
./run_mp3d_and_replica/eval_mp3d_test_jitter_angle20_iter.sh mp3d_nodepth_perceptual_l1laplacian 40000

# Matterport3D, full model (Table 2 line 5)
./run_mp3d_and_replica/eval_mp3d_test_jitter_angle20_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

# Replica, full model (Table 2 line 5)
./run_mp3d_and_replica/eval_replica_test_jitter_angle20_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

Visualization

One can visualize the model predictions using the script run_mp3d_and_replica/visualize_mp3d_val_iter.sh to visualize the Matterport3D validation set (and this script can be modified to visualize other splits). For example, run the following to visualize the predictions from the full model:

export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

./run_mp3d_and_replica/visualize_mp3d_val_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

Then, you can inspect the predictions using the notebook run_mp3d_and_replica/visualize_predictions.ipynb.

Train and evaluate on the RealEstate10K dataset

In this work, we use the same RealEstate10K dataset as in SynSin.

Setting up the RealEstate10K dataset

Please set up the dataset following the instructions in SynSin. The scripts below assumes this dataset has been downloaded to /checkpoint/ronghanghu/neural_rendering_datasets/realestate10K/RealEstate10K/frames/. You can modify its path in mmf/configs/datasets/synsin_realestate10k/defaults.yaml.

Training

Run the following to perform the training and evaluation. In our experiments, we use a single machine with 4 NVIDIA V100-32GB GPUs.

# train 33x33 mesh
./run_realestate10k/train.sh realestate10k_dscale2_lowerL1_200

# initialize 65x65 mesh from trained 33x33 mesh
python ./run_realestate10k/init_65x65_from_33x33.py \
    --input ./save/synsin_realestate10k/realestate10k_dscale2_lowerL1_200/models/model_50000.ckpt \
    --output ./save/synsin_realestate10k/realestate10k_dscale2_stride4ft_lowerL1_200/init.ckpt

# train 65x65 mesh
./run_realestate10k/train.sh realestate10k_dscale2_stride4ft_lowerL1_200

Pretrained models

Instead of performing the training above, one can also directly download the pretrained models via

./run_realestate10k/download_pretrained_models.sh

and run the evaluation below.

Evaluation

Note: as mentioned in the paper, following the evaluation protocol of SynSin on RealEstate10K, the best metrics of two separate predictions based on each view were reported for single-view methods. We follow this evaluation protocol for consistency with SynSin on RealEstate10K in Table 3. We also report averaged metrics over all predictions in the supplemental.

The script below evaluates the performance on RealEstate10K with averaged metrics over all predictions, as reported in the supplemental Table C.1:

# Evaluate 33x33 mesh (Supplemental Table C.1 line 6)
./run_realestate10k/eval_test_iter.sh realestate10k_dscale2_lowerL1_200 50000

# Evaluate 65x65 mesh (Supplemental Table C.1 line 7)
./run_realestate10k/eval_test_iter.sh realestate10k_dscale2_stride4ft_lowerL1_200 50000

To evaluate with the SynSin protocol using the best metrics of two separate predictions as in Table 3, one needs to first save the predicted novel views as PNG files, and then use the SynSin codebase for evaluation. Please install our modified SynSin codebase from synsin_for_data_and_eval branch of this repository following the Matterport3D and Replica instructions above. Then evaluate as follows:

# Save prediction PNGs for 33x33 mesh
./run_realestate10k/write_pred_pngs_test_iter.sh realestate10k_dscale2_lowerL1_200 50000

# Save prediction PNGs for 65x65 mesh
./run_realestate10k/write_pred_pngs_test_iter.sh realestate10k_dscale2_stride4ft_lowerL1_200 50000

cd ../synsin  # assuming `synsin` repo and `worldsheet` repo under the same directory

# Evaluate 33x33 mesh (Table 3 line 9)
python evaluation/evaluate_realestate10k_all.py \
    --take_every_other \
    --folder ../worldsheet/save/prediction_synsin_realestate10k/realestate10k_dscale2_lowerL1_200/50000/realestate10k_test

# Evaluate 65x65 mesh (Table 3 line 10)
python evaluation/evaluate_realestate10k_all.py \
    --take_every_other \
    --folder ../worldsheet/save/prediction_synsin_realestate10k/realestate10k_dscale2_stride4ft_lowerL1_200/50000/realestate10k_test

(The --take_every_other flag above performs best-of-two-prediction evaluation; without this flag, it should give the average-over-all-prediction results as in Supplemental Table C.1.)

Visualization

One can visualize the model's predictions using the script run_realestate10k/eval_val_iter.sh for the RealEstate10K validation set (run_realestate10k/visualize_test_iter.sh for the test set). For example, run the following to visualize the predictions from the 65x65 mesh:

./run_realestate10k/visualize_val_iter.sh realestate10k_dscale2_stride4ft_lowerL1_200 50000

Then, you can inspect the predictions using notebook run_realestate10k/visualize_predictions.ipynb.

We also provide a notebook for interactive predictions in run_realestate10k/make_interactive_videos.ipynb, where one can walk through the scene and generate a continuous video of the predicted novel views.

Wrapping sheets with external depth prediction

In Sec. 4.3 of the paper, we test the limits of wrapping a mesh sheet over a large variety of images. We provide a notebook for this analysis in external_depth/make_interactive_videos_with_midas_depth.ipynb, where one can interactively generate a continuous video of the predicted novel views.

The structure of Worldsheet codebase

Worldsheet is implemented as a MMF model. This codebase largely follows the structure of typical MMF models and datasets.

The Worldsheet model is defined under the MMF model name mesh_renderer in the following files:

  • model definition: mmf/models/mesh_renderer.py
  • mesh and rendering utilities, losses, and metrics: mmf/neural_rendering/
  • config base: mmf/configs/models/mesh_renderer/defaults.yaml

The experimental config files for the Matterport and Replica experiments are in the following files:

  • Habitat dataset definition: mmf/datasets/builders/synsin_habitat/
  • Habitat dataset config base: mmf/configs/datasets/synsin_habitat/defaults.yaml
  • experimental configs: projects/neural_rendering/configs/synsin_habitat/

The experimental config files for the RealEstate10K experiments are in the following files:

  • RealEstate10K dataset definition: mmf/datasets/builders/synsin_realestate10k/
  • RealEstate10K dataset config base: mmf/configs/datasets/synsin_realestate10k/defaults.yaml
  • experimental configs: projects/neural_rendering/configs/synsin_realestate10k/

Acknowledgements

This repository is modified from the MMF library from Facebook AI Research. A large part of the codebase has been modified from the pix2pixHD codebase. Our PSNR, SSIM, and Perc-Sim evaluation scripts are modified from the SynSin codebase and we also use SynSin for image frame extraction on Matterport3D and Replica. A part of our differentiable rendering implementation is built upon the Softmax Splatting codebase. All appropriate licenses are included in the files in which the code is used.

Licence

Worldsheet is released under the BSD License.

Painting app using Python machine learning and vision technology.

AI Painting App We are making an app that will track our hand and helps us to draw from that. We will be using the advance knowledge of Machine Learni

Badsha Laskar 3 Oct 03, 2022
A novel benchmark dataset for Monocular Layout prediction

AutoLay AutoLay: Benchmarking Monocular Layout Estimation Kaustubh Mani, N. Sai Shankar, J. Krishna Murthy, and K. Madhava Krishna Abstract In this pa

Kaustubh Mani 39 Apr 26, 2022
[EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations

MuVER This repo contains the code and pre-trained model for our EMNLP 2021 paper: MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity

24 May 30, 2022
Generating synthetic mobility data for a realistic population with RNNs to improve utility and privacy

lbs-data Motivation Location data is collected from the public by private firms via mobile devices. Can this data also be used to serve the public goo

Alex 11 Sep 22, 2022
Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation

PLOP: Learning without Forgetting for Continual Semantic Segmentation This repository contains all of our code. It is a modified version of Cermelli e

Arthur Douillard 116 Dec 14, 2022
ManipNet: Neural Manipulation Synthesis with a Hand-Object Spatial Representation - SIGGRAPH 2021

ManipNet: Neural Manipulation Synthesis with a Hand-Object Spatial Representation - SIGGRAPH 2021 Dataset Code Demos Authors: He Zhang, Yuting Ye, Tak

HE ZHANG 194 Dec 06, 2022
Progressive Coordinate Transforms for Monocular 3D Object Detection

Progressive Coordinate Transforms for Monocular 3D Object Detection This repository is the official implementation of PCT. Introduction In this paper,

58 Nov 06, 2022
Demo for the paper "Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation"

Streaming speaker diarization Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation by Juan Manuel Coria, Hervé

Juanma Coria 187 Jan 06, 2023
Fastquant - Backtest and optimize your trading strategies with only 3 lines of code!

fastquant 🤓 Bringing backtesting to the mainstream fastquant allows you to easily backtest investment strategies with as few as 3 lines of python cod

Lorenzo Ampil 1k Dec 29, 2022
Example-custom-ml-block-keras - Custom Keras ML block example for Edge Impulse

Custom Keras ML block example for Edge Impulse This repository is an example on

Edge Impulse 8 Nov 02, 2022
Experiments with Fourier layers on simulation data.

Factorized Fourier Neural Operators This repository contains the code to reproduce the results in our NeurIPS 2021 ML4PS workshop paper, Factorized Fo

Alasdair Tran 57 Dec 25, 2022
Official repository for Automated Learning Rate Scheduler for Large-Batch Training (8th ICML Workshop on AutoML)

Automated Learning Rate Scheduler for Large-Batch Training The official repository for Automated Learning Rate Scheduler for Large-Batch Training (8th

Kakao Brain 35 Jan 04, 2023
Educational API for 3D Vision using pose to control carton.

Educational API for 3D Vision using pose to control carton.

41 Jul 10, 2022
ML model to classify between cats and dogs

Cats-and-dogs-classifier This is my first ML model which can classify between cats and dogs. Here the accuracy is around 75%, however , the accuracy c

Sharath V 4 Aug 20, 2021
Anomaly detection analysis and labeling tool, specifically for multiple time series (one time series per category)

taganomaly Anomaly detection labeling tool, specifically for multiple time series (one time series per category). Taganomaly is a tool for creating la

Microsoft 272 Dec 17, 2022
Codebase for Amodal Segmentation through Out-of-Task andOut-of-Distribution Generalization with a Bayesian Model

Codebase for Amodal Segmentation through Out-of-Task andOut-of-Distribution Generalization with a Bayesian Model

Yihong Sun 12 Nov 15, 2022
Pure python implementation reverse-mode automatic differentiation

MiniGrad A minimal implementation of reverse-mode automatic differentiation (a.k.a. autograd / backpropagation) in pure Python. Inspired by Andrej Kar

Kenny Song 76 Sep 12, 2022
[CVPR 2021] Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans Introduction We introduce the task of dense captioning in 3D scans from commodity RGB-D sensor

Dave Z. Chen 79 Nov 07, 2022
A Python library for working with arbitrary-dimension hypercomplex numbers following the Cayley-Dickson construction of algebras.

Hypercomplex A Python library for working with quaternions, octonions, sedenions, and beyond following the Cayley-Dickson construction of hypercomplex

7 Nov 04, 2022
Weakly Supervised Segmentation with Tensorflow. Implements instance segmentation as described in Simple Does It: Weakly Supervised Instance and Semantic Segmentation, by Khoreva et al. (CVPR 2017).

Weakly Supervised Segmentation with TensorFlow This repo contains a TensorFlow implementation of weakly supervised instance segmentation as described

Phil Ferriere 220 Dec 13, 2022