Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes

Overview

Neural Scene Flow Fields

PyTorch implementation of the paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes" (CVPR 2021).

[Project Website] [Paper] [Video]

Dependency

The code is tested with Python 3, PyTorch >= 1.6, and CUDA >= 10.2. The dependencies include:

  • configargparse
  • matplotlib
  • opencv
  • scikit-image
  • scipy
  • cupy
  • imageio
  • tqdm
  • kornia
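
A quick way to see which of the packages above are missing from the current environment is sketched below. The import names are assumptions: some packages are imported under a different name than the one they are installed with (for example, opencv is usually imported as cv2 and scikit-image as skimage). The helper name find_missing is hypothetical and not part of this repository.

```python
from importlib.util import find_spec

# Maps an assumed import name to the package name used in the list above.
REQUIRED_MODULES = {
    "torch": "Pytorch",
    "configargparse": "configargparse",
    "matplotlib": "matplotlib",
    "cv2": "opencv",
    "skimage": "scikit-image",
    "scipy": "scipy",
    "cupy": "cupy",
    "imageio": "imageio",
    "tqdm": "tqdm",
    "kornia": "kornia",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be located."""
    return [pkg for mod, pkg in modules.items() if find_spec(mod) is None]


if __name__ == "__main__":
    missing = find_missing()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only confirms that a module can be located; it does not verify versions or that CUDA is usable.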

Video preprocessing

  1. Download nerf_data.zip from link. It contains an example input video with SfM camera poses and intrinsics estimated with COLMAP. Note that you need to run the COLMAP "colmap image_undistorter" command to undistort the input images and produce the "dense" folder shown in the example; this folder should contain the "images" and "sparse" folders.

  2. Download the single-view depth prediction model "model.pt" from link and put it in the folder "nsff_scripts".

  3. Run the following commands to generate required inputs for training/inference:

    # Usage
    cd nsff_scripts
    # Create camera intrinsics/extrinsics in the format used by NSFF. This is the same as the original NeRF, which uses the imgs2poses.py script from the LLFF code: https://github.com/Fyusion/LLFF/blob/master/imgs2poses.py
    python save_poses_nerf.py --data_path "/home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/"
    # Resize input images and run the single-view depth model
    # argument resize_height: resized image height for model training, width will be resized based on original aspect ratio
    python run_midas.py --data_path "/home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/" --resize_height 288
    # Run optical flow model
    ./download_models.sh
    python run_flows_video.py --model models/raft-things.pth --data_path /home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/ 
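
Before running the scripts above, it can help to confirm that the folder passed as --data_path has the layout described in step 1. The following is a minimal sketch, assuming only that the "dense" folder must contain "images" and "sparse" subfolders; the function name check_dense_folder is hypothetical and not part of this repository.

```python
from pathlib import Path


def check_dense_folder(data_path):
    """Return a list of problems found in a COLMAP 'dense' folder."""
    root = Path(data_path)
    problems = []
    # Both subfolders are required by the preprocessing steps.
    for name in ("images", "sparse"):
        if not (root / name).is_dir():
            problems.append(f"missing folder: {root / name}")
    # An empty images folder usually means undistortion was not run.
    images_dir = root / "images"
    if images_dir.is_dir() and not any(images_dir.iterdir()):
        problems.append(f"no files in: {images_dir}")
    return problems
```

An empty list means the basic layout is in place; the check does not validate the contents of the COLMAP reconstruction itself.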

Rendering from an example pretrained model

  1. Download the pretrained model "kid-running_ndc_5f_sv_of_sm_unify3_F00-30.zip" from link. Unzip it so that the checkpoint is located at "nsff_exp/logs/kid-running_ndc_5f_sv_of_sm_unify3_F00-30/360000.tar".

Set datadir in configs/config_kid-running.txt to the root directory of the input video. Then go to the directory "nsff_exp":

   cd nsff_exp
   mkdir logs
  2. Rendering of fixed time, viewpoint interpolation
   python run_nerf.py --config configs/config_kid-running.txt --render_bt --target_idx 10

Running the example command should produce the following result: [image: rendering at a fixed time with an interpolated viewpoint]

  3. Rendering of fixed viewpoint, time interpolation
   python run_nerf.py --config configs/config_kid-running.txt --render_lockcam_slowmo --target_idx 8

Running the example command should produce the following result: [image: rendering from a fixed viewpoint with interpolated time]

  4. Rendering of space-time interpolation
   python run_nerf.py --config configs/config_kid-running.txt --render_slowmo_bt --target_idx 10

Running the example command should produce the following result: [image: rendering with both viewpoint and time interpolated]
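
The three rendering commands above differ only in the render flag and the target frame index, so they can be driven from one small script. This is a sketch under that assumption; the flags and indices are copied from the commands above, while the helper names build_render_command and render_all are hypothetical and not part of this repository.

```python
import subprocess

CONFIG = "configs/config_kid-running.txt"

# Render flag and target frame index for each mode, taken from the commands above.
RENDER_MODES = {
    "fixed_time": ("--render_bt", 10),
    "fixed_view": ("--render_lockcam_slowmo", 8),
    "space_time": ("--render_slowmo_bt", 10),
}


def build_render_command(mode, config=CONFIG):
    """Build the run_nerf.py argument list for one rendering mode."""
    flag, target_idx = RENDER_MODES[mode]
    return ["python", "run_nerf.py", "--config", config,
            flag, "--target_idx", str(target_idx)]


def render_all(cwd="nsff_exp"):
    """Run every rendering mode in turn, stopping at the first failure."""
    for mode in RENDER_MODES:
        subprocess.run(build_render_command(mode), cwd=cwd, check=True)
```

Calling render_all() from the repository root runs the three modes one after another inside "nsff_exp".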

Training

  1. In configs/config_kid-running.txt, change expname to any name you like (different from the original one), then run the following command to train the model:
    python run_nerf.py --config configs/config_kid-running.txt

Per-scene training takes ~2 days on 4 NVIDIA RTX 2080 Ti GPUs.

  2. Several parameters in the config files are worth knowing when training a good model on in-the-wild video:
  • final_height: this must be the same as the --resize_height argument of run_midas.py; in the kid-running case, it should be 288.
  • N_samples: to render images at a higher resolution, you have to increase the number of sampled points, for example to 256 or 512.
  • chain_sf: the model enforces local 5-frame consistency if set to True and 3-frame consistency if set to False. For faster training, set it to False.
  • start_frame, end_frame: the training frame range. The default model usually works for videos of 1~2 s, and 30-60 frames work best with the default hyperparameters. Training on longer sequences can cause oversmoothed renderings. To mitigate this, you can increase the capacity of the network by increasing netwidth to 512.
  • decay_iteration: number of iterations in the initialization stage. Data-driven losses decay every 1000 * decay_iteration steps. The code has been updated to calculate the number of decay iterations automatically.
  • no_ndc: the current implementation only supports reconstruction in NDC space, meaning it only works for forward-facing scenes, same as the original NeRF.
  • use_motion_mask, num_extra_sample: whether to use the estimated coarse motion segmentation mask for hard-mining sampling during the initialization stage, and how many extra samples to draw during that stage.
  • w_depth, w_optical_flow: loss weights for the single-view depth and geometry consistency priors described in the paper. Weights of (0.4, 0.2) or (0.2, 0.1) usually work best for most videos.
  • If you see significant ghosting in the final rendering, you might try the suggestion from link
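
To make the interaction between decay_iteration and the loss weights more concrete, the sketch below shows one way a step-wise decay could behave. It assumes that the weight is divided by a constant factor once every 1000 * decay_iteration steps; the factor decay_rate and the function name decayed_weight are assumptions for illustration, not values or names confirmed by this README.

```python
def decayed_weight(initial_weight, step, decay_iteration, decay_rate=10.0):
    """Weight of a data-driven loss at a given training step.

    The weight is divided by `decay_rate` once every
    1000 * decay_iteration steps. `decay_rate` is an assumed value.
    """
    num_decays = step // (1000 * decay_iteration)
    return initial_weight / (decay_rate ** num_decays)


if __name__ == "__main__":
    # Example with w_depth = 0.4, w_optical_flow = 0.2 and decay_iteration = 30.
    for step in (0, 29_999, 30_000, 60_000):
        w_depth = decayed_weight(0.4, step, decay_iteration=30)
        w_flow = decayed_weight(0.2, step, decay_iteration=30)
        print(f"step {step}: w_depth={w_depth:.4g}, w_optical_flow={w_flow:.4g}")
```

Under this assumption the depth and optical flow priors dominate early training and fade as optimization proceeds, which is why a larger decay_iteration keeps the priors active for longer.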

Evaluation on the Dynamic Scene Dataset

  1. Download the Dynamic Scene dataset "dynamic_scene_data_full.zip" from link.

  2. Download the pretrained models "dynamic_scene_pretrained_models.zip" from link, unzip them, and put them in the folder "nsff_exp/logs/".

  3. Run the following command for each scene to get quantitative results reported in the paper:

   # Usage: replace config_xxx.txt with the config file of each scene, such as config_balloon1-2.txt, in nsff/configs
   python evaluation.py --config configs/config_xxx.txt
  • Note: you have to use the modified LPIPS implementation included in this branch to measure LPIPS error for the dynamic region only, as described in the paper.

Acknowledgment

The code is based on implementations from several prior works:

License

This repository is released under the MIT license.

Citation

If you find our code/models useful, please consider citing our paper:

@InProceedings{li2020neural,
  title={Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes},
  author={Li, Zhengqi and Niklaus, Simon and Snavely, Noah and Wang, Oliver},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}