Self-Supervised Multi-Frame Monocular Scene Flow (CVPR 2021)

Last update: Dec 22, 2022

Overview

3D visualization of estimated depth and scene flow (overlaid on the input image) from temporally consecutive images.
Trained on KITTI in a self-supervised manner, and tested on DAVIS.

This repository is the official PyTorch implementation of the paper:

   Self-Supervised Multi-Frame Monocular Scene Flow
   Junhwa Hur and Stefan Roth
   CVPR, 2021
   arXiv

Contact: junhwa.hur[at]gmail.com

Installation

The code has been tested with Anaconda (Python 3.8), PyTorch 1.8.1, and CUDA 10.1 (other PyTorch + CUDA versions should also be compatible).
Please run the provided conda environment setup file:

conda env create -f environment.yml
conda activate multi-mono-sf
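
After activating the environment, it can be useful to confirm which Python, PyTorch, and CUDA versions are actually visible. The following is a minimal sketch, not part of the repository; the function name `describe_environment` is hypothetical, and it only relies on standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`).

```python
import importlib
import platform


def describe_environment():
    """Collect the Python, PyTorch and CUDA versions visible to this interpreter."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        "cuda_available": False,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in the active environment.
        return info
    info["torch"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds.
    info["cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `torch` is reported as `None`, the conda environment is probably not active in the current shell.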

(Optional) Using the CUDA implementation of the correlation layer accelerates training (~50% faster):

./install_correlation.sh

After installing it, set the flag --correlation_cuda_enabled=True in the training/evaluation script files.

Dataset

Please download the following two datasets for the experiments:

KITTI Raw Data (synced+rectified data; please refer to MonoDepth2 for a more convenient way to download all data.)
KITTI Scene Flow 2015 and the Multi-view extension (merge them in the same folder).

To save space, we convert the KITTI Raw png images to jpeg, following the convention from MonoDepth:

find (data_folder)/ -name '*.png' | parallel 'convert {.}.png {.}.jpg && rm {}'

We also converted the images in KITTI Scene Flow 2015. Please convert the png images in image_2 and image_3 into jpg and save them into the separate folders image_2_jpg and image_3_jpg.
To save further space, you can delete the velodyne point data in KITTI Raw Data, as it is not needed.
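
The KITTI Scene Flow 2015 conversion could be scripted roughly as follows. This is a sketch under stated assumptions rather than the authors' tooling: it swaps ImageMagick's `convert` for Pillow (assumed to be installed), the function names `jpg_target` and `convert_folder` are hypothetical, and the JPEG quality of 95 is an arbitrary choice. Unlike the command for KITTI Raw above, it keeps the original png files.

```python
from pathlib import Path


def jpg_target(png_path):
    """Map .../image_2/x.png to .../image_2_jpg/x.jpg (likewise for image_3)."""
    png_path = Path(png_path)
    jpg_dir = png_path.parent.with_name(png_path.parent.name + "_jpg")
    return jpg_dir / (png_path.stem + ".jpg")


def convert_folder(scene_flow_root, quality=95):
    """Convert every png found in an image_2 or image_3 folder below the root.

    Returns the list of jpg files written. The png files are left in place.
    """
    from PIL import Image  # Pillow, imported lazily; assumed to be installed

    written = []
    for png in sorted(Path(scene_flow_root).rglob("*.png")):
        if png.parent.name not in ("image_2", "image_3"):
            continue  # skip any other png files, e.g. ground-truth maps
        out = jpg_target(png)
        out.parent.mkdir(parents=True, exist_ok=True)
        # JPEG has no alpha channel, so convert to RGB before saving.
        Image.open(png).convert("RGB").save(out, "JPEG", quality=quality)
        written.append(out)
    return written
```

Calling `convert_folder` with the folder that holds the merged KITTI Scene Flow 2015 data would then create image_2_jpg and image_3_jpg next to the original image folders.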

Training and Inference

The scripts folder contains training/inference scripts.

For self-supervised training, you can simply run the following script files:

Script	Training	Dataset
`./train_selfsup.sh`	Self-supervised	KITTI Split

Fine-tuning is done in two stages: (i) first find the stopping point using the train/valid split, and then (ii) fine-tune on all data for the number of iteration steps found in stage (i).

Script	Training	Dataset
`./ft_1st_stage.sh`	Semi-supervised finetuning	KITTI raw + KITTI 2015
`./ft_2nd_stage.sh`	Semi-supervised finetuning	KITTI raw + KITTI 2015
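
The idea behind the first stage can be illustrated with a small sketch. It assumes, hypothetically, that the stopping point is simply the checkpoint with the lowest validation error; the README does not state the exact criterion, and the function name and the numbers in the example are made up for illustration.

```python
def find_stopping_iteration(validation_errors):
    """Return the iteration whose validation error is lowest.

    validation_errors maps an iteration count to the error measured on the
    held-out split at that checkpoint (lower is better).
    """
    if not validation_errors:
        raise ValueError("no validation measurements given")
    # Iterating in ascending order breaks ties in favour of the earlier iteration.
    return min(sorted(validation_errors), key=validation_errors.__getitem__)


if __name__ == "__main__":
    # Made-up validation errors, one per saved checkpoint.
    errors = {5000: 0.31, 10000: 0.27, 15000: 0.29}
    print(find_stopping_iteration(errors))
```

The second stage would then train on all data for that many iteration steps, since no held-out split remains to monitor.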

In the script files, please configure the following paths for your experiments:

DATA_HOME : the directory where the training or test data is located on your local system.
EXPERIMENTS_HOME : your own experiment directory where checkpoints and log files will be saved.
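
Before launching a long run, the two paths can be sanity-checked. The sketch below is not part of the repository; `check_paths` is a hypothetical helper, and it conservatively treats a missing experiment directory as a problem, although the scripts may create it on their own.

```python
import os
from pathlib import Path


def check_paths(data_home, experiments_home):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    data_dir = Path(data_home)
    if not data_dir.is_dir():
        problems.append(f"DATA_HOME is not a directory: {data_dir}")
    elif not any(data_dir.iterdir()):
        problems.append(f"DATA_HOME is empty: {data_dir}")

    exp_dir = Path(experiments_home)
    if not exp_dir.is_dir():
        problems.append(f"EXPERIMENTS_HOME is not a directory: {exp_dir}")
    elif not os.access(exp_dir, os.W_OK):
        # Checkpoints and log files are written here, so it must be writable.
        problems.append(f"EXPERIMENTS_HOME is not writable: {exp_dir}")

    return problems


if __name__ == "__main__":
    for problem in check_paths("/path/to/data", "/path/to/experiments"):
        print(problem)
```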

To test pretrained models, you can simply run the following script files:

Script	Training	Dataset
`./eval_selfsup_train.sh`	self-supervised	KITTI 2015 Train
`./eval_ft_test.sh`	fine-tuned	KITTI 2015 Test
`./eval_davis.sh`	self-supervised	DAVIS (one scene)
`./eval_davis_all.sh`	self-supervised	DAVIS (all scenes)

To save visualizations of the outputs, please set --save_vis=True in the script.
To save output images for submission to the KITTI Scene Flow 2015 Benchmark, please set --save_out=True in the script.

Pretrained Models

The checkpoints folder contains the checkpoints of the pretrained models.

Acknowledgement

Please cite our paper if you use our source code.

@inproceedings{Hur:2021:SSM,  
  Author = {Junhwa Hur and Stefan Roth},  
  Booktitle = {CVPR},  
  Title = {Self-Supervised Multi-Frame Monocular Scene Flow},  
  Year = {2021}  
}

Portions of the source code (e.g., training pipeline, runtime, argument parser, and logger) are from Jochen Gast.

Owner

Visual Inference Lab @TU Darmstadt
