Pytorch version of SfmLearner from Tinghui Zhou et al.

Overview

SfMLearner Pytorch version

This codebase implements the system described in the paper:

Unsupervised Learning of Depth and Ego-Motion from Video

Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe

In CVPR 2017 (Oral).

See the project webpage for more details.

Original Author : Tinghui Zhou ([email protected]) Pytorch implementation : Clément Pinard ([email protected])

sample_results

Preamble

This codebase was developed and tested with Pytorch 1.0.1, CUDA 10 and Ubuntu 16.04. Original code was developped in tensorflow, you can access it here

Prerequisite

pip3 install -r requirements.txt

or install manually the following packages :

pytorch >= 1.0.1
pebble
matplotlib
imageio
scipy
argparse
tensorboardX
blessings
progressbar2
path.py

Note

Because it uses latests pytorch features, it is not compatible with anterior versions of pytorch.

If you don't have an up to date pytorch, the tags can help you checkout the right commits corresponding to your pytorch version.

What has been done

  • Training has been tested on KITTI and CityScapes.
  • Dataset preparation has been largely improved, and now stores image sequences in folders, making sure that movement is each time big enough between each frame
  • That way, training is now significantly faster, running at ~0.14sec per step vs ~0.2s per steps initially (on a single GTX980Ti)
  • In addition you don't need to prepare data for a particular sequence length anymore as stacking is made on the fly.
  • You can still choose the former stacked frames dataset format.
  • Convergence is now almost as good as original paper with same hyper parameters
  • You can know compare with groud truth for your validation set. It is still possible to validate without, but you now can see that minimizing photometric error is not equivalent to optimizing depth map.

Differences with official Implementation

  • Smooth Loss is different from official repo. Instead of applying it to disparity, we apply it to depth. Original disparity smooth loss did not work well (don't know why !) and it did not even converge at all with weight values used (0.5).
  • loss is divided by 2.3 when downscaling instead of 2. This is the results of empiric experiments, so the optimal value is clearly not carefully determined.
  • As a consequence, with a smooth loss of 2.0̀, depth test is better, but Pose test is worse. To revert smooth loss back to original, you can change it here

Preparing training data

Preparation is roughly the same command as in the original code.

For KITTI, first download the dataset using this script provided on the official website, and then run the following command. The --with-depth option will save resized copies of groundtruth to help you setting hyper parameters. The --with-pose will dump the sequence pose in the same format as Odometry dataset (see pose evaluation)

python3 data/prepare_train_data.py /path/to/raw/kitti/dataset/ --dataset-format 'kitti' --dump-root /path/to/resulting/formatted/data/ --width 416 --height 128 --num-threads 4 [--static-frames /path/to/static_frames.txt] [--with-depth] [--with-pose]

For Cityscapes, download the following packages: 1) leftImg8bit_sequence_trainvaltest.zip, 2) camera_trainvaltest.zip. You will probably need to contact the administrators to be able to get it. Then run the following command

python3 data/prepare_train_data.py /path/to/cityscapes/dataset/ --dataset-format 'cityscapes' --dump-root /path/to/resulting/formatted/data/ --width 416 --height 171 --num-threads 4

Notice that for Cityscapes the img_height is set to 171 because we crop out the bottom part of the image that contains the car logo, and the resulting image will have height 128.

Training

Once the data are formatted following the above instructions, you should be able to train the model by running the following command

python3 train.py /path/to/the/formatted/data/ -b4 -m0.2 -s0.1 --epoch-size 3000 --sequence-length 3 --log-output [--with-gt]

You can then start a tensorboard session in this folder by

tensorboard --logdir=checkpoints/

and visualize the training progress by opening https://localhost:6006 on your browser. If everything is set up properly, you should start seeing reasonable depth prediction after ~30K iterations when training on KITTI.

Evaluation

Disparity map generation can be done with run_inference.py

python3 run_inference.py --pretrained /path/to/dispnet --dataset-dir /path/pictures/dir --output-dir /path/to/output/dir

Will run inference on all pictures inside dataset-dir and save a jpg of disparity (or depth) to output-dir for each one see script help (-h) for more options.

Disparity evaluation is avalaible

python3 test_disp.py --pretrained-dispnet /path/to/dispnet --pretrained-posenet /path/to/posenet --dataset-dir /path/to/KITTI_raw --dataset-list /path/to/test_files_list

Test file list is available in kitti eval folder. To get fair comparison with Original paper evaluation code, don't specify a posenet. However, if you do, it will be used to solve the scale factor ambiguity, the only ground truth used to get it will be vehicle speed which is far more acceptable for real conditions quality measurement, but you will obviously get worse results.

Pose evaluation is also available on Odometry dataset. Be sure to download both color images and pose !

python3 test_pose.py /path/to/posenet --dataset-dir /path/to/KITIT_odometry --sequences [09]

ATE (Absolute Trajectory Error) is computed as long as RE for rotation (Rotation Error). RE between R1 and R2 is defined as the angle of R1*R2^-1 when converted to axis/angle. It corresponds to RE = arccos( (trace(R1 @ R2^-1) - 1) / 2). While ATE is often said to be enough to trajectory estimation, RE seems important here as sequences are only seq_length frames long.

Pretrained Nets

Avalaible here

Arguments used :

python3 train.py /path/to/the/formatted/data/ -b4 -m0 -s2.0 --epoch-size 1000 --sequence-length 5 --log-output --with-gt

Depth Results

Abs Rel Sq Rel RMSE RMSE(log) Acc.1 Acc.2 Acc.3
0.181 1.341 6.236 0.262 0.733 0.901 0.964

Pose Results

5-frames snippets used

Seq. 09 Seq. 10
ATE 0.0179 (std. 0.0110) 0.0141 (std. 0.0115)
RE 0.0018 (std. 0.0009) 0.0018 (std. 0.0011)

Discussion

Here I try to link the issues that I think raised interesting questions about scale factor, pose inference, and training hyperparameters

  • Issue 48 : Why is target frame at the center of the sequence ?
  • Issue 39 : Getting pose vector without the scale factor uncertainty
  • Issue 46 : Is Interpolated groundtruth better than sparse groundtruth ?
  • Issue 45 : How come the inverse warp is absolute and pose and depth are only relative ?
  • Issue 32 : Discussion about validation set, and optimal batch size
  • Issue 25 : Why filter out static frames ?
  • Issue 24 : Filtering pixels out of the photometric loss
  • Issue 60 : Inverse warp is only one way !

Other Implementations

TensorFlow by tinghuiz (original code, and paper author)

Owner
Clément Pinard
PhD ENSTA Paris, Deep Learning Engineer @ ContentSquare
Clément Pinard
EDCNN: Edge enhancement-based Densely Connected Network with Compound Loss for Low-Dose CT Denoising

EDCNN: Edge enhancement-based Densely Connected Network with Compound Loss for Low-Dose CT Denoising By Tengfei Liang, Yi Jin, Yidong Li, Tao Wang. Th

workingcoder 115 Jan 05, 2023
ComPhy: Compositional Physical Reasoning ofObjects and Events from Videos

ComPhy This repository holds the code for the paper. ComPhy: Compositional Physical Reasoning ofObjects and Events from Videos, (Under review) PDF Pro

29 Dec 29, 2022
Accompanying code for the paper "A Kernel Test for Causal Association via Noise Contrastive Backdoor Adjustment".

#backdoor-HSIC (bd_HSIC) Accompanying code for the paper "A Kernel Test for Causal Association via Noise Contrastive Backdoor Adjustment". To generate

Robert Hu 0 Nov 25, 2021
Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

IterMVS official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo' Introduction IterMVS is a novel lear

Fangjinhua Wang 127 Jan 04, 2023
Few-Shot Graph Learning for Molecular Property Prediction

Few-shot Graph Learning for Molecular Property Prediction Introduction This is the source code and dataset for the following paper: Few-shot Graph Lea

Zhichun Guo 94 Dec 12, 2022
A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms This repo contains the source code to reproduce the results in the paper A Close

Costa Huang 73 Dec 24, 2022
This game was designed to encourage young people not to gamble on lotteries, as the probablity of correctly guessing the number is infinitesimal!

Lottery Simulator 2022 for Web Launch Application Developed by John Seong in Ontario. This game was designed to encourage young people not to gamble o

John Seong 2 Sep 02, 2022
Virtual Dance Reality Stage: a feature that offers you to share a stage with another user virtually

Portrait Segmentation using Tensorflow This script removes the background from an input image. You can read more about segmentation here Setup The scr

291 Dec 24, 2022
Two-Stage Peer-Regularized Feature Recombination for Arbitrary Image Style Transfer

Two-Stage Peer-Regularized Feature Recombination for Arbitrary Image Style Transfer Paper on arXiv Public PyTorch implementation of two-stage peer-reg

NNAISENSE 38 Oct 14, 2022
3rd Place Solution of the Traffic4Cast Core Challenge @ NeurIPS 2021

3rd Place Solution of Traffic4Cast 2021 Core Challenge This is the code for our solution to the NeurIPS 2021 Traffic4Cast Core Challenge. Paper Our so

7 Jul 25, 2022
This is an official PyTorch implementation of Task-Adaptive Neural Network Search with Meta-Contrastive Learning (NeurIPS 2021, Spotlight).

NeurIPS 2021 (Spotlight): Task-Adaptive Neural Network Search with Meta-Contrastive Learning This is an official PyTorch implementation of Task-Adapti

Wonyong Jeong 15 Nov 21, 2022
🐦 Opytimizer is a Python library consisting of meta-heuristic optimization techniques.

Opytimizer: A Nature-Inspired Python Optimizer Welcome to Opytimizer. Did you ever reach a bottleneck in your computational experiments? Are you tired

Gustavo Rosa 546 Dec 31, 2022
AI pipelines for Nvidia Jetson Platform

Jetson Multicamera Pipelines Easy-to-use realtime CV/AI pipelines for Nvidia Jetson Platform. This project: Builds a typical multi-camera pipeline, i.

NVIDIA AI IOT 96 Dec 23, 2022
DimReductionClustering - Dimensionality Reduction + Clustering + Unsupervised Score Metrics

Dimensionality Reduction + Clustering + Unsupervised Score Metrics Introduction

11 Nov 15, 2022
Benchmarking Pipeline for Prediction of Protein-Protein Interactions

B4PPI Benchmarking Pipeline for the Prediction of Protein-Protein Interactions How this benchmarking pipeline has been built, and how to use it, is de

Loïc Lannelongue 4 Jun 27, 2022
Codebase for "ProtoAttend: Attention-Based Prototypical Learning."

Codebase for "ProtoAttend: Attention-Based Prototypical Learning." Authors: Sercan O. Arik and Tomas Pfister Paper: Sercan O. Arik and Tomas Pfister,

47 2 May 17, 2022
PyTorch implementations of neural network models for keyword spotting

Honk: CNNs for Keyword Spotting Honk is a PyTorch reimplementation of Google's TensorFlow convolutional neural networks for keyword spotting, which ac

Castorini 475 Dec 15, 2022
Code for "Learning Canonical Representations for Scene Graph to Image Generation", Herzig & Bar et al., ECCV2020

Learning Canonical Representations for Scene Graph to Image Generation (ECCV 2020) Roei Herzig*, Amir Bar*, Huijuan Xu, Gal Chechik, Trevor Darrell, A

roei_herzig 24 Jul 07, 2022
This repository contains a set of codes to run (i.e., train, perform inference with, evaluate) a diarization method called EEND-vector-clustering.

EEND-vector clustering The EEND-vector clustering (End-to-End-Neural-Diarization-vector clustering) is a speaker diarization framework that integrates

45 Dec 26, 2022
An example project demonstrating how the Autonomous Learning Library can be used to build new reinforcement learning agents.

About This repository shows how Autonomous Learning Library can be used to build new reinforcement learning agents. In particular, it contains a model

Chris Nota 5 Aug 30, 2022