PyTorch DepthNet Training on Still Box dataset

Overview

DepthNet training on Still Box

Project page

This code can replicate the results of our paper that was published in UAVg-17. If you use this repo in your work, please cite us with the following bibtex :

@Article{isprs-annals-IV-2-W3-67-2017,
AUTHOR = {Pinard, C. and Chevalley, L. and Manzanera, A. and Filliat, D.},
TITLE = {END-TO-END DEPTH FROM MOTION WITH STABILIZED MONOCULAR VIDEOS},
JOURNAL = {ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences},
VOLUME = {IV-2/W3},
YEAR = {2017},
PAGES = {67--74},
URL = {https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W3/67/2017/},
DOI = {10.5194/isprs-annals-IV-2-W3-67-2017}
}

depthnet

End-to-end depth from motion with stabilized monocular videos

  • This code shows how the only translational movement of the camera can be leveraged to compute a very precise depth map, even at more than 300 times the displacement.
  • Thus, for a camera movement of 30cm (nominal displacement used here), you can see as far as 100m.

See our second paper for information about using this code on real videos with speed estimation

Multi range Real-time depth inference from a monocular stabilized footage using a Fully Convolutional Neural Network

Click Below for video

youtube video

DepthNet

DepthNet is a network designed to infer Depth Map directly from a pair of stabilized image.

  • No information is given about movement direction
  • DepthNet is Fully Convolutional, which means it is completely robust to optical center fault
  • This network only works for pinhole-like pictures

Still Box

stillbox

Still box is a dataset created specifically for supervised training of depth map inference for stabilized aerial footage. It tries to mimic typical drone footages in static scenes, and depth is impossible to infer from a single image, as shapes get all kinds of sizes and positions.

  • You can download it here
  • The dataset webpage also provides a tutorial on how to read the data

Training

Requirements

[sudo] pip3 install -r requirements.txt

If you want to log some outputs from the validation set with the --log-output option, you need openCV python bindings to convert depth to RGB with a rainbow colormap.

If you don't have opencv, grayscales will be logged

Usage

Best results can be obtained by training on still box 64 and then finetuned successively up to the resolution you target. Here are the parameters used for the paper (please note how learning rate and batch size are changed, training was done a single GTX 980Ti).

python3 train.py -j8 --lr 0.01 /path/to/still_box/64/ --log-output --activation-function elu --bn
python3 train.py -j8 --lr 0.01 /path/to/still_box/128/ --log-output --activation-function elu --bn --pretrained /path/to/DepthNet64
python3 train.py -j8 --lr 0.001 /path/to/still_box/256/ --log-output --activation-function elu --bn -b64 --pretrained /path/to/DepthNet128
python3 train.py -j8 --lr 0.001 /path/to/still_box/512/ --log-output --activation-function elu --bn -b16 --pretrained /path/to/DepthNet256

Note: You can skip 128 and 256 training if you don't have time, results will be only slightly worse. However, you need to do 64 training first as stated by our first paper. This might has something to do with either the size of 64 dataset (in terms of scene numbers) or the fact that feature maps are reduced down to 1x1 making last convolution a FC equivalent operation

Pretrained networks

Best results were obtained with elu for depth activation (not mentionned in the original paper), along with BatchNorm.

Name training set Error (m)
DepthNet_elu_bn_64.pth.tar 64 4.65 Link
DepthNet_elu_bn_128.pth.tar 128 3.08 Link
DepthNet_elu_bn_256.pth.tar 256 2.29 Link
DepthNet_elu_bn_512.pth.tar 512 1.97 Link

All the networks have the same size and same structure.

Custom FOV and focal length

Every image in still box is 90° of FOV (field of view), focal length (in pixels) is then respectively

  • 32px for 64x64 images
  • 64px for 128x128 images
  • 128px for 128x128 images
  • 256px for 512x512 images

Training is not flexible to focal length, and for a custom focal length you will have to run a dedicated training.

If you need to use a custom focal length and FOV you can simply resize the pictures and crop them.

Say you have a picture of width w with an associated FOV fov. To get equivalent from one of the datasets you can first crop the still box pictures so that FOV will match fov (cropping doesn't affect focal length in pixels), and then resize it to w. Note that DepthNet can take rectangular pictures as input.

cropped_w = w/tan(pi*fov/360)

we naturally recommend to do this operation offline, metadata from metadata.json won't need to be altered.

with pretrained DepthNet

If you can resize your test pictures, thanks to its fully convolutional architecture, DepthNet is flexible to fov, as long as it stays below 90° (or max FOV encountered during training). Referring back to our witdh w and FOV fov we get with a network trained with a particular focal length f the following width to resize to:

resized_w = f/2*tan(pi*fov/360)

That way, you won't have to make a dedicated training or even download the still box dataset


/!\ These equations are only valid with pinhole equivalent cameras. Be sure to correct distortion before using DepthNet

Testing Inference

The run_inference.py lets you run an inference on a folder of images, and save the depth maps in different visualizations.

A simple still box scene of 512x512 pictures for testing can be downloaded here. Otherwise, any folder with a list of jpg images will do, provided you follow the guidelines above.

python3 run_inference.py --output-depth --no-resize --dataset-dir /path/to/stub_box --pretrained /path/to/DepthNet512 --frame-shift 3 --output-dir /path/to/save/outputs

Visualise training

Training can be visualized via tensorboard by launching this command in another terminal

tensorboard --logdir=/path/to/DepthNet/Results

You can then access the board from any computer in the local network by accessing machine_ip:6006 from a web browser, just as a regular tensorboard server. More info here

Owner
Clément Pinard
PhD ENSTA Paris, Deep Learning Engineer @ ContentSquare
Clément Pinard
Vignette is a face tracking software for characters using osu!framework.

Vignette is a face tracking software for characters using osu!framework. Unlike most solutions, Vignette is: Made with osu!framework, the game framewo

Vignette 412 Dec 28, 2022
1st Place Solution to ECCV-TAO-2020: Detect and Represent Any Object for Tracking

Instead, two models for appearance modeling are included, together with the open-source BAGS model and the full set of code for inference. With this code, you can achieve around 79 Oct 08, 2022

TensorFlow 101: Introduction to Deep Learning for Python Within TensorFlow

TensorFlow 101: Introduction to Deep Learning I have worked all my life in Machine Learning, and I've never seen one algorithm knock over its benchmar

Sefik Ilkin Serengil 896 Jan 04, 2023
Neural style transfer as a class in PyTorch

pt-styletransfer Neural style transfer as a class in PyTorch Based on: https://github.com/alexis-jacq/Pytorch-Tutorials Adds: StyleTransferNet as a cl

Tyler Kvochick 31 Jun 27, 2022
Implementation for paper "Towards the Generalization of Contrastive Self-Supervised Learning"

Contrastive Self-Supervised Learning on CIFAR-10 Paper "Towards the Generalization of Contrastive Self-Supervised Learning", Weiran Huang, Mingyang Yi

Weiran Huang 13 Nov 30, 2022
Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"

When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer This repository contains code for our paper titled "When is BERT M

Princeton Natural Language Processing 9 Dec 23, 2022
This implementation contains the application of GPlearn's symbolic transformer on a commodity futures sector of the financial market.

GPlearn_finiance_stock_futures_extension This implementation contains the application of GPlearn's symbolic transformer on a commodity futures sector

Chengwei <a href=[email protected]"> 189 Dec 25, 2022
Can we visualize a large scientific data set with a surrogate model? We're building a GAN for the Earth's Mantle Convection data set to see if we can!

EarthGAN - Earth Mantle Surrogate Modeling Can a surrogate model of the Earth’s Mantle Convection data set be built such that it can be readily run in

Tim 0 Dec 09, 2021
A PyTorch implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? Source: Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize

Caiyong Wang 14 Sep 20, 2022
Intrinsic Image Harmonization

Intrinsic Image Harmonization [Paper] Zonghui Guo, Haiyong Zheng, Yufeng Jiang, Zhaorui Gu, Bing Zheng Here we provide PyTorch implementation and the

VISION @ OUC 44 Dec 21, 2022
Implementation of " SESS: Self-Ensembling Semi-Supervised 3D Object Detection" (CVPR2020 Oral)

SESS: Self-Ensembling Semi-Supervised 3D Object Detection Created by Na Zhao from National University of Singapore Introduction This repository contai

125 Dec 23, 2022
Automatic library of congress classification, using word embeddings from book titles and synopses.

Automatic Library of Congress Classification The Library of Congress Classification (LCC) is a comprehensive classification system that was first deve

Ahmad Pourihosseini 3 Oct 01, 2022
A Deep Learning based project for creating line art portraits.

ArtLine The main aim of the project is to create amazing line art portraits. Sounds Intresting,let's get to the pictures!! Model-(Smooth) Model-(Quali

Vijish Madhavan 3.3k Jan 07, 2023
GyroSPD: Vector-valued Distance and Gyrocalculus on the Space of Symmetric Positive Definite Matrices

GyroSPD Code for the paper "Vector-valued Distance and Gyrocalculus on the Space of Symmetric Positive Definite Matrices" accepted at NeurIPS 2021. Re

Federico Lopez 12 Dec 12, 2022
Implicit Model Specialization through DAG-based Decentralized Federated Learning

Federated Learning DAG Experiments This repository contains software artifacts to reproduce the experiments presented in the Middleware '21 paper "Imp

Operating Systems and Middleware Group 5 Oct 16, 2022
[NeurIPS 2020] Code for the paper "Balanced Meta-Softmax for Long-Tailed Visual Recognition"

Balanced Meta-Softmax Code for the paper Balanced Meta-Softmax for Long-Tailed Visual Recognition Jiawei Ren, Cunjun Yu, Shunan Sheng, Xiao Ma, Haiyu

Jiawei Ren 65 Dec 21, 2022
git git《Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking》(CVPR 2021) GitHub:git2] 《Masksembles for Uncertainty Estimation》(CVPR 2021) GitHub:git3]

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking Ning Wang, Wengang Zhou, Jie Wang, and Houqiang Li Accepted by CVPR

NingWang 236 Dec 22, 2022
A ssl analyzer which could analyzer target domain's certificate.

ssl_analyzer A ssl analyzer which could analyzer target domain's certificate. Analyze the domain name ssl certificate information according to the inp

vincent 17 Dec 12, 2022
A simple Tensorflow based library for deep and/or denoising AutoEncoder.

libsdae - deep-Autoencoder & denoising autoencoder A simple Tensorflow based library for Deep autoencoder and denoising AE. Library follows sklearn st

Rajarshee Mitra 147 Nov 18, 2022
Supervised domain-agnostic prediction framework for probabilistic modelling

A supervised domain-agnostic framework that allows for probabilistic modelling, namely the prediction of probability distributions for individual data

The Alan Turing Institute 112 Oct 23, 2022