[CVPR 2021] Monocular depth estimation using wavelets for efficiency

Overview

Single Image Depth Prediction with Wavelet Decomposition

Michaรซl Ramamonjisoa, Michael Firman, Jamie Watson, Vincent Lepetit and Daniyar Turmukhambetov

CVPR 2021

[Link to paper]

kitti gif nyu gif

We introduce WaveletMonoDepth, which improves efficiency of standard encoder-decoder monocular depth estimation methods by exploiting wavelet decomposition.

5 minute CVPR presentation video link

๐Ÿง‘โ€๐Ÿซ Methodology

WaveletMonoDepth was implemented for two benchmarks, KITTI and NYUv2. For each dataset, we build our code upon a baseline code. Both baselines share a common encoder-decoder architecture, and we modify their decoder to provide a wavelet prediction.

Wavelets predictions are sparse, and can therefore be computed only at relevant locations, therefore saving a lot of unnecessary computations.

our architecture

The network is first trained with a dense convolutions in the decoder until convergence, and the dense convolutions are then replaced with sparse ones.

This is because the network first needs to learn to predict sparse wavelet coefficients before we can use sparse convolutions.

๐Ÿ—‚ Environment Requirements ๐Ÿ—‚

We recommend creating a new Anaconda environment to use WaveletMonoDepth. Use the following to setup a new environment:

conda env create -f environment.yml
conda activate wavelet-mdp

Our work uses Pytorch Wavelets, a great package from Fergal Cotter which implements the Inverse Discrete Wavelet Transform (IDWT) used in our work, and a lot more! To install Pytorch Wavelets, simply run:

git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .

๐Ÿš— ๐Ÿšฆ KITTI ๐ŸŒณ ๐Ÿ›ฃ

Depth Hints was used as a baseline for KITTI.

Depth Hints builds upon monodepth2. If you have questions about running the code, please see the issues in their repositories first.

โš™ Setup, Training and Evaluation

Please see the KITTI directory of this repository for details on how to train and evaluate our method.

๐Ÿ“Š Results ๐Ÿ“ฆ Trained models

Please find below the scores using dense convolutions to predict wavelet coefficients. Download links coming soon!

Model name Training modality Resolution abs_rel RMSE ฮด<1.25 Weights Eigen Predictions
Ours Resnet18 Stereo + DepthHints 640 x 192 0.106 4.693 0.876 Coming soon Coming soon
Ours Resnet50 Stereo + DepthHints 640 x 192 0.105 4.625 0.879 Coming soon Coming soon
Ours Resnet18 Stereo + DepthHints 1024 x 320 0.102 4.452 0.890 Coming soon Coming soon
Ours Resnet50 Stereo + DepthHints 1024 x 320 0.097 4.387 0.891 Coming soon Coming soon

๐ŸŽš Playing with sparsity

However the most interesting part is that we can make use of the sparsity property of the predicted wavelet coefficients to trade-off performance with efficiency, at a minimal cost on performance. We do so by tuning the threshold, and:

  • low thresholds values will lead to high performance but high number of computations,
  • high thresholds will lead to highly efficient computation, as convolutions will be computed only in a few pixel locations. This will have a minimal impact on performance.

sparsify kitti

Computing coefficients at only 10% of the pixels in the decoding process gives a relative score loss of less than 1.4%.

scores kitti

Our wavelet based method allows us to greatly reduce the number of computation in the decoder at a minimal expense in performance. We can measure the performance-vs-efficiency trade-off by evaluating scores vs FLOPs.

scores vs flops kitti

๐Ÿช‘ ๐Ÿ› NYUv2 ๐Ÿ›‹ ๐Ÿšช

Dense Depth was used as a baseline for NYUv2. Note that we used the experimental PyTorch implementation of DenseDepth. Note that compared to the original paper, we made a few different modifications:

  • we supervise depth directly instead of supervising disparity
  • we do not use SSIM
  • we use DenseNet161 as encoder instead of DenseNet169

โš™ Setup, Training and Evaluation

Please see the NYUv2 directory of this repository for details on how to train and evaluate our method.

๐Ÿ“Š Results and ๐Ÿ“ฆ Trained models

Please find below the scores and associated trained models, using dense convolutions to predict wavelet coefficients.

Model name Encoder Resolution abs_rel RMSE ฮด<1.25 ฮต_acc Weights Eigen Predictions
Baseline DenseNet 640 x 480 0.1277 0.5479 0.8430 1.7170 Coming soon Coming soon
Ours DenseNet 640 x 480 0.1258 0.5515 0.8451 1.8070 Coming soon Coming soon
Baseline MobileNetv2 640 x 480 0.1772 0.6638 0.7419 1.8911 Coming soon Coming soon
Ours MobileNetv2 640 x 480 0.1727 0.6776 0.7380 1.9732 Coming soon Coming soon

๐ŸŽš Playing with sparsity

As with the KITTI dataset, we can tune the wavelet threshold to greatly reduce computation at minimal cost on performance.

sparsify nyu

Computing coefficients at only 5% of the pixels in the decoding process gives a relative depth score loss of less than 0.15%.

scores nyu

๐ŸŽฎ Try it yourself!

Try using our Jupyter notebooks to visualize results with different levels of sparsity, as well as compute the resulting computational saving in FLOPs. Notebooks can be found in <DATASET>/sparsity_test_notebook.ipynb where <DATASET> is either KITTI or NYUv2.

โœ๏ธ ๐Ÿ“„ Citation

If you find our work useful or interesting, please consider citing our paper:

@inproceedings{ramamonjisoa-2021-wavelet-monodepth,
  title     = {Single Image Depth Prediction with Wavelet Decomposition},
  author    = {Ramamonjisoa, Micha{\"{e}}l and
               Michael Firman and
               Jamie Watson and
               Vincent Lepetit and
               Daniyar Turmukhambetov},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  month = {June},
  year = {2021}
}

๐Ÿ‘ฉโ€โš–๏ธ License

Copyright ยฉ Niantic, Inc. 2021. Patent Pending. All rights reserved. Please see the license file for terms.

Owner
Niantic Labs
Building technologies and ideas that move us
Niantic Labs
NeuralDiff: Segmenting 3D objects that move in egocentric videos

NeuralDiff: Segmenting 3D objects that move in egocentric videos Project Page | Paper + Supplementary | Video About This repository contains the offic

Vadim Tschernezki 14 Dec 05, 2022
Image-to-image regression with uncertainty quantification in PyTorch

Image-to-image regression with uncertainty quantification in PyTorch. Take any dataset and train a model to regress images to images with rigorous, distribution-free uncertainty quantification.

Anastasios Angelopoulos 25 Dec 26, 2022
Software associated to AAAI paper "Planning with Biological Neurons and Synapses"

jBrain Software associated with the AAAI 2022 paper Francesco D'Amore, Daniel Mitropolsky, Pierluigi Crescenzi, Emanuele Natale, Christos H. Papadimit

Pierluigi Crescenzi 1 Apr 10, 2022
An efficient and easy-to-use deep learning model compression framework

TinyNeuralNetwork ็ฎ€ไฝ“ไธญๆ–‡ TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework, which contains features like neura

Alibaba 441 Dec 25, 2022
PyoMyo - Python Opensource Myo library

PyoMyo Python module for the Thalmic Labs Myo armband. Cross platform and multithreaded and works without the Myo SDK. pip install pyomyo Documentati

PerlinWarp 81 Jan 08, 2023
ArcaneGAN by Alex Spirin

ArcaneGAN by Alex Spirin

Alex 617 Dec 28, 2022
Code release for Convolutional Two-Stream Network Fusion for Video Action Recognition

Convolutional Two-Stream Network Fusion for Video Action Recognition

Christoph Feichtenhofer 676 Dec 31, 2022
Deep motion transfer

animation-with-keypoint-mask Paper The right most square is the final result. Softmax mask (circles): \ Heatmap mask: \ conda env create -f environmen

9 Nov 01, 2022
Ivy is a templated deep learning framework which maximizes the portability of deep learning codebases.

Ivy is a templated deep learning framework which maximizes the portability of deep learning codebases. Ivy wraps the functional APIs of existing frameworks. Framework-agnostic functions, libraries an

Ivy 8.2k Jan 02, 2023
High-quality implementations of standard and SOTA methods on a variety of tasks.

Uncertainty Baselines The goal of Uncertainty Baselines is to provide a template for researchers to build on. The baselines can be a starting point fo

Google 1.1k Dec 30, 2022
Automatic caption evaluation metric based on typicality analysis.

SeMantic and linguistic UndeRstanding Fusion (SMURF) Automatic caption evaluation metric described in the paper "SMURF: SeMantic and linguistic UndeRs

Joshua Feinglass 6 Jan 09, 2022
Face2webtoon - Despite its importance, there are few previous works applying I2I translation to webtoon.

Despite its importance, there are few previous works applying I2I translation to webtoon. I collected dataset from naver webtoon ์—ฐ์• ํ˜๋ช… and tried to transfer human faces to webtoon domain.

์ด์ƒ์œค 64 Oct 19, 2022
PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning PyTorch code for our ACL 2020 paper "MART: Memory-Augmented Recur

Jie Lei ้›ทๆฐ 151 Jan 06, 2023
BraTs-VNet - BraTS(Brain Tumour Segmentation) using V-Net

BraTS(Brain Tumour Segmentation) using V-Net This project is an approach to dete

Rituraj Dutta 7 Nov 27, 2022
Python scripts form performing stereo depth estimation using the CoEx model in ONNX.

ONNX-CoEx-Stereo-Depth-estimation Python scripts form performing stereo depth estimation using the CoEx model in ONNX. Stereo depth estimation on the

Ibai Gorordo 8 Dec 29, 2022
A curated list of Generative Deep Art projects, tools, artworks, and models

Generative Deep Art A curated list of Generative Deep Art projects, tools, artworks, and models Inbox Get started with making AI art in 2022 โ€“ deeplea

Filipe Calegario 251 Jan 03, 2023
Implementation of Learning Gradient Fields for Molecular Conformation Generation (ICML 2021).

[PDF] | [Slides] The official implementation of Learning Gradient Fields for Molecular Conformation Generation (ICML 2021 Long talk) Installation Inst

MilaGraph 117 Dec 09, 2022
Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)

DocFormer - PyTorch Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for t

171 Jan 06, 2023
Neural network for stock price prediction

neural_network_for_stock_price_prediction Neural networks for stock price predic

2 Feb 04, 2022
ใ€ŠImage2Reverb: Cross-Modal Reverb Impulse Response Synthesisใ€‹(2021)

Image2Reverb Image2Reverb is an end-to-end neural network that generates plausible audio impulse responses from single images of acoustic environments

Nikhil Singh 48 Nov 27, 2022