[CVPR 2021] Monocular depth estimation using wavelets for efficiency

Overview

Single Image Depth Prediction with Wavelet Decomposition

Michaël Ramamonjisoa, Michael Firman, Jamie Watson, Vincent Lepetit and Daniyar Turmukhambetov

CVPR 2021

[Link to paper]

kitti gif nyu gif

We introduce WaveletMonoDepth, which improves efficiency of standard encoder-decoder monocular depth estimation methods by exploiting wavelet decomposition.

5 minute CVPR presentation video link

🧑‍🏫 Methodology

WaveletMonoDepth was implemented for two benchmarks, KITTI and NYUv2. For each dataset, we build our code upon a baseline code. Both baselines share a common encoder-decoder architecture, and we modify their decoder to provide a wavelet prediction.

Wavelets predictions are sparse, and can therefore be computed only at relevant locations, therefore saving a lot of unnecessary computations.

our architecture

The network is first trained with a dense convolutions in the decoder until convergence, and the dense convolutions are then replaced with sparse ones.

This is because the network first needs to learn to predict sparse wavelet coefficients before we can use sparse convolutions.

🗂 Environment Requirements 🗂

We recommend creating a new Anaconda environment to use WaveletMonoDepth. Use the following to setup a new environment:

conda env create -f environment.yml
conda activate wavelet-mdp

Our work uses Pytorch Wavelets, a great package from Fergal Cotter which implements the Inverse Discrete Wavelet Transform (IDWT) used in our work, and a lot more! To install Pytorch Wavelets, simply run:

git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .

🚗 🚦 KITTI 🌳 🛣

Depth Hints was used as a baseline for KITTI.

Depth Hints builds upon monodepth2. If you have questions about running the code, please see the issues in their repositories first.

Setup, Training and Evaluation

Please see the KITTI directory of this repository for details on how to train and evaluate our method.

📊 Results 📦 Trained models

Please find below the scores using dense convolutions to predict wavelet coefficients. Download links coming soon!

Model name Training modality Resolution abs_rel RMSE δ<1.25 Weights Eigen Predictions
Ours Resnet18 Stereo + DepthHints 640 x 192 0.106 4.693 0.876 Coming soon Coming soon
Ours Resnet50 Stereo + DepthHints 640 x 192 0.105 4.625 0.879 Coming soon Coming soon
Ours Resnet18 Stereo + DepthHints 1024 x 320 0.102 4.452 0.890 Coming soon Coming soon
Ours Resnet50 Stereo + DepthHints 1024 x 320 0.097 4.387 0.891 Coming soon Coming soon

🎚 Playing with sparsity

However the most interesting part is that we can make use of the sparsity property of the predicted wavelet coefficients to trade-off performance with efficiency, at a minimal cost on performance. We do so by tuning the threshold, and:

  • low thresholds values will lead to high performance but high number of computations,
  • high thresholds will lead to highly efficient computation, as convolutions will be computed only in a few pixel locations. This will have a minimal impact on performance.

sparsify kitti

Computing coefficients at only 10% of the pixels in the decoding process gives a relative score loss of less than 1.4%.

scores kitti

Our wavelet based method allows us to greatly reduce the number of computation in the decoder at a minimal expense in performance. We can measure the performance-vs-efficiency trade-off by evaluating scores vs FLOPs.

scores vs flops kitti

🪑 🛁 NYUv2 🛋 🚪

Dense Depth was used as a baseline for NYUv2. Note that we used the experimental PyTorch implementation of DenseDepth. Note that compared to the original paper, we made a few different modifications:

  • we supervise depth directly instead of supervising disparity
  • we do not use SSIM
  • we use DenseNet161 as encoder instead of DenseNet169

Setup, Training and Evaluation

Please see the NYUv2 directory of this repository for details on how to train and evaluate our method.

📊 Results and 📦 Trained models

Please find below the scores and associated trained models, using dense convolutions to predict wavelet coefficients.

Model name Encoder Resolution abs_rel RMSE δ<1.25 ε_acc Weights Eigen Predictions
Baseline DenseNet 640 x 480 0.1277 0.5479 0.8430 1.7170 Coming soon Coming soon
Ours DenseNet 640 x 480 0.1258 0.5515 0.8451 1.8070 Coming soon Coming soon
Baseline MobileNetv2 640 x 480 0.1772 0.6638 0.7419 1.8911 Coming soon Coming soon
Ours MobileNetv2 640 x 480 0.1727 0.6776 0.7380 1.9732 Coming soon Coming soon

🎚 Playing with sparsity

As with the KITTI dataset, we can tune the wavelet threshold to greatly reduce computation at minimal cost on performance.

sparsify nyu

Computing coefficients at only 5% of the pixels in the decoding process gives a relative depth score loss of less than 0.15%.

scores nyu

🎮 Try it yourself!

Try using our Jupyter notebooks to visualize results with different levels of sparsity, as well as compute the resulting computational saving in FLOPs. Notebooks can be found in <DATASET>/sparsity_test_notebook.ipynb where <DATASET> is either KITTI or NYUv2.

✏️ 📄 Citation

If you find our work useful or interesting, please consider citing our paper:

@inproceedings{ramamonjisoa-2021-wavelet-monodepth,
  title     = {Single Image Depth Prediction with Wavelet Decomposition},
  author    = {Ramamonjisoa, Micha{\"{e}}l and
               Michael Firman and
               Jamie Watson and
               Vincent Lepetit and
               Daniyar Turmukhambetov},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  month = {June},
  year = {2021}
}

👩‍⚖️ License

Copyright © Niantic, Inc. 2021. Patent Pending. All rights reserved. Please see the license file for terms.

Owner
Niantic Labs
Building technologies and ideas that move us
Niantic Labs
Object detection evaluation metrics using Python.

Object detection evaluation metrics using Python.

Louis Facun 2 Sep 06, 2022
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

9.6k Jan 06, 2023
Pynomial - a lightweight python library for implementing the many confidence intervals for the risk parameter of a binomial model

Pynomial - a lightweight python library for implementing the many confidence intervals for the risk parameter of a binomial model

Demetri Pananos 9 Oct 04, 2022
An Intelligent Self-driving Truck System For Highway Transportation

Inceptio Intelligent Truck System An Intelligent Self-driving Truck System For Highway Transportation Note The code is still in development. OS requir

InceptioResearch 11 Jul 13, 2022
A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

Keren Ye 35 Nov 20, 2022
Deep Learning Tutorial for Kaggle Ultrasound Nerve Segmentation competition, using Keras

Deep Learning Tutorial for Kaggle Ultrasound Nerve Segmentation competition, using Keras This tutorial shows how to use Keras library to build deep ne

Marko Jocić 922 Dec 19, 2022
FOSS Digital Asset Distribution Platform built on Frappe.

Digistore FOSS Digital Assets Marketplace. Distribute digital assets, like a pro. Video Demo Here Features Create, attach and list digital assets (PDF

Mohammad Hussain Nagaria 30 Dec 08, 2022
This repo provides the official code for TransBTS: Multimodal Brain Tumor Segmentation Using Transformer (https://arxiv.org/pdf/2103.04430.pdf).

TransBTS: Multimodal Brain Tumor Segmentation Using Transformer This repo is the official implementation for TransBTS: Multimodal Brain Tumor Segmenta

Raymond 247 Dec 28, 2022
Code for our paper Aspect Sentiment Quad Prediction as Paraphrase Generation in EMNLP 2021.

Aspect Sentiment Quad Prediction (ASQP) This repo contains the annotated data and code for our paper Aspect Sentiment Quad Prediction as Paraphrase Ge

Isaac 39 Dec 11, 2022
Score refinement for confidence-based 3D multi-object tracking

Score refinement for confidence-based 3D multi-object tracking Our video gives a brief explanation of our Method. This is the official code for the pa

Cognitive Systems Research Group 47 Dec 26, 2022
The MATH Dataset

Measuring Mathematical Problem Solving With the MATH Dataset This is the repository for Measuring Mathematical Problem Solving With the MATH Dataset b

Dan Hendrycks 267 Dec 26, 2022
Learning Neural Painters Fast! using PyTorch and Fast.ai

The Joy of Neural Painting Learning Neural Painters Fast! using PyTorch and Fast.ai Blogpost with more details: The Joy of Neural Painting The impleme

Libre AI 72 Nov 10, 2022
BMVC 2021: This is the github repository for "Few Shot Temporal Action Localization using Query Adaptive Transformers" accepted in British Machine Vision Conference (BMVC) 2021, Virtual

FS-QAT: Few Shot Temporal Action Localization using Query Adaptive Transformer Accepted as Poster in BMVC 2021 This is an official implementation in P

Sauradip Nag 14 Dec 09, 2022
the official implementation of the paper "Isometric Multi-Shape Matching" (CVPR 2021)

Isometric Multi-Shape Matching (IsoMuSh) Paper-CVF | Paper-arXiv | Video | Code Citation If you find our work useful in your research, please consider

Maolin Gao 9 Jul 17, 2022
Vertex AI: Serverless framework for MLOPs (ESP / ENG)

Vertex AI: Serverless framework for MLOPs (ESP / ENG) Español Qué es esto? Este repo contiene un pipeline end to end diseñado usando el SDK de Kubeflo

Hernán Escudero 2 Apr 28, 2022
Multi-Person Extreme Motion Prediction

Multi-Person Extreme Motion Prediction Implementation for paper Wen Guo, Xiaoyu Bie, Xavier Alameda-Pineda, Francesc Moreno-Noguer, Multi-Person Extre

GUO-W 38 Nov 15, 2022
This implements one of result networks from Large-scale evolution of image classifiers

Exotic structured image classifier This implements one of result networks from Large-scale evolution of image classifiers by Esteban Real, et. al. Req

54 Nov 25, 2022
Python package for downloading ECMWF reanalysis data and converting it into a time series format.

ecmwf_models Readers and converters for data from the ECMWF reanalysis models. Written in Python. Works great in combination with pytesmo. Citation If

TU Wien - Department of Geodesy and Geoinformation 31 Dec 26, 2022
Exadel CompreFace is a free and open-source face recognition GitHub project

Exadel CompreFace is a leading free and open-source face recognition system Exadel CompreFace is a free and open-source face recognition service that

Exadel 2.6k Jan 04, 2023
FaRL for Facial Representation Learning

FaRL for Facial Representation Learning This repo hosts official implementation of our paper General Facial Representation Learning in a Visual-Lingui

Microsoft 19 Jan 05, 2022