[CVPR 2021] Monocular depth estimation using wavelets for efficiency

Overview

Single Image Depth Prediction with Wavelet Decomposition

Michaël Ramamonjisoa, Michael Firman, Jamie Watson, Vincent Lepetit and Daniyar Turmukhambetov

CVPR 2021

[Link to paper]

[KITTI animation] [NYUv2 animation]

We introduce WaveletMonoDepth, which improves the efficiency of standard encoder-decoder monocular depth estimation methods by exploiting wavelet decomposition.

[5-minute CVPR presentation video]

🧑‍🏫 Methodology

WaveletMonoDepth was implemented for two benchmarks, KITTI and NYUv2. For each dataset, we build on an existing baseline codebase. Both baselines share a common encoder-decoder architecture, and we modify their decoders to predict wavelet coefficients.
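To make the wavelet decoder more concrete, here is a minimal pure-Python sketch of a single inverse 2D Haar step: the kind of operation that turns a coarse map plus detail coefficients into a map at twice the resolution. It is an illustration only, not the repository's code (the authors use the IDWT from Pytorch Wavelets); the Haar wavelet, the orthonormal normalization, the coefficient naming and the function name `haar_idwt2` are all assumptions.

```python
def haar_idwt2(ll, lh, hl, hh):
    """One inverse 2D Haar step (orthonormal normalization assumed).

    Takes four H x W coefficient maps and returns a 2H x 2W map.
    """
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, v, u, d = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            # Each coefficient quadruple expands into one 2x2 output block.
            out[2 * i][2 * j] = (s + v + u + d) / 2
            out[2 * i][2 * j + 1] = (s + v - u - d) / 2
            out[2 * i + 1][2 * j] = (s - v + u - d) / 2
            out[2 * i + 1][2 * j + 1] = (s - v - u + d) / 2
    return out


# Coarse 1 x 2 map with all detail coefficients set to zero.
coarse = [[4.0, 6.0]]
zeros = [[0.0, 0.0]]
print(haar_idwt2(coarse, zeros, zeros, zeros))
# [[2.0, 2.0, 3.0, 3.0], [2.0, 2.0, 3.0, 3.0]]
```

When all three detail maps are zero, each coarse value simply becomes a constant 2x2 block, so detail coefficients only matter where the depth actually changes.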

Wavelet predictions are sparse, so they can be computed only at relevant locations, which saves a large amount of unnecessary computation.
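The sparsity argument can be checked on a toy example. The sketch below, again assuming an orthonormal Haar wavelet and using hypothetical helper names (`haar_dwt2`, `active_fraction`), decomposes a synthetic depth map made of two flat surfaces and reports the fraction of locations where any detail coefficient exceeds a threshold.

```python
def haar_dwt2(img):
    """One forward 2D Haar step (orthonormal) on an even-sized 2D list."""
    h, w = len(img) // 2, len(img[0]) // 2

    def empty():
        return [[0.0] * w for _ in range(h)]

    ll, lh, hl, hh = empty(), empty(), empty(), empty()
    for i in range(h):
        for j in range(w):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2
            lh[i][j] = (a + b - c - d) / 2  # top row minus bottom row
            hl[i][j] = (a - b + c - d) / 2  # left column minus right column
            hh[i][j] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def active_fraction(lh, hl, hh, threshold):
    """Fraction of locations where any detail coefficient exceeds threshold."""
    total = active = 0
    for r1, r2, r3 in zip(lh, hl, hh):
        for x, y, z in zip(r1, r2, r3):
            total += 1
            if max(abs(x), abs(y), abs(z)) > threshold:
                active += 1
    return active / total


# Synthetic 16 x 16 depth map: two flat surfaces at depths 1 and 5,
# with the discontinuity between columns 8 and 9 (inside a 2x2 block).
depth = [[1.0 if j < 9 else 5.0 for j in range(16)] for _ in range(16)]
_, lh, hl, hh = haar_dwt2(depth)
print(active_fraction(lh, hl, hh, threshold=0.1))  # 0.125
```

Only the 8 of 64 locations along the depth discontinuity are active; flat regions produce exactly zero detail coefficients.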

[Figure: our architecture]

The network is first trained with dense convolutions in the decoder until convergence; the dense convolutions are then replaced with sparse ones.

This two-stage schedule is needed because the network must first learn to predict sparse wavelet coefficients before sparse convolutions can be used.
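The sketch below illustrates what a sparse convolution buys once the coefficients are sparse: a 3x3 convolution is evaluated only at positions selected by a boolean mask and matches the dense result there. It is a naive, hypothetical illustration (`conv3x3_at`, `dense_conv3x3` and `sparse_conv3x3` are our names), not the sparse convolution used in the repository, and the mask is simply given rather than derived from predicted coefficients.

```python
def conv3x3_at(img, kernel, i, j):
    """3x3 cross-correlation (the deep-learning 'convolution') at (i, j), zero padding."""
    h, w = len(img), len(img[0])
    acc = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            y, x = i + di, j + dj
            if 0 <= y < h and 0 <= x < w:
                acc += kernel[di + 1][dj + 1] * img[y][x]
    return acc


def dense_conv3x3(img, kernel):
    """Evaluate the convolution at every pixel."""
    return [[conv3x3_at(img, kernel, i, j) for j in range(len(img[0]))]
            for i in range(len(img))]


def sparse_conv3x3(img, kernel, mask):
    """Evaluate the convolution only where mask is True; other outputs stay 0.

    Returns the output map and the number of positions actually evaluated.
    """
    out = [[0.0] * len(img[0]) for _ in img]
    evaluated = 0
    for i, row in enumerate(mask):
        for j, keep in enumerate(row):
            if keep:
                out[i][j] = conv3x3_at(img, kernel, i, j)
                evaluated += 1
    return out, evaluated


img = [[float(4 * i + j) for j in range(4)] for i in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]  # box filter, for illustration only
mask = [[False] * 4 for _ in range(4)]
mask[0][0] = mask[1][1] = True  # hypothetical "relevant" locations
sparse, n = sparse_conv3x3(img, kernel, mask)
dense = dense_conv3x3(img, kernel)
print(n, sparse[1][1], dense[1][1])  # 2 45.0 45.0
```

If the coefficients were not yet sparse, the mask would cover most pixels and the sparse version would save little, which is why dense training comes first.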

🗂 Environment Requirements 🗂

We recommend creating a new Anaconda environment to use WaveletMonoDepth. Use the following to set up a new environment:

conda env create -f environment.yml
conda activate wavelet-mdp

Our work relies on Pytorch Wavelets, a great package by Fergal Cotter that implements the Inverse Discrete Wavelet Transform (IDWT) we use, and a lot more! To install Pytorch Wavelets, simply run:

git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .

🚗 🚦 KITTI 🌳 🛣

Depth Hints was used as a baseline for KITTI.

Depth Hints builds upon monodepth2. If you have questions about running the code, please see the issues in their repositories first.

Setup, Training and Evaluation

Please see the KITTI directory of this repository for details on how to train and evaluate our method.

📊 Results and 📦 Trained models

Please find below the scores using dense convolutions to predict wavelet coefficients. Download links coming soon!

| Model name | Training modality | Resolution | abs_rel | RMSE | δ<1.25 | Weights | Eigen Predictions |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ours Resnet18 | Stereo + DepthHints | 640 x 192 | 0.106 | 4.693 | 0.876 | Coming soon | Coming soon |
| Ours Resnet50 | Stereo + DepthHints | 640 x 192 | 0.105 | 4.625 | 0.879 | Coming soon | Coming soon |
| Ours Resnet18 | Stereo + DepthHints | 1024 x 320 | 0.102 | 4.452 | 0.890 | Coming soon | Coming soon |
| Ours Resnet50 | Stereo + DepthHints | 1024 x 320 | 0.097 | 4.387 | 0.891 | Coming soon | Coming soon |
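For reference, abs_rel, RMSE and δ<1.25 follow the standard monocular depth evaluation definitions. A minimal sketch follows; it computes the metrics over flat lists of valid depths and deliberately omits the cropping, depth capping and any scaling that the actual evaluation scripts may perform. `depth_metrics` and the sample values are hypothetical.

```python
import math


def depth_metrics(pred, gt):
    """Standard abs_rel, RMSE and delta<1.25 over paired positive depths."""
    n = len(gt)
    abs_rel = sum(abs(p - g) / g for p, g in zip(pred, gt)) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / n)
    # delta<1.25: share of pixels whose prediction is within a factor of 1.25 of ground truth.
    delta1 = sum(max(p / g, g / p) < 1.25 for p, g in zip(pred, gt)) / n
    return abs_rel, rmse, delta1


pred = [2.0, 4.0, 9.0]  # hypothetical predicted depths
gt = [2.0, 5.0, 6.0]    # hypothetical ground-truth depths
print(depth_metrics(pred, gt))
```

Lower is better for abs_rel and RMSE; higher is better for δ<1.25.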

🎚 Playing with sparsity

The most interesting part, however, is that we can exploit the sparsity of the predicted wavelet coefficients to trade a small amount of performance for a large gain in efficiency. We do so by tuning the threshold:

  • low threshold values lead to high performance but a large number of computations;
  • high thresholds lead to highly efficient computation, as convolutions are computed at only a few pixel locations, with minimal impact on performance.
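The threshold trade-off can be sketched as follows: coefficient magnitudes are compared with a threshold to build a mask of the locations where the decoder would run, and the mask density is the share of pixels actually computed. The magnitudes are made-up values, and reusing a coarse mask at the next scale through nearest-neighbour upsampling (`upsample_mask`) is an assumption about how masks could propagate, not a description of the repository's implementation.

```python
def threshold_mask(magnitudes, threshold):
    """Boolean mask of locations whose coefficient magnitude exceeds threshold."""
    return [[m > threshold for m in row] for row in magnitudes]


def density(mask):
    """Fraction of True entries, i.e. the share of pixels that would be computed."""
    cells = [v for row in mask for v in row]
    return sum(cells) / len(cells)


def upsample_mask(mask):
    """Nearest-neighbour 2x upsampling, e.g. to reuse a coarse mask at the next scale."""
    out = []
    for row in mask:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# Hypothetical per-location coefficient magnitudes at one decoder scale.
mags = [[0.00, 0.02, 0.40, 0.01],
        [0.01, 0.05, 0.90, 0.03],
        [0.00, 0.01, 0.60, 0.00],
        [0.02, 0.00, 0.30, 0.04]]
for t in (0.0, 0.03, 0.1, 0.5):
    print(t, density(threshold_mask(mags, t)))
```

Raising the threshold from 0.0 to 0.5 shrinks the computed share from 75% to 12.5% of locations in this toy example; the surviving locations are the ones with the largest coefficients.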

[Figure: sparsifying KITTI predictions]

Computing coefficients at only 10% of the pixels in the decoding process gives a relative score loss of less than 1.4%.

[Figure: KITTI scores]

Our wavelet-based method greatly reduces the number of computations in the decoder at minimal expense in performance. We can measure the performance-vs-efficiency trade-off by evaluating scores vs. FLOPs.
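The FLOPs side of the trade-off can be approximated with a simple count: a k x k convolution costs about 2 x C_in x C_out x k^2 FLOPs per output pixel, so evaluating it at a fraction p of pixels scales its cost by roughly p. The sketch below assumes this linear model; it ignores biases, activations and the indexing overhead of real sparse kernels, and both `conv_flops` and the layer sizes are hypothetical.

```python
def conv_flops(height, width, c_in, c_out, kernel=3, density=1.0):
    """Approximate FLOPs of a k x k convolution run at a fraction `density` of pixels.

    Counts 2 FLOPs (one multiply, one add) per weight per evaluated output pixel;
    biases, activations and sparse-indexing overhead are ignored.
    """
    macs_per_pixel = c_in * c_out * kernel * kernel
    return 2 * macs_per_pixel * density * height * width


# Hypothetical decoder layer: 64 -> 64 channels on a 96 x 320 feature map.
dense = conv_flops(96, 320, 64, 64)
sparse = conv_flops(96, 320, 64, 64, density=0.1)
print(f"dense: {dense / 1e9:.2f} GFLOPs, sparse: {sparse / 1e9:.2f} GFLOPs")
```

In practice the measured savings are smaller than this ideal ratio, since gathering and scattering the active pixels has its own cost.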

[Figure: KITTI scores vs. FLOPs]

🪑 🛁 NYUv2 🛋 🚪

DenseDepth was used as a baseline for NYUv2; we used its experimental PyTorch implementation. Compared to the original paper, we made a few modifications:

  • we supervise depth directly instead of supervising disparity
  • we do not use SSIM
  • we use DenseNet161 as encoder instead of DenseNet169

Setup, Training and Evaluation

Please see the NYUv2 directory of this repository for details on how to train and evaluate our method.

📊 Results and 📦 Trained models

Please find below the scores and associated trained models, using dense convolutions to predict wavelet coefficients.

| Model name | Encoder | Resolution | abs_rel | RMSE | δ<1.25 | ε_acc | Weights | Eigen Predictions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | DenseNet | 640 x 480 | 0.1277 | 0.5479 | 0.8430 | 1.7170 | Coming soon | Coming soon |
| Ours | DenseNet | 640 x 480 | 0.1258 | 0.5515 | 0.8451 | 1.8070 | Coming soon | Coming soon |
| Baseline | MobileNetv2 | 640 x 480 | 0.1772 | 0.6638 | 0.7419 | 1.8911 | Coming soon | Coming soon |
| Ours | MobileNetv2 | 640 x 480 | 0.1727 | 0.6776 | 0.7380 | 1.9732 | Coming soon | Coming soon |

🎚 Playing with sparsity

As with the KITTI dataset, we can tune the wavelet threshold to greatly reduce computation at minimal cost in performance.

[Figure: sparsifying NYUv2 predictions]

Computing coefficients at only 5% of the pixels in the decoding process gives a relative depth score loss of less than 0.15%.

[Figure: NYUv2 scores]

🎮 Try it yourself!

Try our Jupyter notebooks to visualize results at different levels of sparsity and to compute the resulting computational savings in FLOPs. The notebooks can be found in <DATASET>/sparsity_test_notebook.ipynb, where <DATASET> is either KITTI or NYUv2.

✏️ 📄 Citation

If you find our work useful or interesting, please consider citing our paper:

@inproceedings{ramamonjisoa-2021-wavelet-monodepth,
  title     = {Single Image Depth Prediction with Wavelet Decomposition},
  author    = {Ramamonjisoa, Micha{\"{e}}l and
               Michael Firman and
               Jamie Watson and
               Vincent Lepetit and
               Daniyar Turmukhambetov},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  month = {June},
  year = {2021}
}

👩‍⚖️ License

Copyright © Niantic, Inc. 2021. Patent Pending. All rights reserved. Please see the license file for terms.
