[3DV 2021] Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Last update: Dec 30, 2022

This is the official implementation for the method described in

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Jiaxing Yan, Hong Zhao, Penghui Bu and YuSheng Jin.

3DV 2021 (arXiv pdf)

Setup

Assuming a fresh Anaconda distribution, you can install the dependencies with:

conda install pytorch=1.7.0 torchvision=0.8.1 -c pytorch
pip install tensorboardX==2.1
pip install opencv-python==3.4.7.28
pip install albumentations==0.5.2   # we use albumentations for faster image preprocessing

This project uses Python 3.7.8 and CUDA 11.4. The experiments were conducted on a single NVIDIA RTX 3090 GPU with an Intel Core i9-9900KF CPU.

We recommend using a conda environment to avoid dependency conflicts.

Prediction for a single image

You can predict scaled disparity for a single image with:

python test_simple.py --image_path images/test_image.jpg --model_name MS_1024x320

On its first run, this command downloads the MS_1024x320 pretrained model (272MB) into the models/ folder. We provide the following options for --model_name:

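| `--model_name` | Training modality | Resolution | Abs_Rel | Sq_Rel | $\delta<1.25$ |
|----------------|-------------------|------------|---------|--------|---------------|
| `M_640x192`    | Mono              | 640 x 192  | 0.105   | 0.769  | 0.892         |
| `M_1024x320`   | Mono              | 1024 x 320 | 0.102   | 0.734  | 0.898         |
| `M_1280x384`   | Mono              | 1280 x 384 | 0.102   | 0.715  | 0.900         |
| `MS_640x192`   | Mono + Stereo     | 640 x 192  | 0.102   | 0.752  | 0.894         |
| `MS_1024x320`  | Mono + Stereo     | 1024 x 320 | 0.096   | 0.694  | 0.908         |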
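
The predicted value is a scaled disparity, not a depth. A minimal sketch of the conversion, assuming the network output is a sigmoid disparity in [0, 1] as in Monodepth2 (the function name `disp_to_depth` and the 0.1 / 100 depth bounds are borrowed from Monodepth2 and are not confirmed for this code base):

```python
def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    """Map a sigmoid disparity in [0, 1] to a depth in [min_depth, max_depth].

    Works on plain floats and, through broadcasting, on NumPy arrays or
    PyTorch tensors. The default bounds are assumptions taken from Monodepth2.
    """
    min_disp = 1.0 / max_depth  # disparity of the farthest representable point
    max_disp = 1.0 / min_depth  # disparity of the nearest representable point
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth


if __name__ == "__main__":
    for disp in (0.0, 0.5, 1.0):
        _, depth = disp_to_depth(disp)
        print(f"disp={disp:.1f} -> depth={depth:.3f}")
```

For models trained on monocular video only, the resulting depth is defined up to an unknown global scale.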

KITTI training data

You can download the entire raw KITTI dataset by running:

wget -i splits/kitti_archives_to_download.txt -P kitti_data/

Then unzip with

cd kitti_data
unzip "*.zip"
cd ..
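
Where `unzip` is not available, the extraction step can be approximated with the Python standard library. This is a minimal sketch; `extract_all` is a hypothetical helper name, and it assumes the archives were downloaded into kitti_data/ as above:

```python
import zipfile
from pathlib import Path


def extract_all(archive_dir):
    """Extract every .zip file found in archive_dir into archive_dir itself."""
    archive_dir = Path(archive_dir)
    extracted = []
    for archive in sorted(archive_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive_dir)
        extracted.append(archive.name)
    return extracted


if __name__ == "__main__":
    for name in extract_all("kitti_data"):
        print("extracted", name)
```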

Splits

The train/test/validation splits are defined in the splits/ folder. By default, the code will train a depth model using Zhou's subset of the standard Eigen split of KITTI, which is designed for monocular training. You can also train a model using the new benchmark split or the odometry split by setting the --split flag.
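
Assuming the split files follow the Monodepth2 layout, each one is a plain text file with one sample per line in the form `<folder> <frame_index> <side>`, where side is `l` or `r`. The sketch below parses such lines; the line format, the `Sample` type and the demo lines are assumptions and should be checked against the files in splits/:

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    folder: str       # drive folder relative to the data root
    frame_index: int  # frame number inside the drive
    side: str         # "l" (left camera) or "r" (right camera)


def parse_split(lines: List[str]) -> List[Sample]:
    """Parse split-file lines, skipping blank ones."""
    samples = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        folder, frame_index, side = parts
        samples.append(Sample(folder, int(frame_index), side))
    return samples


if __name__ == "__main__":
    # Illustrative lines only, not copied from the repository.
    demo = [
        "2011_09_26/2011_09_26_drive_0022_sync 473 r",
        "2011_09_29/2011_09_29_drive_0026_sync 1 l",
    ]
    for sample in parse_split(demo):
        print(sample)
```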

Training

Monocular training:

python train.py --model_name mono_model

Stereo training:

Our code defaults to using Zhou's subsampled Eigen training data. For stereo-only training we have to specify that we want to use the full Eigen training set.

python train.py --model_name stereo_model \
  --frame_ids 0 --use_stereo --split eigen_full

Monocular + stereo training:

python train.py --model_name mono+stereo_model \
  --frame_ids 0 -1 1 --use_stereo
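
The three modes differ in which source views supervise the target frame. The sketch below shows how the flags map to source views, assuming the Monodepth2 convention that --frame_ids lists temporal offsets relative to the target frame and that --use_stereo adds the other camera of the stereo pair. `source_views` and `needs_pose_network` are hypothetical helpers, not functions from this repository:

```python
def source_views(frame_ids, use_stereo):
    """Return the views used to reconstruct the target frame (offset 0).

    Integers are temporal offsets in frames; "s" stands for the other
    camera of the stereo pair. Mirrors Monodepth2 behaviour (assumption).
    """
    if frame_ids[0] != 0:
        raise ValueError("frame_ids must start with 0 (the target frame)")
    views = list(frame_ids[1:])
    if use_stereo:
        views.append("s")
    return views


def needs_pose_network(frame_ids, use_stereo):
    """Stereo-only training uses a fixed, known baseline, so no pose network."""
    return not (use_stereo and list(frame_ids) == [0])


if __name__ == "__main__":
    print("mono:       ", source_views([0, -1, 1], use_stereo=False))
    print("stereo:     ", source_views([0], use_stereo=True))
    print("mono+stereo:", source_views([0, -1, 1], use_stereo=True))
```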

Note: For high-resolution inputs (e.g. 1024x320 and 1280x384), we use a lightweight pose encoder (ResNet18 at 640x192) during training to save memory. The following example command trains a model named M_1024x320:

python train.py --model_name M_1024x320 --num_layers 50 --height 320 --width 1024 --num_layers_pose 18 --height_pose 192 --width_pose 640
#             encoder     resolution                                     
# DepthNet   resnet50      1024x320
# PoseNet    resnet18       640x192
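
The same pattern can be reused for other resolutions. The sketch below assembles the command from the flags shown above and checks that every dimension is a multiple of 32, a constraint that Monodepth2 enforces and that this code base is assumed to share. `build_train_command` is a hypothetical helper, and its defaults simply mirror the example command:

```python
import shlex


def build_train_command(model_name, height, width,
                        num_layers=50, num_layers_pose=18,
                        height_pose=192, width_pose=640):
    """Build the train.py command line for a high-resolution depth network."""
    for name, value in (("height", height), ("width", width),
                        ("height_pose", height_pose), ("width_pose", width_pose)):
        if value % 32 != 0:
            raise ValueError(f"{name}={value} is not a multiple of 32")
    args = [
        "python", "train.py",
        "--model_name", model_name,
        "--num_layers", str(num_layers),
        "--height", str(height),
        "--width", str(width),
        "--num_layers_pose", str(num_layers_pose),
        "--height_pose", str(height_pose),
        "--width_pose", str(width_pose),
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Assumes the 1280x384 model uses the same encoder settings as the example.
    print(build_train_command("M_1280x384", height=384, width=1280))
```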

Finetuning a pretrained model

To finetune from an existing model, pass its weights folder to the training command with --load_weights_folder:

python train.py --model_name finetuned_mono --load_weights_folder ~/tmp/mono_model/models/weights_19
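
The example path suggests that checkpoints are stored in folders named weights_<N>. A small sketch for picking the most recent one to pass to --load_weights_folder; `latest_weights_folder` is a hypothetical helper, and the folder layout is inferred from the path above rather than confirmed:

```python
import re
from pathlib import Path


def latest_weights_folder(models_dir):
    """Return the weights_<N> subfolder with the largest N, or None."""
    best = None
    best_index = -1
    for child in Path(models_dir).iterdir():
        match = re.fullmatch(r"weights_(\d+)", child.name)
        if child.is_dir() and match and int(match.group(1)) > best_index:
            best, best_index = child, int(match.group(1))
    return best


if __name__ == "__main__":
    models_dir = Path("~/tmp/mono_model/models").expanduser()
    if models_dir.is_dir():
        print(latest_weights_folder(models_dir))
```

The comparison is numeric, so weights_19 is preferred over weights_2 even though it sorts earlier as a string.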

Other training options

Run python train.py -h (or look at options.py) to see the range of other training options, such as learning rates and ablation settings.

KITTI evaluation

To prepare the ground truth depth maps run:

python export_gt_depth.py --data_path kitti_data --split eigen
python export_gt_depth.py --data_path kitti_data --split eigen_benchmark

...assuming that you have placed the KITTI dataset in the default location of ./kitti_data/.

The following example command evaluates the weights of a model named MS_1024x320:

python evaluate_depth.py --load_weights_folder ./log/MS_1024x320 --eval_mono --data_path ./kitti_data --eval_split eigen
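
The Abs_Rel, Sq_Rel and δ<1.25 values reported for the pretrained models are standard depth error metrics. Below is a minimal pure-Python sketch of their definitions, together with the per-image median scaling that Monodepth2 applies when --eval_mono is set (assumed to carry over here). The function names are illustrative, and the actual evaluation script works on image arrays and may apply additional cropping and depth clamping:

```python
from statistics import median


def median_scale(pred, gt):
    """Rescale predictions so that their median matches the ground truth."""
    ratio = median(gt) / median(pred)
    return [p * ratio for p in pred]


def depth_metrics(pred, gt):
    """Return (abs_rel, sq_rel, a1) for paired positive depths in metres."""
    n = len(gt)
    abs_rel = sum(abs(g - p) / g for p, g in zip(pred, gt)) / n
    sq_rel = sum((g - p) ** 2 / g for p, g in zip(pred, gt)) / n
    # a1 is the fraction of pixels whose ratio to ground truth is below 1.25
    a1 = sum(max(g / p, p / g) < 1.25 for p, g in zip(pred, gt)) / n
    return abs_rel, sq_rel, a1


if __name__ == "__main__":
    gt = [10.0, 20.0, 40.0]
    pred = [5.5, 10.0, 18.0]  # plausible structure, wrong global scale
    print(depth_metrics(median_scale(pred, gt), gt))
```

Lower is better for Abs_Rel and Sq_Rel; higher is better for δ<1.25.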

Precomputed results

You can download our precomputed disparity predictions from the following links:

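| Training modality | Input size | `.npy` filesize | Eigen disparities |
|-------------------|------------|-----------------|-------------------|
| Mono              | 640 x 192  | 326M            | Download 🔗       |
| Mono              | 1024 x 320 | 871M            | Download 🔗       |
| Mono              | 1280 x 384 | 1.27G           | Download 🔗       |
| Mono + Stereo     | 640 x 192  | 326M            | Download 🔗       |
| Mono + Stereo     | 1024 x 320 | 871M            | Download 🔗       |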

References

Monodepth2 - https://github.com/nianticlabs/monodepth2
