ICRA 2021 "Towards Precise and Efficient Image Guided Depth Completion"

Last update: Dec 25, 2022

Overview

PENet: Precise and Efficient Depth Completion

This repo is the PyTorch implementation of our paper to appear in ICRA2021 on "Towards Precise and Efficient Image Guided Depth Completion", developed by Mu Hu, Shuling Wang, Bin Li, Shiyu Ning, Li Fan, and Xiaojin Gong at Zhejiang University and Huawei Shanghai.

Please open an issue for any code-related questions. For paper-related questions, feel free to contact me at [email protected].

Results

The proposed full model ranked 1st on the KITTI depth completion online leaderboard at the time of submission, and it runs inference much faster than most of the other top-ranked methods.

Both ENet and PENet can be fully trained on two GPUs with 11 GB of memory each.
Our networks are trained on the KITTI dataset alone. They are not pretrained on Cityscapes or any other similar driving dataset (synthetic or real).

Method

A Strong Two-branch Backbone

Revisiting the popular two-branch architecture

The two-branch backbone is designed to thoroughly exploit color-dominant and depth-dominant information from the respective branches and to fuse the two modalities effectively. Note that the depth-dominant branch takes as input the depth map predicted by the color-dominant branch, not a guidance map like those used in DeepLiDAR and FusionNet.
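
The data flow described above can be sketched in NumPy as follows. This is a toy illustration under stated assumptions, not PENet's network: `color_branch`, `depth_branch`, `fuse`, and `two_branch_forward` are hypothetical names, the two "branches" are trivial placeholders rather than encoder-decoders, and the per-pixel softmax over confidence maps in `fuse` is an assumed fusion rule. What it is meant to show is the wiring: the depth-dominant branch receives the sparse depth stacked with the color-dominant branch's depth prediction.

```python
import numpy as np

def color_branch(rgb, sparse):
    # HYPOTHETICAL placeholder for the color-dominant branch (not PENet's network).
    # Input: RGB image (3, H, W) and sparse depth (H, W); 0 marks missing depth.
    valid = sparse > 0
    fill = sparse[valid].mean() if valid.any() else 0.0
    depth = np.where(valid, sparse, fill)   # dense depth prediction
    confidence = rgb.mean(axis=0)           # placeholder confidence map
    return depth, confidence

def depth_branch(sparse, color_depth):
    # HYPOTHETICAL placeholder for the depth-dominant branch.
    # Its input is the sparse depth stacked with the color branch's depth prediction.
    x = np.stack([sparse, color_depth])     # (2, H, W) channel-stacked input
    depth = np.where(x[0] > 0, x[0], x[1])
    confidence = (x[0] > 0).astype(float) * 2.0
    return depth, confidence

def fuse(d_c, c_c, d_d, c_d):
    # ASSUMPTION: per-pixel softmax over the two confidence maps.
    m = np.maximum(c_c, c_d)                # subtract the max for numerical stability
    e_c, e_d = np.exp(c_c - m), np.exp(c_d - m)
    return (e_c * d_c + e_d * d_d) / (e_c + e_d)

def two_branch_forward(rgb, sparse):
    d_c, c_c = color_branch(rgb, sparse)
    d_d, c_d = depth_branch(sparse, d_c)    # color-branch depth feeds the depth branch
    return fuse(d_c, c_c, d_d, c_d)
```

Because the fused value is a convex combination of the two predictions, it always lies between them at every pixel.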

Geometric Convolutional Layer

To encode 3D geometric information, the geometric convolutional layer simply augments a conventional convolutional layer by concatenating a 3D position map to the layer's input.
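
As a rough sketch of what this layer adds to its input, the NumPy code below builds a 3D position map by back-projecting every pixel with a pinhole camera model and concatenates it to a feature map along the channel axis. The intrinsics `fx, fy, cx, cy`, the choice of which depth map supplies Z, and the helper names `position_map` and `geometric_conv_input` are assumptions for illustration. An ordinary convolution would then be applied to the concatenated tensor.

```python
import numpy as np

def position_map(depth, fx, fy, cx, cy):
    """Back-project each pixel (u, v) with depth Z to camera coordinates (X, Y, Z).

    ASSUMPTION: pinhole model with focal lengths fx, fy and principal point (cx, cy).
    Returns an array of shape (3, H, W).
    """
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")  # row, column indices
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z])

def geometric_conv_input(features, depth, fx, fy, cx, cy):
    """Concatenate the 3D position map to a (C, H, W) feature map along channels,
    giving the (C + 3, H, W) tensor a geometric conv layer would convolve."""
    return np.concatenate([features, position_map(depth, fx, fy, cx, cy)], axis=0)
```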

Dilated and Accelerated CSPN++

Dilated CSPN

We introduce a dilation strategy, similar to the well-known dilated convolutions, to enlarge the propagation neighborhoods.
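
A minimal sketch of dilated propagation, assuming the generic CSPN update (neighbor affinities normalized by the sum of their absolute values, with the center pixel keeping `1 - sum(weights)` of its initial value) rather than the exact DA-CSPN++ formulation. `dilated_offsets` and `cspn_step_naive` are hypothetical helpers, and clamping out-of-image neighbors to the border is an assumed boundary rule. The loop is written pixel by pixel on purpose, to make the dilated neighborhood explicit.

```python
import numpy as np

def dilated_offsets(kernel_size=3, dilation=2):
    """Offsets (dy, dx) of the k*k - 1 neighbors of a pixel, spaced `dilation` apart."""
    r = kernel_size // 2
    return [(dy * dilation, dx * dilation)
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if (dy, dx) != (0, 0)]

def cspn_step_naive(h_t, h_0, affinity, offsets):
    """One CSPN-style propagation step, computed pixel by pixel.

    affinity: (len(offsets), H, W) raw neighbor weights.
    ASSUMPTION: out-of-image neighbors are clamped to the image border.
    """
    H, W = h_t.shape
    out = np.empty_like(h_t)
    for y in range(H):
        for x in range(W):
            w = affinity[:, y, x]
            norm = np.abs(w).sum()
            w = w / norm if norm > 0 else w
            acc = (1.0 - w.sum()) * h_0[y, x]      # center keeps part of the initial value
            for k, (dy, dx) in enumerate(offsets):
                ny = min(max(y + dy, 0), H - 1)
                nx = min(max(x + dx, 0), W - 1)
                acc += w[k] * h_t[ny, nx]          # dilated neighbor contribution
            out[y, x] = acc
    return out
```

With a 3x3 kernel and `dilation=2`, each step reads neighbors 2 pixels away instead of 1, so after T steps information can travel 2T pixels rather than T, with the same number of neighbors per step.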

Accelerated CSPN

We design an implementation that makes the propagation from each neighbor truly parallel, which greatly accelerates the propagation procedure.
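
The acceleration idea can be illustrated in NumPy: instead of visiting each pixel and each neighbor in turn, gather the whole image shifted by every neighbor offset into one `(K, H, W)` stack, then apply the weights with a single weighted sum over the neighbor axis, so that all neighbors contribute at once. This is a sketch assuming the generic CSPN update with border clamping; `shift_stack` and `cspn_step_parallel` are hypothetical names, and on a GPU the same pattern would map to a few tensor operations rather than this NumPy code.

```python
import numpy as np

def shift_stack(h, offsets):
    """Return a (K, H, W) array whose k-th slice holds, for every pixel,
    the value of its neighbor at offsets[k] (border-clamped)."""
    H, W = h.shape
    pad = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    hp = np.pad(h, pad, mode="edge")  # edge padding == clamping indices to the border
    return np.stack([hp[pad + dy: pad + dy + H, pad + dx: pad + dx + W]
                     for dy, dx in offsets])

def cspn_step_parallel(h_t, h_0, affinity, offsets):
    """One CSPN-style step with all neighbors gathered and combined in one pass."""
    affinity = np.asarray(affinity, dtype=np.float64)
    norm = np.abs(affinity).sum(axis=0, keepdims=True)             # (1, H, W)
    w = np.divide(affinity, norm, out=np.zeros_like(affinity), where=norm > 0)
    neighbors = shift_stack(h_t, offsets)                          # (K, H, W)
    center = (1.0 - w.sum(axis=0)) * h_0                           # center keeps part of h_0
    return center + (w * neighbors).sum(axis=0)                    # one weighted sum over K
```

The result matches a pixel-by-pixel loop exactly; only the order of work changes, which is what allows the neighbors to be processed in parallel.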

Contents

Dependency
Data
Trained Models
Commands
Citation

Dependency

Our released implementation has been tested on:

Ubuntu 16.04
Python 3.7.4 (Anaconda 2019.10)
PyTorch 1.3.1 / torchvision 0.4.2
NVIDIA CUDA 10.0.130
4x NVIDIA GeForce RTX 2080 Ti GPUs

pip install numpy matplotlib Pillow
pip install scikit-image
pip install opencv-contrib-python==3.4.2.17

Data

Download the KITTI Depth Dataset and KITTI Raw Dataset from their websites. The overall data directory is structured as follows:

├── kitti_depth
|   ├── depth
|   |   ├── data_depth_annotated
|   |   |  ├── train
|   |   |  ├── val
|   |   ├── data_depth_velodyne
|   |   |  ├── train
|   |   |  ├── val
|   |   ├── data_depth_selection
|   |   |  ├── test_depth_completion_anonymous
|   |   |  ├── test_depth_prediction_anonymous
|   |   |  ├── val_selection_cropped

├── kitti_raw
|   ├── 2011_09_26
|   ├── 2011_09_28
|   ├── 2011_09_29
|   ├── 2011_09_30
|   ├── 2011_10_03
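
Before training, it may help to confirm that the directory tree matches the layout above. The sketch below checks only the sub-directories listed in the tree. `data_root` (the folder containing `kitti_depth` and `kitti_raw`) and `missing_dirs` are hypothetical, and where `main.py` actually expects the data should be confirmed in the repository's data loader.

```python
from pathlib import Path

# Sub-directories taken from the tree above, relative to a hypothetical data_root.
EXPECTED = [
    "kitti_depth/depth/data_depth_annotated/train",
    "kitti_depth/depth/data_depth_annotated/val",
    "kitti_depth/depth/data_depth_velodyne/train",
    "kitti_depth/depth/data_depth_velodyne/val",
    "kitti_depth/depth/data_depth_selection/test_depth_completion_anonymous",
    "kitti_depth/depth/data_depth_selection/test_depth_prediction_anonymous",
    "kitti_depth/depth/data_depth_selection/val_selection_cropped",
    "kitti_raw/2011_09_26",
    "kitti_raw/2011_09_28",
    "kitti_raw/2011_09_29",
    "kitti_raw/2011_09_30",
    "kitti_raw/2011_10_03",
]

def missing_dirs(data_root):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [p for p in EXPECTED if not (root / p).is_dir()]
```

An empty list from `missing_dirs("/path/to/data")` means every directory in the tree is present; it does not check the files inside them.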

Trained Models

Download our pre-trained models:

PENet (i.e., the proposed full model with dilation_rate=2): Download Here
ENet (i.e., the backbone): Download Here

Commands

A complete list of training options is available with

python main.py -h

Training

Here we adopt a multi-stage training strategy to train the backbone, DA-CSPN++, and the full model progressively. However, end-to-end training is feasible as well.

Train ENet (Part Ⅰ)

CUDA_VISIBLE_DEVICES="0,1" python main.py -b 6 -n e
# -b for batch size
# -n for network model

Train DA-CSPN++ (Part Ⅱ)

CUDA_VISIBLE_DEVICES="0,1" python main.py -b 6 -f -n pe --resume [enet-checkpoint-path]
# -f for freezing the parameters in the backbone
# --resume for initializing the parameters from the checkpoint

Train PENet (Part Ⅲ)

CUDA_VISIBLE_DEVICES="0,1" python main.py -b 10 -n pe -he 160 -w 576 --resume [penet-checkpoint-path]
# -he, -w for the image size after random cropping
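
The three stages chain together through checkpoints: Part II resumes from the ENet checkpoint produced by Part I, and Part III resumes from `[penet-checkpoint-path]`, presumably the checkpoint produced by Part II given the progressive strategy. A hypothetical Python helper that assembles exactly the commands listed above might look like this; `stage_commands` and `run_stage` are illustrative names, and the checkpoint paths must be filled in with the files your runs actually produce.

```python
import os
import subprocess

def stage_commands(enet_ckpt, stage2_ckpt):
    """Hypothetical helper: the three training commands above as argv lists.

    enet_ckpt:   checkpoint written by Part I (ENet)
    stage2_ckpt: checkpoint written by Part II (DA-CSPN++), assumed to be the
                 [penet-checkpoint-path] that Part III resumes from
    """
    base = ["python", "main.py"]
    return [
        base + ["-b", "6", "-n", "e"],                                 # Part I
        base + ["-b", "6", "-f", "-n", "pe", "--resume", enet_ckpt],   # Part II
        base + ["-b", "10", "-n", "pe", "-he", "160", "-w", "576",
                "--resume", stage2_ckpt],                              # Part III
    ]

def run_stage(cmd, gpus="0,1"):
    """Run one stage with the given GPUs visible; raises CalledProcessError on failure."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus)
    subprocess.run(cmd, env=env, check=True)
```

Run the stages one at a time, for example `run_stage(stage_commands(enet, stage2)[0])`, since the checkpoint paths for later stages are only known after the earlier stages finish.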

Evaluation

CUDA_VISIBLE_DEVICES="0" python main.py -b 1 -n e --evaluate [enet-checkpoint-path]
CUDA_VISIBLE_DEVICES="0" python main.py -b 1 -n pe --evaluate [penet-checkpoint-path]
# test the trained model on the val_selection_cropped data
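
For reference, the error metrics usually reported on the KITTI depth completion benchmark (RMSE and MAE in mm, iRMSE and iMAE in 1/km) can be computed over the pixels that have ground truth roughly as follows. This is a sketch assuming depth maps in meters with 0 marking missing ground truth. `kitti_style_metrics` and the `eps` clamp that avoids division by zero in the inverse-depth metrics are assumptions, and `main.py` may compute or average the metrics differently (for example per image versus over the whole set).

```python
import numpy as np

def kitti_style_metrics(pred_m, gt_m, eps=1e-3):
    """RMSE/MAE (mm) and iRMSE/iMAE (1/km) over pixels with valid ground truth.

    pred_m, gt_m: depth maps in meters; gt_m == 0 marks missing ground truth.
    """
    valid = gt_m > 0                               # evaluate only where ground truth exists
    p = np.maximum(pred_m[valid], eps)             # ASSUMPTION: clamp to avoid 1/0
    g = gt_m[valid]
    err_mm = (p - g) * 1000.0                      # meters -> millimeters
    inv_err = (1.0 / p - 1.0 / g) * 1000.0         # 1/m -> 1/km
    return {
        "RMSE": float(np.sqrt(np.mean(err_mm ** 2))),
        "MAE": float(np.mean(np.abs(err_mm))),
        "iRMSE": float(np.sqrt(np.mean(inv_err ** 2))),
        "iMAE": float(np.mean(np.abs(inv_err))),
    }
```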

Test

CUDA_VISIBLE_DEVICES="0" python main.py -b 1 -n pe --evaluate [penet-checkpoint-path] --test
# generate and save results of the trained model on the test_depth_completion_anonymous data
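
Depth maps in the KITTI depth completion data are stored as 16-bit PNGs whose values equal the depth in meters times 256, with 0 meaning no measurement. Assuming the saved test results follow that convention, the encoding and its inverse can be sketched as below; `to_kitti_uint16` and `from_kitti_uint16` are hypothetical helpers, not functions from this repository.

```python
import numpy as np

def to_kitti_uint16(depth_m):
    """Encode a metric depth map as KITTI-style uint16 values (depth * 256).

    Values are rounded and clipped to the uint16 range; 0 means 'no depth'.
    """
    scaled = np.round(np.asarray(depth_m, dtype=np.float64) * 256.0)
    return np.clip(scaled, 0, np.iinfo(np.uint16).max).astype(np.uint16)

def from_kitti_uint16(png):
    """Decode KITTI-style uint16 values back to meters (0 stays 0 = invalid)."""
    return png.astype(np.float64) / 256.0
```

The resulting `uint16` array can be written as a 16-bit PNG with OpenCV's `cv2.imwrite`. Depths above 65535 / 256 (about 256 m) saturate at the maximum value.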

Citation

If you use our code or method in your work, please cite the following:

@inproceedings{hu2020PENet,
	title={Towards Precise and Efficient Image Guided Depth Completion},
	author={Hu, Mu and Wang, Shuling and Li, Bin and Ning, Shiyu and Fan, Li and Gong, Xiaojin},
	booktitle={ICRA},
	year={2021}
}

Related Repositories

The original code framework is adapted from "Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera", which was developed by Fangchang Ma, Guilherme Venturelli Cavalheiro, and Sertac Karaman at MIT.

The CoordConv part is adapted from "An intriguing failing of convolutional neural networks and the CoordConv solution".
