3D cascade RCNN for object detection on point cloud

Last update: Dec 02, 2022

Overview

3D Cascade RCNN

This is the implementation of 3D Cascade RCNN: High Quality Object Detection in Point Clouds.

We designed a 3D object detection model on point clouds by:

Presenting a simple yet effective 3D cascade architecture
Analyzing the sparsity of point clouds and using a point completeness score to re-weight training samples

Detection results on KITTI and the Waymo Open Dataset follow.

Results on KITTI

	Easy Car	Moderate Car	Hard Car
AP 11	90.05	86.02	79.27
AP 40	93.20	86.19	83.48
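
The two rows above differ only in how average precision is sampled. As a minimal sketch, assuming "AP 11" and "AP 40" are the interpolated AP at 11 and 40 recall positions used by the KITTI benchmark, the difference can be illustrated as follows. The function names and the toy precision-recall curve are made up for illustration and are not part of this repository.

```python
from typing import Sequence, Tuple


def interpolated_ap(pr_points: Sequence[Tuple[float, float]],
                    recall_levels: Sequence[float]) -> float:
    """Mean of interpolated precision over the given recall levels.

    pr_points holds (recall, precision) pairs from a precision-recall curve.
    The interpolated precision at level r is the best precision achieved at
    any recall >= r, or 0 if the curve never reaches r.
    """
    eps = 1e-9  # tolerance for floating-point recall comparisons
    total = 0.0
    for r in recall_levels:
        reachable = [p for rec, p in pr_points if rec >= r - eps]
        total += max(reachable, default=0.0)
    return total / len(recall_levels)


def ap_r11(pr_points):
    # 11 recall levels: 0.0, 0.1, ..., 1.0
    return interpolated_ap(pr_points, [i / 10 for i in range(11)])


def ap_r40(pr_points):
    # 40 recall levels: 1/40, 2/40, ..., 1.0 (recall 0 is excluded)
    return interpolated_ap(pr_points, [i / 40 for i in range(1, 41)])


# Toy curve (made-up numbers): perfect precision up to recall 0.5, then nothing.
toy_curve = [(0.25, 1.0), (0.5, 1.0)]
print(f"AP_R11 = {ap_r11(toy_curve):.4f}")  # AP_R11 = 0.5455
print(f"AP_R40 = {ap_r40(toy_curve):.4f}")  # AP_R40 = 0.5000
```

The 11-point variant counts the recall level 0, which any detector with at least one correct detection satisfies, so it tends to report a slightly higher number than the 40-point variant for the same curve.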

Results on Waymo

	Overall Vehicle	0-30m Vehicle	30-50m Vehicle	50m-Inf Vehicle
LEVEL_1 mAP	76.27	92.66	74.99	54.49
LEVEL_2 mAP	67.12	91.95	68.96	41.82

Installation

Requirements. The code was tested in the following environment:

Ubuntu 16.04 with 4 V100 GPUs
Python 3.7
PyTorch 1.7
CUDA 10.1
spconv 1.2.1

Build extensions

python setup.py develop
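
Before building, it may help to confirm that the active environment matches the tested versions listed above. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the packages are importable as `torch` and `spconv` and expose a `__version__` attribute, which may not hold for every build.

```python
import importlib
import importlib.util
import sys


def version_matches(found, wanted):
    """True if `found` starts with the dot-separated components of `wanted`."""
    wanted_parts = wanted.split(".")
    return found.split(".")[:len(wanted_parts)] == wanted_parts


def check_environment(expected_modules, expected_python):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    py = "{}.{}.{}".format(*sys.version_info[:3])
    if not version_matches(py, expected_python):
        problems.append(f"Python {py} (tested with {expected_python})")
    for name, wanted in expected_modules.items():
        if importlib.util.find_spec(name) is None:
            problems.append(f"{name} is not installed (tested with {wanted})")
            continue
        try:
            found = getattr(importlib.import_module(name), "__version__", None)
        except Exception as exc:  # e.g. missing shared libraries at import time
            problems.append(f"{name} failed to import: {exc}")
            continue
        if found is None:
            problems.append(f"{name}: version unknown (tested with {wanted})")
        elif not version_matches(found, wanted):
            problems.append(f"{name} {found} (tested with {wanted})")
    return problems


if __name__ == "__main__":
    # Versions taken from the requirements list; module names are assumptions.
    for problem in check_environment({"torch": "1.7", "spconv": "1.2.1"}, "3.7"):
        print("WARNING:", problem)
```

A mismatch is only a warning: other versions may work, but they are not the ones the code was tested with.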

Getting Started

Prepare the data.

Please download the official KITTI dataset and generate the data infos with the following command:

python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/kitti_dataset.yaml

The folder structure should look like this:

data
├── kitti
│   │── ImageSets
│   │── training
│   │   ├──calib & velodyne & label_2 & image_2
│   │── testing
│   │   ├──calib & velodyne & image_2
│   │── kitti_dbinfos_train.pkl
│   │── kitti_infos_train.pkl
│   │── kitti_infos_val.pkl
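
A quick way to confirm the layout after the info-generation step is to check each expected entry. This is a minimal sketch, not part of the repository: the relative paths are copied from the tree above, while the function name and the default `data/kitti` location relative to the current directory are assumptions.

```python
from pathlib import Path

# Relative paths taken from the folder tree above.
EXPECTED_KITTI_PATHS = [
    "ImageSets",
    "training/calib",
    "training/velodyne",
    "training/label_2",
    "training/image_2",
    "testing/calib",
    "testing/velodyne",
    "testing/image_2",
    "kitti_dbinfos_train.pkl",
    "kitti_infos_train.pkl",
    "kitti_infos_val.pkl",
]


def missing_kitti_paths(kitti_root):
    """Return the expected entries that do not exist under kitti_root."""
    root = Path(kitti_root)
    return [rel for rel in EXPECTED_KITTI_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_kitti_paths("data/kitti")
    if missing:
        print("Missing under data/kitti:")
        for rel in missing:
            print("  " + rel)
    else:
        print("data/kitti matches the expected layout.")
```

The check only looks at names, so it does not verify that the folders contain complete data or that the .pkl files are readable.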

Training and evaluation.

The configuration file is tools/cfgs/3d_cascade_rcnn.yaml, and the training scripts are in tools/scripts.

cd tools
sh scripts/3d-cascade-rcnn.sh

Test a pre-trained model

The pre-trained KITTI model is at: model. Run with:

cd tools
sh scripts/3d-cascade-rcnn_test.sh

The evaluation results should look like this:

2021-08-10 14:06:14,608   INFO  Car AP@0.70, 0.70, 0.70:
bbox AP:97.9644, 90.1199, 89.7076
bev  AP:90.6405, 89.0829, 88.4391
3d   AP:90.0468, 86.0168, 79.2661
aos  AP:97.91, 90.00, 89.48
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:99.1663, 95.8055, 93.3149
bev  AP:96.3107, 92.4128, 89.9473
3d   AP:93.1961, 86.1857, 83.4783
aos  AP:99.13, 95.65, 93.03
Car AP@0.70, 0.50, 0.50:
bbox AP:97.9644, 90.1199, 89.7076
bev  AP:98.0539, 97.1877, 89.7716
3d   AP:97.9921, 90.1001, 89.7393
aos  AP:97.91, 90.00, 89.48
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:99.1663, 95.8055, 93.3149
bev  AP:99.1943, 97.8180, 95.5420
3d   AP:99.1717, 95.8046, 95.4500
aos  AP:99.13, 95.65, 93.03
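
To compare a run against the reference numbers, the metric lines in a log like the one above can be parsed programmatically. The sketch below is one hedged way to do it. The function name is hypothetical, the header lines in the sample are shortened placeholders (only their trailing colon matters to the parser), and the easy/moderate/hard ordering of the three values is inferred from the KITTI results table.

```python
import re

# Matches metric lines such as "3d   AP:90.0468, 86.0168, 79.2661".
AP_LINE = re.compile(r"^(bbox|bev|3d|aos)\s+AP:\s*(.+)$")


def parse_ap_blocks(log_text):
    """Group metric lines into blocks, one block per header line ending in ':'.

    Returns a list of dicts mapping metric name -> [easy, moderate, hard].
    """
    blocks = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = AP_LINE.match(line)
        if match:
            values = [float(v) for v in match.group(2).split(",")]
            if not blocks:  # metric line before any header
                blocks.append({})
            blocks[-1][match.group(1)] = values
        elif line.endswith(":"):
            blocks.append({})  # a header line starts a new block
    return blocks


# Values copied from the first two blocks of the log above; headers shortened.
SAMPLE = """\
block 1:
bbox AP:97.9644, 90.1199, 89.7076
bev  AP:90.6405, 89.0829, 88.4391
3d   AP:90.0468, 86.0168, 79.2661
aos  AP:97.91, 90.00, 89.48
block 2:
bbox AP:99.1663, 95.8055, 93.3149
bev  AP:96.3107, 92.4128, 89.9473
3d   AP:93.1961, 86.1857, 83.4783
aos  AP:99.13, 95.65, 93.03
"""

if __name__ == "__main__":
    for index, block in enumerate(parse_ap_blocks(SAMPLE), start=1):
        easy, moderate, hard = block["3d"]
        print(f"block {index}: 3D AP easy={easy} moderate={moderate} hard={hard}")
```

The 3d values of the first two blocks correspond to the AP 11 and AP 40 rows of the KITTI results table.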

Acknowledgement

The code is built on OpenPCDet and Voxel R-CNN.

Owner

Qi Cai