3D cascade RCNN for object detection on point cloud

Last update: Dec 02, 2022

Overview

3D Cascade RCNN

This is the implementation of 3D Cascade RCNN: High Quality Object Detection in Point Clouds.

We designed a 3D object detection model on point clouds by:

Presenting a simple yet effective 3D cascade architecture
Analyzing the sparsity of point clouds and using a point completeness score to re-weight training samples.

Detection results on KITTI and the Waymo Open Dataset are reported below.
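
The re-weighting idea in the second point can be illustrated with a small sketch. It is not the repository's implementation: this README does not define the point completeness score, so the sketch assumes a hypothetical per-sample score in [0, 1] and simply uses it as the weight in a weighted mean of per-sample losses. The function name `reweighted_loss` and the choice to give more weight to more complete samples are assumptions made for illustration only.

```python
from typing import Sequence


def reweighted_loss(losses: Sequence[float],
                    completeness: Sequence[float],
                    eps: float = 1e-8) -> float:
    """Weighted mean of per-sample losses.

    Assumption (not from the README): each completeness score lies in
    [0, 1] and is used directly as the weight of its sample.
    """
    if len(losses) != len(completeness):
        raise ValueError("losses and completeness must have the same length")
    if any(not 0.0 <= c <= 1.0 for c in completeness):
        raise ValueError("completeness scores are assumed to lie in [0, 1]")

    total_weight = sum(completeness)
    if total_weight < eps:
        # No sample carries any weight, so there is nothing to average.
        return 0.0
    weighted_sum = sum(l * c for l, c in zip(losses, completeness))
    return weighted_sum / total_weight


if __name__ == "__main__":
    # The second sample has a low (hypothetical) completeness score,
    # so it contributes less to the averaged loss.
    print(reweighted_loss([1.0, 3.0], [1.0, 0.25]))
```

With equal scores the result reduces to the ordinary mean, which makes the effect of the weights easy to compare against an unweighted baseline.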

Results on KITTI

Metric	Easy Car	Moderate Car	Hard Car
3D AP (R11)	90.05	86.02	79.27
3D AP (R40)	93.20	86.19	83.48

Results on Waymo

Metric	Overall Vehicle	0-30m Vehicle	30-50m Vehicle	50m-Inf Vehicle
LEVEL_1 mAP	76.27	92.66	74.99	54.49
LEVEL_2 mAP	67.12	91.95	68.96	41.82

Installation

Requirements. The code has been tested in the following environment:

Ubuntu 16.04 with 4 V100 GPUs
Python 3.7
PyTorch 1.7
CUDA 10.1
spconv 1.2.1
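
A quick way to compare an existing environment against the versions listed above is to query the installed package metadata. The sketch below is only a convenience check, not part of the repository. The distribution names `torch` and `spconv` are assumptions about how the packages were installed, and on Python 3.7 the `importlib_metadata` backport is needed because `importlib.metadata` was added to the standard library in Python 3.8.

```python
import sys
from typing import Dict, Optional

try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: requires the importlib_metadata backport
    import importlib_metadata as metadata

# Versions from the requirements list; the distribution names are assumptions.
EXPECTED = {"torch": "1.7", "spconv": "1.2.1"}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected: Dict[str, str] = EXPECTED) -> Dict[str, str]:
    """Report 'ok', 'missing', or a mismatch note for each package."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        if found is None:
            report[name] = "missing"
        elif found == wanted or found.startswith(wanted + "."):
            report[name] = "ok"
        else:
            report[name] = f"found {found}, tested with {wanted}"
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, status in check_environment().items():
        print(name, status)
```

A mismatch does not necessarily mean the code will fail; it only indicates a difference from the tested setup.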

Build extensions

python setup.py develop

Getting Started

Prepare the data.

Please download the official KITTI dataset and generate the data infos with the following command:

python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/kitti_dataset.yaml

The folder structure should look like this:

data
└── kitti
    ├── ImageSets
    ├── training
    │   └── calib & velodyne & label_2 & image_2
    ├── testing
    │   └── calib & velodyne & image_2
    ├── kitti_dbinfos_train.pkl
    ├── kitti_infos_train.pkl
    └── kitti_infos_val.pkl
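
Before starting training, it can help to confirm that the layout above is in place. The following is a minimal sketch using only the standard library; the helper name `missing_kitti_paths` is hypothetical and not part of the repository. The `.pkl` files are expected to exist only after the info-generation command has been run.

```python
from pathlib import Path
from typing import List, Union

# Paths relative to data/kitti, taken from the folder layout above.
EXPECTED_PATHS = [
    "ImageSets",
    "training/calib",
    "training/velodyne",
    "training/label_2",
    "training/image_2",
    "testing/calib",
    "testing/velodyne",
    "testing/image_2",
    "kitti_dbinfos_train.pkl",
    "kitti_infos_train.pkl",
    "kitti_infos_val.pkl",
]


def missing_kitti_paths(kitti_root: Union[str, Path]) -> List[str]:
    """Return the expected entries that do not exist under kitti_root."""
    root = Path(kitti_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_kitti_paths("data/kitti")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("KITTI layout looks complete.")
```

The check only tests for existence; it does not validate the contents of the folders or the generated info files.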

Training and evaluation.

The configuration file is tools/cfgs/3d_cascade_rcnn.yaml, and the training scripts are in tools/scripts.

cd tools
sh scripts/3d-cascade-rcnn.sh

Test a pre-trained model

The pre-trained KITTI model is at: model. Run with:

cd tools
sh scripts/3d-cascade-rcnn_test.sh

The evaluation results should look like this:

2021-08-10 14:06:14,608   INFO  Car AP@0.70, 0.70, 0.70:
bbox AP:97.9644, 90.1199, 89.7076
bev  AP:90.6405, 89.0829, 88.4391
3d   AP:90.0468, 86.0168, 79.2661
aos  AP:97.91, 90.00, 89.48
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:99.1663, 95.8055, 93.3149
bev  AP:96.3107, 92.4128, 89.9473
3d   AP:93.1961, 86.1857, 83.4783
aos  AP:99.13, 95.65, 93.03
Car AP@0.70, 0.50, 0.50:
bbox AP:97.9644, 90.1199, 89.7076
bev  AP:98.0539, 97.1877, 89.7716
3d   AP:97.9921, 90.1001, 89.7393
aos  AP:97.91, 90.00, 89.48
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:99.1663, 95.8055, 93.3149
bev  AP:99.1943, 97.8180, 95.5420
3d   AP:99.1717, 95.8046, 95.4500
aos  AP:99.13, 95.65, 93.03
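
To compare a run against the numbers above without reading the log by eye, the metric lines can be parsed into a dictionary. This is a small sketch based only on the log format shown here; the function name `parse_kitti_eval` is hypothetical, and the header text in the sample (`Car AP@0.70, 0.70, 0.70`) is assumed to follow the usual KITTI evaluation output. Each value list is ordered easy, moderate, hard.

```python
import re
from typing import Dict, List

# Matches lines such as "3d   AP:90.0468, 86.0168, 79.2661".
METRIC_LINE = re.compile(r"^(bbox|bev|3d|aos)\s+AP:\s*(.+)$")


def parse_kitti_eval(text: str) -> Dict[str, Dict[str, List[float]]]:
    """Group metric lines under the header line that precedes them."""
    results: Dict[str, Dict[str, List[float]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = METRIC_LINE.match(line)
        if match:
            if current is not None:
                values = [float(v) for v in match.group(2).split(",")]
                results[current][match.group(1)] = values
        elif line.endswith(":"):
            # Drop an optional logger prefix ("<timestamp> INFO ").
            current = line.split("INFO")[-1].strip().rstrip(":")
            results[current] = {}
    return results


if __name__ == "__main__":
    sample = """\
2021-08-10 14:06:14,608   INFO  Car AP@0.70, 0.70, 0.70:
bbox AP:97.9644, 90.1199, 89.7076
bev  AP:90.6405, 89.0829, 88.4391
3d   AP:90.0468, 86.0168, 79.2661
aos  AP:97.91, 90.00, 89.48
"""
    parsed = parse_kitti_eval(sample)
    print(parsed["Car AP@0.70, 0.70, 0.70"]["3d"])
```

Small differences from the reference values are possible between runs, so a tolerance-based comparison is likely more useful than exact equality.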

Acknowledgement

The code is built on OpenPCDet and Voxel R-CNN.

Owner

Qi Cai
