Improving 3D Object Detection with Channel-wise Transformer

Last update: Dec 20, 2022

Overview

This implementation of CT3D is mainly based on pcdet v0.3; thanks to the OpenPCDet project. Our paper is available for download: ICCV2021.

Overview of CT3D. The raw points are first fed into the RPN to generate 3D proposals. The raw points and their corresponding proposals are then processed by the channel-wise Transformer, which consists of a proposal-to-point encoding module and a channel-wise decoding module. The proposal-to-point encoding module modulates each point feature with global proposal-aware context information. The channel-wise decoding module then transforms the encoded point features into an effective proposal feature representation for confidence prediction and box regression.
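The overview above does not say how a point feature is made proposal-aware. As a rough illustration only, the sketch below assumes each point is described by its offsets to the proposal's center and eight corners, with any extra per-point channels (such as reflectance) appended. The function names and the (cx, cy, cz, dx, dy, dz, yaw) box layout are assumptions for this example and are not taken from the repository code.

```python
import math
from typing import List, Sequence


def box_corners(box: Sequence[float]) -> List[List[float]]:
    """Return the 8 corners of a box given as (cx, cy, cz, dx, dy, dz, yaw).

    The box layout is an assumption made for this sketch.
    """
    cx, cy, cz, dx, dy, dz, yaw = box
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                lx, ly, lz = sx * dx, sy * dy, sz * dz
                # Rotate the local offset around the z axis, then translate.
                corners.append([
                    cx + lx * cos_y - ly * sin_y,
                    cy + lx * sin_y + ly * cos_y,
                    cz + lz,
                ])
    return corners


def proposal_aware_feature(point: Sequence[float],
                           box: Sequence[float]) -> List[float]:
    """Describe one point relative to one proposal.

    Output: offsets to the center and the 8 corners (9 * 3 = 27 values),
    followed by any extra channels the point carries after x, y, z.
    """
    keypoints = [list(box[:3])] + box_corners(box)
    feature: List[float] = []
    for keypoint in keypoints:
        feature.extend(p - k for p, k in zip(point[:3], keypoint))
    feature.extend(point[3:])
    return feature
```

In a real model such a vector would typically pass through a learned projection before the attention layers; that step is omitted here.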

	AP@R11	AP@R40	Download
Only Car	86.06	85.79	model-car
3-Category (Car)	85.04	84.97	model-3cat
3-Category (Pedestrian)	56.28	55.58	-
3-Category (Cyclist)	71.71	71.88	-

1. Recommended Environment

Linux (tested on Ubuntu 16.04)
Python 3.6+
PyTorch 1.1 or higher (tested on PyTorch 1.6)
CUDA 9.0 or higher (PyTorch 1.3+ needs CUDA 9.2+)
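The version constraints above can be checked before installing anything else. The following is a minimal sketch; the helper names (parse_version, min_cuda_for_torch, check_environment) are made up for this example, and the thresholds are copied from the list above.

```python
import re
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as '1.6.0+cu101' into (1, 6, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def min_cuda_for_torch(torch_version: str) -> Tuple[int, int]:
    """PyTorch 1.3+ needs CUDA 9.2+, older releases need CUDA 9.0+."""
    return (9, 2) if parse_version(torch_version) >= (1, 3) else (9, 0)


def check_environment(python_version: str,
                      torch_version: str,
                      cuda_version: str) -> List[str]:
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if parse_version(python_version) < (3, 6):
        problems.append("Python 3.6+ is required, found " + python_version)
    if parse_version(torch_version) < (1, 1):
        problems.append("PyTorch 1.1+ is required, found " + torch_version)
    needed = min_cuda_for_torch(torch_version)
    if parse_version(cuda_version) < needed:
        problems.append("CUDA %d.%d+ is required, found %s"
                        % (needed[0], needed[1], cuda_version))
    return problems
```

On a machine with PyTorch installed, the three arguments could be filled from platform.python_version(), torch.__version__ and torch.version.cuda (the last one is None on CPU-only builds, so handle that case first).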

2. Set the Environment

pip install -r requirement.txt
python setup.py develop

3. Data Preparation

Prepare KITTI dataset and road planes

# Download KITTI and organize it into the following form:
├── data
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├──calib & velodyne & label_2 & image_2 & (optional: planes)
│   │   │── testing
│   │   │   ├──calib & velodyne & image_2

# Generate data infos:
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
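Before generating the data infos, it can help to confirm that the KITTI folders match the layout shown above. This is a small sketch using only the standard library; the function name missing_kitti_dirs is hypothetical, and the optional planes folder is deliberately not checked.

```python
from pathlib import Path
from typing import List

# Sub-folders expected under each split, taken from the tree above.
REQUIRED_SUBDIRS = {
    "training": ["calib", "velodyne", "label_2", "image_2"],
    "testing": ["calib", "velodyne", "image_2"],
}


def missing_kitti_dirs(kitti_root) -> List[str]:
    """Return the expected folders (relative to kitti_root) that are absent."""
    root = Path(kitti_root)
    missing = []
    if not (root / "ImageSets").is_dir():
        missing.append("ImageSets")
    for split, subdirs in REQUIRED_SUBDIRS.items():
        for name in subdirs:
            if not (root / split / name).is_dir():
                missing.append(split + "/" + name)
    return missing
```

For example, missing_kitti_dirs("data/kitti") returns an empty list when the layout is complete.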

Prepare Waymo dataset

# Download Waymo and organize it into the following form:
├── data
│   ├── waymo
│   │   │── ImageSets
│   │   │── raw_data
│   │   │   │── segment-xxxxxxxx.tfrecord
│   │   │   │── ...
│   │   │── waymo_processed_data
│   │   │   │── segment-xxxxxxxx/
│   │   │   │── ...
│   │   │── pcdet_gt_database_train_sampled_xx/
│   │   │── pcdet_waymo_dbinfos_train_sampled_xx.pkl

# Install tf 2.1.0
# Install the official waymo-open-dataset by running the following command:
pip3 install --upgrade pip
pip3 install waymo-open-dataset-tf-2-1-0 --user

# Extract point cloud data from tfrecord and generate data infos:
python -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos --cfg_file tools/cfgs/dataset_configs/waymo_dataset.yaml
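Extraction over many tfrecord files can take a long time, so it may be useful to see which segments still lack a processed folder. The sketch below assumes, based on the tree above, that each raw_data/segment-xxxxxxxx.tfrecord maps to a waymo_processed_data/segment-xxxxxxxx/ folder with the same stem; the function name unprocessed_segments is hypothetical.

```python
from pathlib import Path
from typing import List


def unprocessed_segments(waymo_root) -> List[str]:
    """List segment names found in raw_data with no processed folder yet."""
    root = Path(waymo_root)
    raw_dir = root / "raw_data"
    processed_dir = root / "waymo_processed_data"

    raw = set()
    if raw_dir.is_dir():
        # 'segment-123.tfrecord' has the stem 'segment-123'.
        raw = {p.stem for p in raw_dir.glob("segment-*.tfrecord")}

    done = set()
    if processed_dir.is_dir():
        done = {p.name for p in processed_dir.glob("segment-*") if p.is_dir()}

    return sorted(raw - done)
```

An empty result from unprocessed_segments("data/waymo") suggests every downloaded segment has been extracted at least once; it does not verify that the extracted contents are complete.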

4. Train

Train with a single GPU

python train.py --cfg_file ${CONFIG_FILE}

# e.g.,
python train.py --cfg_file tools/cfgs/kitti_models/second_ct3d.yaml

Train with multiple GPUs or multiple machines

bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file ${CONFIG_FILE}
# or 
bash scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} ${NUM_GPUS} --cfg_file ${CONFIG_FILE}

# e.g.,
bash scripts/dist_train.sh 8 --cfg_file tools/cfgs/kitti_models/second_ct3d.yaml
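When training is launched from another Python script (for example a sweep over configs), the two command forms above can be selected by GPU count. This is only a sketch: build_train_command is a hypothetical helper, it covers the single-GPU and dist_train.sh cases but not Slurm, and it assumes the commands are run from the same working directory as in the examples above.

```python
from typing import List


def build_train_command(cfg_file: str, num_gpus: int = 1) -> List[str]:
    """Return the argument list for a single-GPU or multi-GPU training run."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        return ["python", "train.py", "--cfg_file", cfg_file]
    return ["bash", "scripts/dist_train.sh", str(num_gpus),
            "--cfg_file", cfg_file]
```

The returned list can be passed to subprocess.run(command, check=True) to start the job.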

5. Test

Test with a pretrained model:

python test.py --cfg_file ${CONFIG_FILE} --ckpt ${CKPT}

# e.g., 
python test.py --cfg_file tools/cfgs/kitti_models/second_ct3d.yaml --ckpt output/kitti_models/second_ct3d/default/kitti_val.pth

Owner

Hualian Sheng
