Voxel Transformer for 3D object detection

Last update: Dec 25, 2022

Related tags

Deep Learning VOTR

Overview

Voxel Transformer

This is a reproduced repo of Voxel Transformer for 3D object detection.

The code is mainly based on OpenPCDet.

Introduction

We provide code and training configurations of VoTr-SSD/TSD on the KITTI and Waymo Open dataset. Checkpoints will not be released.

Important Notes: VoTr generally requires quite a long time (more than 60 epochs on Waymo) to converge, and a large GPU memory (32Gb) is needed for reproduction. Please strictly follow the instructions and train with sufficient number of epochs. If you don't have a 32G GPU, you can decrease the attention SIZE parameters in yaml files, but this may possibly harm the performance.

Requirements

The codes are tested in the following environment:

Ubuntu 18.04
Python 3.6
PyTorch 1.5
CUDA 10.1
OpenPCDet v0.3.0
spconv v1.2.1

Installation

a. Clone this repository.

git clone https://github.com/PointsCoder/VOTR.git

b. Install the dependent libraries as follows:

Install the dependent python libraries:

pip install -r requirements.txt

Install the SparseConv library, we use the implementation from [spconv].
- If you use PyTorch 1.1, then make sure you install the spconv v1.0 with (commit 8da6f96) instead of the latest one.
- If you use PyTorch 1.3+, then you need to install the spconv v1.2. As mentioned by the author of spconv, you need to use their docker if you use PyTorch 1.4+.

c. Compile CUDA operators by running the following command:

python setup.py develop

Training

All the models are trained with Tesla V100 GPUs (32G). The KITTI config of votr_ssd is for training with a single GPU. Other configs are for training with 8 GPUs. If you use different number of GPUs for training, it's necessary to change the respective training epochs to attain a decent performance.

The performance of VoTr is quite unstable on KITTI. If you cannnot reproduce the results, remember to run it multiple times.

models

# votr_ssd.yaml: single-stage votr backbone replacing the spconv backbone
# votr_tsd.yaml: two-stage votr with pv-head

training votr_ssd on kitti

CUDA_VISIBLE_DEVICES=0 python train.py --cfg_file cfgs/kitti_models/votr_ssd.yaml

training other models

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 sh scripts/dist_train.sh 8 --cfg_file cfgs/waymo_models/votr_tsd.yaml

testing

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 sh scripts/dist_test.sh 8 --cfg_file cfgs/waymo_models/votr_tsd.yaml --eval_all

Citation

If you find this project useful in your research, please consider cite:

@article{mao2021voxel,
  title={Voxel Transformer for 3D Object Detection},
  author={Mao, Jiageng and Xue, Yujing and Niu, Minzhe and others},
  journal={ICCV},
  year={2021}
}

Voxel Transformer for 3D object detection

Related tags

Overview

Voxel Transformer

Introduction

Requirements

Installation

Training

Citation

Owner

Official PyTorch Implementation of HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning (NeurIPS 2021 Spotlight)

Repo for CVPR2021 paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information"

code associated with ACL 2021 DExperts paper

Machine learning algorithms for many-body quantum systems

AAAI-22 paper: SimSR: Simple Distance-based State Representationfor Deep Reinforcement Learning

Codes for NeurIPS 2021 paper "Adversarial Neuron Pruning Purifies Backdoored Deep Models"

This app is a simple example of using Strealit to create a financial data web app.

PyTorch implementation of paper "IBRNet: Learning Multi-View Image-Based Rendering", CVPR 2021.

Simulated garment dataset for virtual try-on

Second-order Attention Network for Single Image Super-resolution (CVPR-2019)

Reference implementation for Structured Prediction with Deep Value Networks

Sharpened cosine similarity torch - A Sharpened Cosine Similarity layer for PyTorch

RODD: A Self-Supervised Approach for Robust Out-of-Distribution Detection

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

My implementation of DeepMind's Perceiver

Implementation of paper "Graph Condensation for Graph Neural Networks"

Code repository for the paper: Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild (ICCV 2021)

nfelo: a power ranking, prediction, and betting model for the NFL

Implementation of the paper "Fine-Tuning Transformers: Vocabulary Transfer"

This repository contains implementations of all Machine Learning Algorithms from scratch in Python. Mathematics required for ML and many projects have also been included.