Group-Free 3D Object Detection via Transformers

Last update: Dec 07, 2022

By Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong.

This repo is the official implementation of "Group-Free 3D Object Detection via Transformers".

Updates

April 01, 2021: initial release.

Introduction

Recently, directly detecting 3D objects from 3D point clouds has received increasing attention. To extract object representation from an irregular point cloud, existing methods usually take a point grouping step to assign the points to an object candidate so that a PointNet-like network could be used to derive object features from the grouped points. However, the inaccurate point assignments caused by the hand-crafted grouping scheme decrease the performance of 3D object detection. In this paper, we present a simple yet effective method for directly detecting 3D objects from the 3D point cloud. Instead of grouping local points to each object candidate, our method computes the feature of an object from all the points in the point cloud with the help of an attention mechanism in the Transformers, where the contribution of each point is automatically learned in the network training. With an improved attention stacking scheme, our method fuses object features in different stages and generates more accurate object detection results. With few bells and whistles, the proposed method achieves state-of-the-art 3D object detection performance on two widely used benchmarks, ScanNet V2 and SUN RGB-D.

In this repository, we provide the model implementation (in PyTorch) as well as data preparation, training, and evaluation scripts for ScanNet and SUN RGB-D.
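
The idea described above, computing each object feature from all points with learned attention weights rather than from a hand-crafted group, can be illustrated with a minimal sketch. It is written in plain Python so that it runs without dependencies, and it is not the repository's implementation: it uses a single head, omits the learned query/key/value projections and positional encodings, and the names `attend`, `queries`, and `point_features` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(queries, point_features):
    """Cross-attention in which every object query attends to ALL points.

    queries:        list of d-dimensional object candidate features
    point_features: list of d-dimensional per-point features
    Returns one fused feature per query plus the attention weights.
    """
    dim = len(point_features[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, all_weights = [], []
    for q in queries:
        # One score per point: no grouping step decides which points count.
        scores = [scale * dot(q, p) for p in point_features]
        weights = softmax(scores)
        # The object feature is the attention-weighted sum of point features.
        fused = [
            sum(w * p[k] for w, p in zip(weights, point_features))
            for k in range(dim)
        ]
        outputs.append(fused)
        all_weights.append(weights)
    return outputs, all_weights


# Toy data, made up for illustration: 1 query, 3 points, 2-D features.
queries = [[1.0, 0.0]]
points = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
features, weights = attend(queries, points)
```

In this toy case the point most similar to the query receives the largest weight, but every point still contributes to the fused feature.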

Citation

@article{liu2021,
  title={Group-Free 3D Object Detection via Transformers},
  author={Liu, Ze and Zhang, Zheng and Cao, Yue and Hu, Han and Tong, Xin},
  journal={arXiv preprint arXiv:2104.00678},
  year={2021}
}

Main Results

ScanNet V2

Method	Backbone	mAP@0.25	mAP@0.5	Model
HGNet	GU-net	61.3	34.4	-
GSDN	MinkNet	62.8	34.8	waiting for release
3D-MPA	MinkNet	64.2	49.2	waiting for release
VoteNet	PointNet++	62.9	39.9	official repo
MLCVNet	PointNet++	64.5	41.4	official repo
H3DNet	PointNet++	64.4	43.4	official repo
H3DNet	4xPointNet++	67.2	48.1	official repo
Ours(L6, O256)	PointNet++	67.3 (66.2*)	48.9 (48.4*)	model
Ours(L12, O256)	PointNet++	67.2 (66.6*)	49.7 (49.3*)	model
Ours(L12, O256)	PointNet++w2×	68.8 (68.3*)	52.1 (51.1*)	model
Ours(L12, O512)	PointNet++w2×	69.1 (68.8*)	52.8 (52.3*)	model

SUN RGB-D

Method	Backbone	Inputs	mAP@0.25	mAP@0.5	Model
VoteNet	PointNet++	point	59.1	35.8	official repo
MLCVNet	PointNet++	point	59.8	-	official repo
HGNet	GU-net	point	61.6	-	-
H3DNet	4xPointNet++	point	60.1	39.0	official repo
imVoteNet	PointNet++	point+RGB	63.4	-	official repo
Ours(L6, O256)	PointNet++	point	62.8 (62.6*)	42.3 (42.0*)	model

Notes:

* denotes a result averaged over 5 evaluation runs, since the algorithm's randomness is large.
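
As a rough illustration of how such an averaged number can be reported, the sketch below computes the mean and the sample standard deviation over several runs. The helper name `summarize_runs` is hypothetical, and the values are placeholders, not results from the paper or from this repository.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of per-run metric values."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(scores), stdev(scores)


# Placeholder values only, NOT results from the paper or this repository.
placeholder_runs = [0.50, 0.52, 0.48, 0.51, 0.49]
avg, spread = summarize_runs(placeholder_runs)
print(f"mean={avg:.3f} std={spread:.3f}")
```

Reporting the spread next to the mean makes it easier to judge whether a difference between two models exceeds the run-to-run noise.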

Install

Requirements

Ubuntu 16.04
Anaconda with python=3.6
pytorch>=1.3
torchvision with pillow<7
cuda=10.1
trimesh>=2.35.39,<2.35.40
networkx>=2.2,<2.3
Compile the CUDA layers for PointNet++, which are used in the backbone network: sh init.sh
Other dependencies: pip install termcolor opencv-python tensorboard
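
Several of the requirements above are narrow version ranges. The following sketch shows one way to check an installed version string against such a range. It is a simplified comparison, not a full implementation of Python's version specification, and the helper names `parse_version` and `satisfies` are hypothetical.

```python
import operator
import re

OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

# Ranges copied from the requirements list above.
REQUIREMENTS = {
    "pytorch": ">=1.3",
    "pillow": "<7",
    "trimesh": ">=2.35.39,<2.35.40",
    "networkx": ">=2.2,<2.3",
}


def parse_version(text):
    """Turn '2.35.39' or '1.3.0+cu101' into a tuple of ints."""
    core = text.split("+")[0]  # drop local build tags such as '+cu101'
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, spec):
    """Check an installed version against a spec such as '>=2.2,<2.3'."""
    have = parse_version(installed)
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*([\w.+]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        want = parse_version(bound)
        # Pad with zeros so that '1.3' and '1.3.0' compare as equal.
        width = max(len(have), len(want))
        left = have + (0,) * (width - len(have))
        right = want + (0,) * (width - len(want))
        if not OPS[op](left, right):
            return False
    return True


print(satisfies("2.35.39", REQUIREMENTS["trimesh"]))
```

The installed version would typically come from the package's `__version__` attribute; that lookup is left out here so the sketch runs without the packages installed.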

Data preparation

For SUN RGB-D, follow the README under the sunrgbd folder.

For ScanNet, follow the README under the scannet folder.

Usage

ScanNet

For L6, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --num_decoder_layers 6 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For L6, O256 evaluation:

python eval_avg.py --num_point 50000 --num_decoder_layers 6 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For L12, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --num_decoder_layers 12 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For L12, O256 evaluation:

python eval_avg.py --num_point 50000 --num_decoder_layers 12 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For w2x, L12, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For w2x, L12, O256 evaluation:

python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For w2x, L12, O512 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For w2x, L12, O512 evaluation:

python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]
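
The four ScanNet training commands above differ only in `--width`, `--num_decoder_layers`, and `--num_target`. The sketch below assembles them from a small table, using only flags that appear in this section. The function name `train_command`, the variant keys, the port number, the GPU count, and the data path are placeholders, not part of the repository.

```python
import shlex

# ScanNet variants listed in this section, keyed by hypothetical short names.
SCANNET_VARIANTS = {
    "L6_O256": {"num_decoder_layers": 6},
    "L12_O256": {"num_decoder_layers": 12},
    "w2x_L12_O256": {"width": 2, "num_decoder_layers": 12},
    "w2x_L12_O512": {"width": 2, "num_decoder_layers": 12, "num_target": 512},
}


def train_command(variant, port, gpus, data_root, log_dir=None):
    """Build the argument list for one ScanNet training run."""
    args = [
        "python", "-m", "torch.distributed.launch",
        "--master_port", str(port),
        "--nproc_per_node", str(gpus),
        "train_dist.py", "--num_point", "50000",
    ]
    # Variant-specific flags (width, decoder depth, number of candidates).
    for flag, value in SCANNET_VARIANTS[variant].items():
        args += [f"--{flag}", str(value)]
    # Flags shared by every ScanNet training command above.
    args += [
        "--size_delta", "0.111111111111", "--center_delta", "0.04",
        "--learning_rate", "0.006", "--decoder_learning_rate", "0.0006",
        "--weight_decay", "0.0005",
        "--dataset", "scannet", "--data_root", data_root,
    ]
    if log_dir is not None:
        args += ["--log_dir", log_dir]
    return args


# Port, GPU count, and path are placeholders chosen for illustration.
cmd = train_command("w2x_L12_O512", port=29500, gpus=4,
                    data_root="/path/to/scannet")
print(shlex.join(cmd))  # shlex.join requires Python 3.8 or newer
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting issues.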

SUN RGB-D

For L6, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 \
    --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 \
    --learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 \
    --dataset sunrgbd --data_root <data directory> [--log_dir <log directory>]
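
The flags `--max_epoch 600 --lr_decay_epochs 420 480 540` suggest a stepped learning-rate schedule. The sketch below shows what such a schedule looks like under two assumptions that this README does not confirm: the rate is multiplied by a fixed factor at each listed epoch, and that factor is 0.1. The function name `stepped_lr` and the `decay_rate` parameter are hypothetical, and the scheduler used by `train_dist.py` may differ.

```python
def stepped_lr(base_lr, epoch, decay_epochs, decay_rate=0.1):
    """Learning rate after multiplying by decay_rate at each passed milestone.

    decay_rate=0.1 is an assumed value, not taken from the repository.
    """
    passed = sum(1 for milestone in decay_epochs if epoch >= milestone)
    return base_lr * decay_rate ** passed


# Values from the SUN RGB-D training command above.
milestones = [420, 480, 540]
for epoch in (0, 419, 420, 480, 540, 599):
    print(epoch, stepped_lr(0.004, epoch, milestones))
```

Under these assumptions the base rate stays constant for the first 420 epochs and drops three times in the final 180.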

For L6, O256 evaluation:

python eval_avg.py --num_point 20000 --num_decoder_layers 6 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset sunrgbd --data_root <data directory> [--dump_dir <dump directory>]
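
The commands above pass `--size_delta`, `--center_delta`, and, for SUN RGB-D, `--heading_delta`, but this README does not say what these values control. One plausible reading, stated here as an assumption, is that each is the transition point of a smooth-L1 (Huber-style) regression loss. The sketch below shows that loss in the same form as the `beta` parameter of PyTorch's `SmoothL1Loss`; the function name `smooth_l1` is hypothetical.

```python
def smooth_l1(error, delta):
    """Quadratic below delta, linear above; continuous at |error| == delta.

    Assumption: the *_delta flags play the role of `delta` here.
    """
    diff = abs(error)
    if diff < delta:
        return 0.5 * diff * diff / delta
    return diff - 0.5 * delta


# Delta values that appear in the commands above.
for delta in (0.04, 0.0625, 0.111111111111):
    print(delta, smooth_l1(0.01, delta), smooth_l1(1.0, delta))
```

A smaller delta makes the loss behave like an L1 loss over a wider range of errors, which reduces the influence of the quadratic region near zero.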

Acknowledgements

We are grateful to the authors of VoteNet for their flexible codebase.

License

The code is released under MIT License (see LICENSE file for details).
