Group-Free 3D Object Detection via Transformers

By Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong.

This repo is the official implementation of "Group-Free 3D Object Detection via Transformers".

Updates

  • April 01, 2021: initial release.

Introduction

Recently, directly detecting 3D objects from 3D point clouds has received increasing attention. To extract object representation from an irregular point cloud, existing methods usually take a point grouping step to assign the points to an object candidate so that a PointNet-like network could be used to derive object features from the grouped points. However, the inaccurate point assignments caused by the hand-crafted grouping scheme decrease the performance of 3D object detection. In this paper, we present a simple yet effective method for directly detecting 3D objects from the 3D point cloud. Instead of grouping local points to each object candidate, our method computes the feature of an object from all the points in the point cloud with the help of an attention mechanism in the Transformers, where the contribution of each point is automatically learned in the network training. With an improved attention stacking scheme, our method fuses object features in different stages and generates more accurate object detection results. With few bells and whistles, the proposed method achieves state-of-the-art 3D object detection performance on two widely used benchmarks, ScanNet V2 and SUN RGB-D.

In this repository, we provide the model implementation (in PyTorch) as well as data preparation, training, and evaluation scripts for ScanNet and SUN RGB-D.
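To make the group-free idea concrete, here is a minimal pure-Python sketch of one object candidate attending to every point in the cloud. It is an illustration only, not the repository's implementation: the function name `attend` is hypothetical, and it uses the point features directly as keys and values, whereas a Transformer decoder would apply learned projections and multiple heads.

```python
import math


def attend(query, point_feats):
    """Pool features from ALL points for one object candidate (hypothetical helper).

    query: feature vector of the object candidate.
    point_feats: one feature vector per point in the cloud.
    """
    dim = len(query)
    # Scaled dot-product score between the candidate and every point.
    scores = [
        sum(q * k for q, k in zip(query, feat)) / math.sqrt(dim)
        for feat in point_feats
    ]
    # Numerically stable softmax: these weights play the role of the
    # learned per-point contribution, replacing a hard grouping step.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of point features gives the candidate's new feature.
    pooled = [
        sum(w * feat[i] for w, feat in zip(weights, point_feats))
        for i in range(dim)
    ]
    return weights, pooled


points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy point features
weights, pooled = attend([2.0, 0.0], points)
print(weights, pooled)
```

Every point receives a nonzero weight, so no point is excluded by a hand-crafted assignment; points whose features align with the query simply contribute more.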

Citation

@article{liu2021,
  title={Group-Free 3D Object Detection via Transformers},
  author={Liu, Ze and Zhang, Zheng and Cao, Yue and Hu, Han and Tong, Xin},
  journal={arXiv preprint arXiv:2104.00678},
  year={2021}
}

Main Results

ScanNet V2

Method | Backbone | mAP@0.25 | mAP@0.5 | Model
HGNet | GU-net | 61.3 | 34.4 | -
GSDN | MinkNet | 62.8 | 34.8 | waiting for release
3D-MPA | MinkNet | 64.2 | 49.2 | waiting for release
VoteNet | PointNet++ | 62.9 | 39.9 | official repo
MLCVNet | PointNet++ | 64.5 | 41.4 | official repo
H3DNet | PointNet++ | 64.4 | 43.4 | official repo
H3DNet | 4xPointNet++ | 67.2 | 48.1 | official repo
Ours(L6, O256) | PointNet++ | 67.3 (66.2*) | 48.9 (48.4*) | model
Ours(L12, O256) | PointNet++ | 67.2 (66.6*) | 49.7 (49.3*) | model
Ours(L12, O256) | PointNet++w2× | 68.8 (68.3*) | 52.1 (51.1*) | model
Ours(L12, O512) | PointNet++w2× | 69.1 (68.8*) | 52.8 (52.3*) | model
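The two metric columns count a detection as correct when its 3D IoU with a ground-truth box reaches 0.25 or 0.5. As a rough illustration of what those thresholds mean, the sketch below computes IoU for axis-aligned boxes. The helper name `box3d_iou` and the `(xmin, ymin, zmin, xmax, ymax, zmax)` layout are assumptions for this example, not the evaluation code in this repository, and boxes with a heading angle would need a rotated-box IoU instead.

```python
def box3d_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])
        hi = min(box_a[axis + 3], box_b[axis + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis, so no intersection
        inter *= hi - lo

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - inter
    return inter / union


pred = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
gt = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
iou = box3d_iou(pred, gt)  # intersection volume 4, union volume 12
print(iou >= 0.25, iou >= 0.5)
```

In this toy case the prediction is shifted by half its width, which is enough to pass the 0.25 threshold but not the 0.5 threshold. This is why mAP@0.5 is the stricter localization measure.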

SUN RGB-D

Method | Backbone | Inputs | mAP@0.25 | mAP@0.5 | Model
VoteNet | PointNet++ | point | 59.1 | 35.8 | official repo
MLCVNet | PointNet++ | point | 59.8 | - | official repo
HGNet | GU-net | point | 61.6 | - | -
H3DNet | 4xPointNet++ | point | 60.1 | 39.0 | official repo
imVoteNet | PointNet++ | point+RGB | 63.4 | - | official repo
Ours(L6, O256) | PointNet++ | point | 62.8 (62.6*) | 42.3 (42.0*) | model

Notes:

  • * denotes a result averaged over 5 evaluation runs, reported because the algorithm's run-to-run randomness is large.
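The averaging behind the starred numbers can be pictured with a short sketch. The helper `summarize_runs` and the scores below are purely illustrative (they are not measurements from this repository). In practice, the `eval_avg.py` script with `--avg_times 5` shown in the Usage section performs the repeated evaluation.

```python
import statistics


def summarize_runs(map_values):
    """Return (mean, sample standard deviation) of per-run mAP scores."""
    mean = statistics.mean(map_values)
    spread = statistics.stdev(map_values) if len(map_values) > 1 else 0.0
    return mean, spread


# Hypothetical scores from 5 evaluations of the same checkpoint.
runs = [60.0, 61.0, 62.0, 61.0, 61.0]
mean, spread = summarize_runs(runs)
print(f"mean={mean:.1f}, stdev={spread:.2f}")
```

Reporting the mean alongside the single-run number makes it easier to tell a real improvement from evaluation noise.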

Install

Requirements

  • Ubuntu 16.04
  • Anaconda with python=3.6
  • pytorch>=1.3
  • torchvision with pillow<7
  • cuda=10.1
  • trimesh>=2.35.39,<2.35.40
  • networkx>=2.2,<2.3
  • Compile the CUDA layers for PointNet++, which the backbone network uses: sh init.sh
  • Other dependencies: pip install termcolor opencv-python tensorboard

Data preparation

For SUN RGB-D, follow the README under the sunrgbd folder.

For ScanNet, follow the README under the scannet folder.

Usage

ScanNet

For L6, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --num_decoder_layers 6 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For L6, O256 evaluation:

python eval_avg.py --num_point 50000 --num_decoder_layers 6 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For L12, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --num_decoder_layers 12 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For L12, O256 evaluation:

python eval_avg.py --num_point 50000 --num_decoder_layers 12 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For w2x, L12, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For w2x, L12, O256 evaluation:

python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For w2x, L12, O512 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For w2x, L12, O512 evaluation:

python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]
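The training commands above pass `--size_delta` and `--center_delta`. Assuming these flags set the transition point of a smooth-L1 (Huber-style) regression loss, which this README does not confirm, the sketch below shows one common form of such a loss. The function `smooth_l1` is hypothetical and the repository's exact formula may differ.

```python
def smooth_l1(error, delta):
    """Huber-style loss: quadratic for |error| <= delta, linear beyond it.

    Assumption: this is one common parameterization, used here only to
    illustrate the role of a delta value.
    """
    abs_err = abs(error)
    if abs_err <= delta:
        return 0.5 * abs_err ** 2
    return delta * (abs_err - 0.5 * delta)


center_delta = 0.04          # value passed for ScanNet above
size_delta = 0.111111111111  # value passed for ScanNet above (about 1/9)

small = smooth_l1(0.02, center_delta)  # inside the quadratic region
large = smooth_l1(1.0, center_delta)   # inside the linear region
print(small, large)
```

Under this assumption, a smaller delta makes the loss behave like an L1 loss for all but very small errors, which reduces the influence of large outlier residuals.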

SUN RGB-D

For L6, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 \
    --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 \
    --learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 \
    --dataset sunrgbd --data_root <data directory> [--log_dir <log directory>]

For L6, O256 evaluation:

python eval_avg.py --num_point 20000 --num_decoder_layers 6 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset sunrgbd --data_root <data directory> [--dump_dir <dump directory>]
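The SUN RGB-D training command passes `--max_epoch 600`, `--lr_decay_epochs 420 480 540`, and separate `--learning_rate` and `--decoder_learning_rate` values. A minimal sketch of a step schedule with those milestones follows. The decay factor of 0.1 and the helper `step_lr` are assumptions for illustration, since the command does not state the decay factor or any warmup the training script may apply.

```python
def step_lr(base_lr, epoch, decay_epochs, decay_rate=0.1):
    """Learning rate after applying decay_rate at every milestone reached.

    decay_rate=0.1 is an assumed value, not taken from the command above.
    """
    passed = sum(1 for milestone in decay_epochs if epoch >= milestone)
    return base_lr * decay_rate ** passed


milestones = [420, 480, 540]
for epoch in (0, 419, 420, 480, 540, 599):
    backbone_lr = step_lr(0.004, epoch, milestones)   # --learning_rate
    decoder_lr = step_lr(0.0002, epoch, milestones)   # --decoder_learning_rate
    print(epoch, backbone_lr, decoder_lr)
```

Both rates follow the same milestones here, so the decoder stays at a fixed fraction of the main learning rate throughout training.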

Acknowledgements

We thank the authors of votenet for their flexible codebase.

License

The code is released under the MIT License (see the LICENSE file for details).
