Group-Free 3D Object Detection via Transformers

Overview

Group-Free 3D Object Detection via Transformers

By Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong.

This repo is the official implementation of "Group-Free 3D Object Detection via Transformers".

teaser

Updates

  • April 01, 2021: initial release.

Introduction

Recently, directly detecting 3D objects from 3D point clouds has received increasing attention. To extract object representation from an irregular point cloud, existing methods usually take a point grouping step to assign the points to an object candidate so that a PointNet-like network could be used to derive object features from the grouped points. However, the inaccurate point assignments caused by the hand-crafted grouping scheme decrease the performance of 3D object detection. In this paper, we present a simple yet effective method for directly detecting 3D objects from the 3D point cloud. Instead of grouping local points to each object candidate, our method computes the feature of an object from all the points in the point cloud with the help of an attention mechanism in the Transformers, where the contribution of each point is automatically learned in the network training. With an improved attention stacking scheme, our method fuses object features in different stages and generates more accurate object detection results. With few bells and whistles, the proposed method achieves state-of-the-art 3D object detection performance on two widely used benchmarks, ScanNet V2 and SUN RGB-D.

In this repository, we provide model implementation (with Pytorch) as well as data preparation, training and evaluation scripts on ScanNet and SUN RGB-D.

Citation

@article{liu2021,
  title={Group-Free 3D Object Detection via Transformers},
  author={Liu, Ze and Zhang, Zheng and Cao, Yue and Hu, Han and Tong, Xin},
  journal={arXiv preprint arXiv:2104.00678},
  year={2021}
}

Main Results

ScanNet V2

Method backbone [email protected] [email protected] Model
HGNet GU-net 61.3 34.4 -
GSDN MinkNet 62.8 34.8 waiting for release
3D-MPA MinkNet 64.2 49.2 waiting for release
VoteNet PointNet++ 62.9 39.9 official repo
MLCVNet PointNet++ 64.5 41.4 official repo
H3DNet PointNet++ 64.4 43.4 official repo
H3DNet 4xPointNet++ 67.2 48.1 official repo
Ours(L6, O256) PointNet++ 67.3 (66.2*) 48.9 (48.4*) model
Ours(L12, O256) PointNet++ 67.2 (66.6*) 49.7 (49.3*) model
Ours(L12, O256) PointNet++w2× 68.8 (68.3*) 52.1 (51.1*) model
Ours(L12, O512) PointNet++w2× 69.1 (68.8*) 52.8 (52.3*) model

SUN RGB-D

Method backbone inputs [email protected] [email protected] Model
VoteNet PointNet++ point 59.1 35.8 official repo
MLCVNet PointNet++ point 59.8 - official repo
HGNet GU-net point 61.6 - -
H3DNet 4xPointNet++ point 60.1 39.0 official repo
imVoteNet PointNet++ point+RGB 63.4 - official repo
Ours(L6, O256) PointNet++ point 62.8 (62.6*) 42.3 (42.0*) model

Notes:

  • * means the result is averaged over 5-times evaluation since the algorithm randomness is large.

Install

Requirements

  • Ubuntu 16.04
  • Anaconda with python=3.6
  • pytorch>=1.3
  • torchvision with pillow<7
  • cuda=10.1
  • trimesh>=2.35.39,<2.35.40
  • 'networkx>=2.2,<2.3'
  • compile the CUDA layers for PointNet++, which we used in the backbone network: sh init.sh
  • others: pip install termcolor opencv-python tensorboard

Data preparation

For SUN RGB-D, follow the README under the sunrgbd folder.

For ScanNet, follow the README under the scannet folder.

Usage

ScanNet

For L6, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --num_decoder_layers 6 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For L6, O256 evaluation:

python eval_avg.py --num_point 50000 --num_decoder_layers 6 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For L12, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --num_decoder_layers 12 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For L6, O256 evaluation:

python eval_avg.py --num_point 50000 --num_decoder_layers 12 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For w2x, L12, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For w2x, L12, O256 evaluation:

python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For w2x, L12, O512 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For w2x, L12, O512 evaluation:

python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

SUN RGB-D

For L6, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 \
    --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 \
    --learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 \
    --dataset sunrgbd --data_root <data directory> [--log_dir <log directory>]

For L6, O256 evaluation:

python eval_avg.py --num_point 20000 --num_decoder_layers 6 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset sunrgbd --data_root <data directory> [--dump_dir <dump directory>]

Acknowledgements

We thank a lot for the flexible codebase of votenet.

License

The code is released under MIT License (see LICENSE file for details).

Owner
Ze Liu
USTC & MSRA Joint-PhD candidate.
Ze Liu
FedML: A Research Library and Benchmark for Federated Machine Learning

FedML: A Research Library and Benchmark for Federated Machine Learning 📄 https://arxiv.org/abs/2007.13518 News 2021-02-01 (Award): #NeurIPS 2020# Fed

FedML-AI 2.3k Jan 08, 2023
codes for "Scheduled Sampling Based on Decoding Steps for Neural Machine Translation" (long paper of EMNLP-2022)

Scheduled Sampling Based on Decoding Steps for Neural Machine Translation (EMNLP-2021 main conference) Contents Overview Background Quick to Use Furth

Adaxry 13 Jul 25, 2022
The NEOSSat is a dual-mission microsatellite designed to detect potentially hazardous Earth-orbit-crossing asteroids and track objects that reside in deep space

The NEOSSat is a dual-mission microsatellite designed to detect potentially hazardous Earth-orbit-crossing asteroids and track objects that reside in deep space

John Salib 2 Jan 30, 2022
Improving adversarial robustness by a coupling rejection strategy

Adversarial Training with Rectified Rejection The code for the paper Adversarial Training with Rectified Rejection. Environment settings and libraries

Tianyu Pang 29 Jan 06, 2023
Tf alloc - Simplication of GPU allocation for Tensorflow2

tf_alloc Simpliying GPU allocation for Tensorflow Developer: korkite (Junseo Ko)

Junseo Ko 3 Feb 10, 2022
Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning

Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning This repository provides an implementation of the paper Beta S

Yongchan Kwon 28 Nov 10, 2022
Fader Networks: Manipulating Images by Sliding Attributes - NIPS 2017

FaderNetworks PyTorch implementation of Fader Networks (NIPS 2017). Fader Networks can generate different realistic versions of images by modifying at

Facebook Research 753 Dec 23, 2022
PINN Burgers - 1D Burgers equation simulated by PINN

PINN(s): Physics-Informed Neural Network(s) for Burgers equation This is an impl

ShotaDEGUCHI 1 Feb 12, 2022
Deep learning models for change detection of remote sensing images

Change Detection Models (Remote Sensing) Python library with Neural Networks for Change Detection based on PyTorch. ⚡ ⚡ ⚡ I am trying to build this pr

Kaiyu Li 176 Dec 24, 2022
Audio Visual Emotion Recognition using TDA

Audio Visual Emotion Recognition using TDA RAVDESS database with two datasets analyzed: Video and Audio dataset: Audio-Dataset: https://www.kaggle.com

Combinatorial Image Analysis research group 3 May 11, 2022
HEAM: High-Efficiency Approximate Multiplier Optimization for Deep Neural Networks

Approximate Multiplier by HEAM What's HEAM? HEAM is a general optimization method to generate high-efficiency approximate multipliers for specific app

4 Sep 11, 2022
Official Pytorch Implementation of Unsupervised Image Denoising with Frequency Domain Knowledge

Unsupervised Image Denoising with Frequency Domain Knowledge (BMVC 2021 Oral) : Official Project Page This repository provides the official PyTorch im

Donggon Jang 12 Sep 26, 2022
The aim of the game, as in the original one, is to find a specific image from a group of different images of a person's face

GUESS WHO Main Links: [Github] [App] Related Links: [CLIP] [Celeba] The aim of the game, as in the original one, is to find a specific image from a gr

Arnau - DIMAI 3 Jan 04, 2022
Code for the paper "Location-aware Single Image Reflection Removal"

Location-aware Single Image Reflection Removal The shown images are provided by the datasets from IBCLN, ERRNet, SIR2 and the Internet images. The cod

72 Dec 08, 2022
Pun Detection and Location

Pun Detection and Location “The Boating Store Had Its Best Sail Ever”: Pronunciation-attentive Contextualized Pun Recognition Yichao Zhou, Jyun-yu Jia

lawson 3 May 13, 2022
Oscar and VinVL

Oscar: Object-Semantics Aligned Pre-training for Vision-and-Language Tasks VinVL: Revisiting Visual Representations in Vision-Language Models Updates

Microsoft 938 Dec 26, 2022
The code for paper "Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation" which is accepted by AAAI 2022

Contrastive Spatio Temporal Pretext Learning for Self-supervised Video Representation (AAAI 2022) The code for paper "Contrastive Spatio-Temporal Pret

8 Jun 30, 2022
State-Relabeling Adversarial Active Learning

State-Relabeling Adversarial Active Learning Code for SRAAL [2020 CVPR Oral] Requirements torch = 1.6.0 numpy = 1.19.1 tqdm = 4.31.1 AL Results The

10 Jul 14, 2022
An NLP library with Awesome pre-trained Transformer models and easy-to-use interface, supporting wide-range of NLP tasks from research to industrial applications.

简体中文 | English News [2021-10-12] PaddleNLP 2.1版本已发布!新增开箱即用的NLP任务能力、Prompt Tuning应用示例与生成任务的高性能推理! 🎉 更多详细升级信息请查看Release Note。 [2021-08-22]《千言:面向事实一致性的生

6.9k Jan 01, 2023
Official implementation of NeurIPS'2021 paper TransformerFusion

TransformerFusion: Monocular RGB Scene Reconstruction using Transformers Project Page | Paper | Video TransformerFusion: Monocular RGB Scene Reconstru

Aljaz Bozic 118 Dec 25, 2022