Code & Models for 3DETR - an End-to-end transformer model for 3D object detection

Related tags

Deep Learning3detr
Overview

3DETR: An End-to-End Transformer Model for 3D Object Detection

PyTorch implementation and models for 3DETR.

3DETR (3D DEtection TRansformer) is a simpler alternative to complex hand-crafted 3D detection pipelines. It does not rely on 3D backbones such as PointNet++ and uses few 3D-specific operators. 3DETR obtains comparable or better performance than 3D detection methods such as VoteNet. The encoder can also be used for other 3D tasks such as shape classification. More details in the paper "An End-to-End Transformer Model for 3D Object Detection".

[website] [arXiv] [bibtex]

Code description. Our code is based on prior work such as DETR and VoteNet and we aim for simplicity in our implementation. We hope it can ease research in 3D detection.

3DETR Approach Decoder Detections

Pretrained Models

We provide the pretrained model weights and the corresponding metrics on the val set (per class APs, Recalls). We provide a Python script utils/download_weights.py to easily download the weights/metrics files.

Arch Dataset Epochs AP25 AP50 Model weights Eval metrics
3DETR-m SUN RGB-D 1080 59.1 30.3 weights metrics
3DETR SUN RGB-D 1080 58.0 30.3 weights metrics
3DETR-m ScanNet 1080 65.0 47.0 weights metrics
3DETR ScanNet 1080 62.1 37.9 weights metrics

Model Zoo

For convenience, we provide model weights for 3DETR trained for different number of epochs.

Arch Dataset Epochs AP25 AP50 Model weights Eval metrics
3DETR-m SUN RGB-D 90 51.0 22.0 weights metrics
3DETR-m SUN RGB-D 180 55.6 27.5 weights metrics
3DETR-m SUN RGB-D 360 58.2 30.6 weights metrics
3DETR-m SUN RGB-D 720 58.1 30.4 weights metrics
3DETR SUN RGB-D 90 43.7 16.2 weights metrics
3DETR SUN RGB-D 180 52.1 25.8 weights metrics
3DETR SUN RGB-D 360 56.3 29.6 weights metrics
3DETR SUN RGB-D 720 56.0 27.8 weights metrics
3DETR-m ScanNet 90 47.1 19.5 weights metrics
3DETR-m ScanNet 180 58.7 33.6 weights metrics
3DETR-m ScanNet 360 62.4 37.7 weights metrics
3DETR-m ScanNet 720 63.7 44.5 weights metrics
3DETR ScanNet 90 42.8 15.3 weights metrics
3DETR ScanNet 180 54.5 28.8 weights metrics
3DETR ScanNet 360 59.0 35.4 weights metrics
3DETR ScanNet 720 61.1 40.2 weights metrics

Running 3DETR

Installation

Our code is tested with PyTorch 1.4.0, CUDA 10.2 and Python 3.6. It may work with other versions.

You will need to install pointnet2 layers by running

cd third_party/pointnet2 && python setup.py install

You will also need Python dependencies (either conda install or pip install)

matplotlib
opencv-python
plyfile
'trimesh>=2.35.39,<2.35.40'
'networkx>=2.2,<2.3'
scipy

Some users have experienced issues using CUDA 11 or higher. Please try using CUDA 10.2 if you run into CUDA issues.

Optionally, you can install a Cythonized implementation of gIOU for faster training.

conda install cython
cd utils && python cython_compile.py build_ext --inplace

Benchmarking

Dataset preparation

We follow the VoteNet codebase for preprocessing our data. The instructions for preprocessing SUN RGB-D are [here] and ScanNet are [here].

You can edit the dataset paths in datasets/sunrgbd.py and datasets/scannet.py or choose to specify at runtime.

Testing

Once you have the datasets prepared, you can test pretrained models as

python main.py --dataset_name <dataset_name> --nqueries <number of queries> --test_ckpt <path_to_checkpoint> --test_only [--enc_type masked]

We use 128 queries for the SUN RGB-D dataset and 256 queries for the ScanNet dataset. You will need to add the flag --enc_type masked when testing the 3DETR-m checkpoints. Please note that the testing process is stochastic (due to randomness in point cloud sampling and sampling the queries) and so results can vary within 1% AP25 across runs. This stochastic nature of the inference process is also common for methods such as VoteNet.

If you have not edited the dataset paths for the files in the datasets folder, you can pass the path to the datasets using the --dataset_root_dir flag.

Training

The model can be simply trained by running main.py.

python main.py --dataset_name <dataset_name> --checkpoint_dir <path to store outputs>

To reproduce the results in the paper, we provide the arguments in the scripts folder. A variance of 1% AP25 across different training runs can be expected.

You can quickly verify your installation by training a 3DETR model for 90 epochs on ScanNet following the file scripts/scannet_quick.sh and compare it to the pretrained checkpoint from the Model Zoo.

License

The majority of 3DETR is licensed under the Apache 2.0 license as found in the LICENSE file, however portions of the project are available under separate license terms: licensing information for pointnet2 is available at https://github.com/erikwijmans/Pointnet2_PyTorch/blob/master/UNLICENSE

Contributing

We welcome your pull requests! Please see CONTRIBUTING and CODE_OF_CONDUCT for more info.

Citation

If you find this repository useful, please consider starring us and citing

@inproceedings{misra2021-3detr,
    title={{An End-to-End Transformer Model for 3D Object Detection}},
    author={Misra, Ishan and Girdhar, Rohit and Joulin, Armand},
    booktitle={{ICCV}},
    year={2021},
}
Owner
Facebook Research
Facebook Research
Simple and ready-to-use tutorials for TensorFlow

TensorFlow World To support maintaining and upgrading this project, please kindly consider Sponsoring the project developer. Any level of support is a

Amirsina Torfi 4.5k Dec 23, 2022
UAV-Networks-Routing is a Python simulator for experimenting routing algorithms and mac protocols on unmanned aerial vehicle networks.

UAV-Networks Simulator - Autonomous Networking - A.A. 20/21 UAV-Networks-Routing is a Python simulator for experimenting routing algorithms and mac pr

0 Nov 13, 2021
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Learning the Best Pooling Strategy for Visual Semantic Embedding Official PyTorch implementation of the paper Learning the Best Pooling Strategy for V

Jiacheng Chen 106 Jan 06, 2023
The official PyTorch code for NeurIPS 2021 ML4AD Paper, "Does Thermal data make the detection systems more reliable?"

MultiModal-Collaborative (MMC) Learning Framework for integrating RGB and Thermal spectral modalities This is the official code for NeurIPS 2021 Machi

NeurAI 12 Nov 02, 2022
Physics-Informed Neural Networks (PINN) and Deep BSDE Solvers of Differential Equations for Scientific Machine Learning (SciML) accelerated simulation

NeuralPDE NeuralPDE.jl is a solver package which consists of neural network solvers for partial differential equations using scientific machine learni

SciML Open Source Scientific Machine Learning 680 Jan 02, 2023
Deep Networks with Recurrent Layer Aggregation

RLA-Net: Recurrent Layer Aggregation Recurrence along Depth: Deep Networks with Recurrent Layer Aggregation This is an implementation of RLA-Net (acce

Joy Fang 21 Aug 16, 2022
A Pytorch implementation of "LegoNet: Efficient Convolutional Neural Networks with Lego Filters" (ICML 2019).

LegoNet This code is the implementation of ICML2019 paper LegoNet: Efficient Convolutional Neural Networks with Lego Filters Run python train.py You c

YangZhaohui 140 Sep 26, 2022
[WWW 2021] Source code for "Graph Contrastive Learning with Adaptive Augmentation"

GCA Source code for Graph Contrastive Learning with Adaptive Augmentation (WWW 2021) For example, to run GCA-Degree under WikiCS, execute: python trai

Big Data and Multi-modal Computing Group, CRIPAC 97 Jan 07, 2023
The openspoor package is intended to allow easy transformation between different geographical and topological systems commonly used in Dutch Railway

Openspoor The openspoor package is intended to allow easy transformation between different geographical and topological systems commonly used in Dutch

7 Aug 22, 2022
Knowledge Management for Humans using Machine Learning & Tags

HyperTag HyperTag helps humans intuitively express how they think about their files using tags and machine learning.

Ravn Tech, Inc. 165 Nov 04, 2022
《Improving Unsupervised Image Clustering With Robust Learning》(2020)

Improving Unsupervised Image Clustering With Robust Learning This repo is the PyTorch codes for "Improving Unsupervised Image Clustering With Robust L

Sungwon Park 129 Dec 27, 2022
Gradient-free global optimization algorithm for multidimensional functions based on the low rank tensor train format

ttopt Description Gradient-free global optimization algorithm for multidimensional functions based on the low rank tensor train (TT) format and maximu

5 May 23, 2022
High accurate tool for automatic faces detection with landmarks

faces_detanator High accurate tool for automatic faces detection with landmarks. The library is based on public detectors with high accuracy (TinaFace

Ihar 7 May 10, 2022
NR-GAN: Noise Robust Generative Adversarial Networks

Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter Code and checkpoints for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling

Takuhiro Kaneko 59 Dec 11, 2022
[ICCV'21] Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment

CKDN The official implementation of the ICCV2021 paper "Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment" O

Multimedia Research 50 Dec 13, 2022
PoseCamera is python based SDK for human pose estimation through RGB webcam.

PoseCamera PoseCamera is python based SDK for human pose estimation through RGB webcam. Install install posecamera package through pip pip install pos

WonderTree 7 Jul 20, 2021
The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

Ren Yurui 261 Jan 09, 2023
Training deep models using anime, illustration images.

animeface deep models for anime images. Datasets anime-face-dataset Anime faces collected from Getchu.com. Based on Mckinsey666's dataset. 63.6K image

Tomoya Sawada 61 Dec 25, 2022
Deep Unsupervised 3D SfM Face Reconstruction Based on Massive Landmark Bundle Adjustment.

(ACMMM 2021 Oral) SfM Face Reconstruction Based on Massive Landmark Bundle Adjustment This repository shows two tasks: Face landmark detection and Fac

BoomStar 51 Dec 13, 2022
New approach to benchmark VQA models

VQA Benchmarking This repository contains the web application & the python interface to evaluate VQA models. Documentation Please see the documentatio

4 Jul 25, 2022