Turning pixels into virtual points for multimodal 3D object detection.

Related tags

Deep LearningMVP
Overview

Multimodal Virtual Point 3D Detection

Turning pixels into virtual points for multimodal 3D object detection.

Multimodal Virtual Point 3D Detection,
Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl,
arXiv technical report (arXiv 2111.06881 )

@article{yin2021multimodal,
  title={Multimodal Virtual Point 3D Detection},
  author={Yin, Tianwei and Zhou, Xingyi and Kr{\"a}henb{\"u}hl, Philipp},
  journal={NeurIPS},
  year={2021},
}

Contact

Any questions or suggestions are welcome!

Tianwei Yin [email protected] Xingyi Zhou [email protected]

Abstract

Lidar-based sensing drives current autonomous vehicles. Despite rapid progress, current Lidar sensors still lag two decades behind traditional color cameras in terms of resolution and cost. For autonomous driving, this means that large objects close to the sensors are easily visible, but far-away or small objects comprise only one measurement or two. This is an issue, especially when these objects turn out to be driving hazards. On the other hand, these same objects are clearly visible in onboard RGB sensors. In this work, we present an approach to seamlessly fuse RGB sensors into Lidar-based 3D recognition. Our approach takes a set of 2D detections to generate dense 3D virtual points to augment an otherwise sparse 3D point-cloud. These virtual points naturally integrate into any standard Lidar-based 3D detectors along with regular Lidar measurements. The resulting multi-modal detector is simple and effective. Experimental results on the large-scale nuScenes dataset show that our framework improves a strong CenterPoint baseline by a significant 6.6 mAP, and outperforms competing fusion approaches.

Main results

3D detection on nuScenes validation set

MAP ↑ NDS ↑
CenterPoint-Voxel 59.5 66.7
CenterPoint-Voxel + MVP 66.0 69.9
CenterPoint-Pillar 52.4 61.5
CenterPoint-Voxel + MVP 62.8 66.2

3D detection on nuScenes test set

MAP ↑ NDS ↑ PKL ↓
MVP 66.4 70.5 0.603

Use MVP

Installation

Please install CenterPoint and CenterNet2. Make sure to add a link to CenterNet2 folder in your python path. We will use CenterNet2 for 2D instance segmentation and CenterPoint for 3D detection.

Getting Started

Download nuscenes data and organise as follows

# For nuScenes Dataset         
└── NUSCENES_DATASET_ROOT
       ├── samples       <-- key frames
       ├── sweeps        <-- frames without annotation
       ├── maps          <-- unused
       ├── v1.0-trainval <-- metadata

Create a symlink to the dataset root in both CenterPoint and MVP's root directories.

mkdir data && cd data
ln -s DATA_ROOT nuScenes

Remember to change the DATA_ROOT to the actual path in your system.

Generate Virtual Points

Download the centernet2 model from here and place it in the root directory.

Use the following command in the current directory to generate virtual points for nuscenes training and validation sets. The points will be saved to data/nuScenes/samples or sweeps/LIDAR_TOP_VIRTUAL.

python virtual_gen.py --info_path data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl  

You will need about 80GB space and the whole process will take 10 to 20 hours using a single GPU. You can also download the precomputed virtual points from here.

Create Data

Go to the CenterPoint's root directory and run

# nuScenes
python tools/create_data.py nuscenes_data_prep --root_path=NUSCENES_TRAINVAL_DATASET_ROOT --version="v1.0-trainval" --nsweeps=10 --virtual True 

if you want to reproduce CenterPoint baseline's results, then also run the following command

# nuScenes
python tools/create_data.py nuscenes_data_prep --root_path=NUSCENES_TRAINVAL_DATASET_ROOT --version="v1.0-trainval" --nsweeps=10 --virtual False 

In the end, the data and info files should be organized as follows

# For nuScenes Dataset 
└── CenterPoint
       └── data    
              └── nuScenes 
                     ├── maps          <-- unused
                     |── v1.0-trainval <-- metadata and annotations
                     |── infos_train_10sweeps_withvelo_filter_True.pkl <-- train annotations
                     |── infos_val_10sweeps_withvelo_filter_True.pkl <-- val annotations
                     |── dbinfos_train_10sweeps_withvelo_virtual.pkl <-- GT database info files
                     |── gt_database_10sweeps_withvelo_virtual <-- GT database 
                     |── samples       <-- key frames
                        |── LIDAR_TOP
                        |── LIDAR_TOP_VIRTUAL
                     └── sweeps       <-- frames without annotation
                        |── LIDAR_TOP
                        |── LIDAR_TOP_VIRTUAL

Train & Evaluate in Command Line

Go to CenterPoint's root directory and use the following command to start a distributed training using 4 GPUs. The models and logs will be saved to work_dirs/CONFIG_NAME

python -m torch.distributed.launch --nproc_per_node=4 ./tools/train.py CONFIG_PATH

For distributed testing with 4 gpus,

python -m torch.distributed.launch --nproc_per_node=4 ./tools/dist_test.py CONFIG_PATH --work_dir work_dirs/CONFIG_NAME --checkpoint work_dirs/CONFIG_NAME/latest.pth 

For testing with one gpu and see the inference time,

python ./tools/dist_test.py CONFIG_PATH --work_dir work_dirs/CONFIG_NAME --checkpoint work_dirs/CONFIG_NAME/latest.pth --speed_test 

MODEL ZOO

We experiment with VoxelNet and PointPillars architectures on nuScenes.

VoxelNet

Model Validation MAP Validation NDS Link
centerpoint_baseline 59.5 66.7 URL
Ours 66.0 69.9 URL

PointPillars

Model Validation MAP Validation NDS Link
centerpoint_baseline 52.4 61.5 URL
Ours 62.8 66.2 URL

Test set models and predictions will be updated soon.

License

MIT License.

Owner
Tianwei Yin
Tianwei Yin
Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch .

PyTorch-High-Res-Stereo-Depth-Estimation Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch. Stereo dep

Ibai Gorordo 26 Nov 24, 2022
Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences forImage-Text Retrieval

NSGDC Some codes in this repo are copied/modified from opensource implementations made available by UNITER, PyTorch, HuggingFace, OpenNMT, and Nvidia.

Zhihao Fan 2 Nov 07, 2022
Vehicle Detection Using Deep Learning and YOLO Algorithm

VehicleDetection Vehicle Detection Using Deep Learning and YOLO Algorithm Dataset take or find vehicle images for create a special dataset for fine-tu

Maryam Boneh 96 Jan 05, 2023
Text and code for the forthcoming second edition of Think Bayes, by Allen Downey.

Think Bayes 2 by Allen B. Downey The HTML version of this book is here. Think Bayes is an introduction to Bayesian statistics using computational meth

Allen Downey 1.5k Jan 08, 2023
Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers"

ICCV'21 Context-aware Scene Graph Generation with Seq2Seq Transformers Authors: Yichao Lu*, Himanshu Rai*, Cheng Chang*, Boris Knyazev†, Guangwei Yu,

Layer6 Labs 37 Dec 18, 2022
Website for D2C paper

D2C This is the repository that contains source code for the D2C Website. If you find D2C useful for your work please cite: @article{sinha2021d2c au

1 Oct 21, 2021
AI pipelines for Nvidia Jetson Platform

Jetson Multicamera Pipelines Easy-to-use realtime CV/AI pipelines for Nvidia Jetson Platform. This project: Builds a typical multi-camera pipeline, i.

NVIDIA AI IOT 96 Dec 23, 2022
Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

StrengthNet Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis" https://arxiv.org/abs/2110

RuiLiu 65 Dec 20, 2022
Implementation of paper "Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal"

Patch-wise Adversarial Removal Implementation of paper "Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal

4 Oct 12, 2022
House_prices_kaggle - Predict sales prices and practice feature engineering, RFs, and gradient boosting

House Prices - Advanced Regression Techniques Predicting House Prices with Machine Learning This project is build to enhance my knowledge about machin

Gurpreet Singh 1 Jan 01, 2022
An energy estimator for eyeriss-like DNN hardware accelerator

Energy-Estimator-for-Eyeriss-like-Architecture- An energy estimator for eyeriss-like DNN hardware accelerator This is an energy estimator for eyeriss-

HEXIN BAO 2 Mar 26, 2022
This is an example implementation of the paper "Cross Domain Robot Imitation with Invariant Representation".

IR-GAIL This is an example implementation of the paper "Cross Domain Robot Imitation with Invariant Representation". Dependency The experiments are de

Zhao-Heng Yin 1 Jul 14, 2022
This program automatically runs Python code copied in clipboard

CopyRun This program runs Python code which is copied in clipboard WARNING!! USE AT YOUR OWN RISK! NO GUARANTIES IF ANYTHING GETS BROKEN. DO NOT COPY

vertinski 4 Sep 10, 2021
Deep Reinforcement Learning for Multiplayer Online Battle Arena

MOBA_RL Deep Reinforcement Learning for Multiplayer Online Battle Arena Prerequisite Python 3 gym-derk Tensorflow 2.4.1 Dotaservice of TimZaman Seed R

Dohyeong Kim 32 Dec 18, 2022
Code To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment.

COLIEE 2021 - task 2: Legal Case Entailment This repository contains the code to reproduce NeuralMind's submissions to COLIEE 2021 presented in the pa

NeuralMind 13 Dec 16, 2022
DrNAS: Dirichlet Neural Architecture Search

This paper proposes a novel differentiable architecture search method by formulating it into a distribution learning problem. We treat the continuously relaxed architecture mixing weight as random va

Xiangning Chen 37 Jan 03, 2023
Code for ICCV2021 paper SPEC: Seeing People in the Wild with an Estimated Camera

SPEC: Seeing People in the Wild with an Estimated Camera [ICCV 2021] SPEC: Seeing People in the Wild with an Estimated Camera, Muhammed Kocabas, Chun-

Muhammed Kocabas 187 Dec 26, 2022
Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation Introduction This is a PyTorch

XMed-Lab 30 Sep 23, 2022
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Peter Lin 6.5k Jan 04, 2023
Code for 1st place solution in Sleep AI Challenge SNU Hospital

Sleep AI Challenge SNU Hospital 2021 Code for 1st place solution for Sleep AI Challenge (Note that the code is not fully organized) Refer to the notio

Saewon Yang 13 Jan 03, 2022