Focal Sparse Convolutional Networks for 3D Object Detection (CVPR 2022, Oral)

Overview


This is the official implementation of Focals Conv (CVPR 2022, Oral), a new sparse convolution design for 3D object detection that works in both LiDAR-only and multi-modal settings. For more details, please refer to:

Focal Sparse Convolutional Networks for 3D Object Detection [Paper]
Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia

Experimental results

KITTI dataset

Car@R11 Car@R40 download
PV-RCNN + Focals Conv 83.91 85.20 Google | Baidu (key: m15b)
PV-RCNN + Focals Conv (multi-modal) 84.58 85.34 Google | Baidu (key: ie6n)
Voxel R-CNN (Car) + Focals Conv (multi-modal) 85.68 86.00 Google | Baidu (key: tnw9)

nuScenes dataset

mAP NDS download
CenterPoint + Focals Conv (multi-modal) 63.86 69.41 Google | Baidu (key: 01jh)
CenterPoint + Focals Conv (multi-modal) - 1/4 data 62.15 67.45 Google | Baidu (key: 6qsc)

Visualization of the voxel distribution of Focals Conv on the KITTI val set:

Getting Started

Installation

a. Clone this repository.

git clone https://github.com/dvlab-research/FocalsConv && cd FocalsConv

b. Install the environment.

Follow the installation documents of the OpenPCDet and CenterPoint codebases respectively, depending on which you prefer.

*spconv 2.x is highly recommended over spconv 1.x.
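For instance, spconv 2.x can be installed from pip; the wheel name below assumes CUDA 11.3 and should be adjusted to your local CUDA version:

# Install spconv 2.x (pick the wheel that matches your CUDA toolkit).
pip install spconv-cu113    # e.g. spconv-cu111 / spconv-cu114 for other CUDA versions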

c. Prepare the datasets.

Download and organize the official KITTI and Waymo datasets following the documentation in OpenPCDet, and the nuScenes dataset following the CenterPoint codebase.
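For reference, a sketch of the KITTI layout that OpenPCDet expects (see OpenPCDet's GETTING_STARTED.md for the authoritative version; the planes folder is only needed when road-plane augmentation is enabled, and image_2 is required for the multi-modal models):

# Expected KITTI layout under OpenPCDet (sketch)
OpenPCDet
└── data
    └── kitti
        ├── ImageSets
        ├── training
        │   └── calib & velodyne & label_2 & image_2 & (optional) planes
        └── testing
            └── calib & velodyne & image_2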

*Note that for the nuScenes dataset, we use image-level gt-sampling (copy-paste) in multi-modal training. Please download this dbinfos_train_10sweeps_withvelo.pkl to replace the original one. (Google | Baidu (key: b466))

*Note that for the nuScenes dataset, we conduct ablation studies on a 1/4 training-data split. Please download infos_train_mini_1_4_10sweeps_withvelo_filter_True.pkl if you need it for training. (Google | Baidu (key: 769e))
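A minimal sketch of where these pkl files go, assuming the standard CenterPoint data layout under data/nuScenes (adjust the paths if your layout differs):

# Replace the original dbinfos with the image-level gt-sampling version,
# and place the 1/4-split infos file alongside the full-split infos.
cp dbinfos_train_10sweeps_withvelo.pkl data/nuScenes/
cp infos_train_mini_1_4_10sweeps_withvelo_filter_True.pkl data/nuScenes/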

d. Download pre-trained models.

If you want to directly evaluate the trained models we provide, please download them first.

If you want to train by yourself in the multi-modal setting, please first download this ResNet pre-trained model, torchvision-res50-deeplabv3.

Evaluation

We provide trained weight files, so you can evaluate with them directly. You can also use models you have trained yourself.

For models in OpenPCDet,

NUM_GPUS=8
cd tools 
bash scripts/dist_test.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/voxel_rcnn_car_focal_multimodal.yaml --ckpt path/to/voxelrcnn_focal_multimodal.pth

bash scripts/dist_test.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/pv_rcnn_focal_multimodal.yaml --ckpt ../pvrcnn_focal_multimodal.pth

bash scripts/dist_test.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/pv_rcnn_focal_lidar.yaml --ckpt path/to/pvrcnn_focal_lidar.pth
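If you only have a single GPU, OpenPCDet's non-distributed test script takes the same --cfg_file/--ckpt arguments, e.g.:

# Single-GPU evaluation (run from the tools directory)
python test.py --cfg_file cfgs/kitti_models/pv_rcnn_focal_lidar.yaml --ckpt path/to/pvrcnn_focal_lidar.pth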

For models in CenterPoint,

NUM_GPUS=4
CONFIG="nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_focal_multimodal"
python -m torch.distributed.launch --nproc_per_node=${NUM_GPUS} ./tools/dist_test.py configs/nusc/voxelnet/$CONFIG.py --work_dir ./work_dirs/$CONFIG --checkpoint centerpoint_focal_multimodal.pth

Training

For configs in OpenPCDet,

bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/${CONFIG}.yaml
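For example, to train the multi-modal PV-RCNN model on KITTI with 8 GPUs (the config name is the one evaluated above):

bash scripts/dist_train.sh 8 --cfg_file cfgs/kitti_models/pv_rcnn_focal_multimodal.yaml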

For configs in CenterPoint,

python -m torch.distributed.launch --nproc_per_node=${NUM_GPUS} ./tools/train.py configs/nusc/voxelnet/$CONFIG.py --work_dir ./work_dirs/$CONFIG
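For example, a concrete run for the multi-modal nuScenes config evaluated above:

CONFIG="nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_focal_multimodal"
python -m torch.distributed.launch --nproc_per_node=4 ./tools/train.py configs/nusc/voxelnet/$CONFIG.py --work_dir ./work_dirs/$CONFIG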
  • Note that we use 8 GPUs to train OpenPCdet models and 4 GPUs to train CenterPoint models.

TODO List

    • Config files and trained models on the overall Waymo dataset.
    • Config files and scripts for test-time augmentation (double-flip and rotation) for the nuScenes test submission.
    • Results and models of Focals Conv networks on 3D segmentation datasets.

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{focalsconv-chen,
  title={Focal Sparse Convolutional Networks for 3D Object Detection},
  author={Chen, Yukang and Li, Yanwei and Zhang, Xiangyu and Sun, Jian and Jia, Jiaya},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

Acknowledgement

  • This work is built upon OpenPCDet and CenterPoint. Please refer to their official GitHub repositories for more information.

  • This README follows the style of IA-SSD.

License

This project is released under the Apache 2.0 license.

Related Repos

  1. spconv
  2. Deformable Conv
  3. Submanifold Sparse Conv
Issues
  • I can't reach the multi-modal accuracy (KITTI val split, AP3D (R11)) you report in the paper

    My val result: 90 epochs, 3D AP: 89.3103, 85.0520, 79.2089

    Your val result in Table 8, Focals Conv-F, 3D: 89.82, 85.22, 85.19

    I wonder why I can't improve the accuracy on the hard level, and how many epochs did you train?

    opened by qimingx 11
  • lidar-only focal CenterPoint config is missing

    I see CenterPoint with multi-modal Focals Conv, but I cannot find a LiDAR-only (uni-modal) Focals Conv version of CenterPoint. So I made a new config myself, changing the CenterPoint config from voxelnet to voxelfocal and from spmiddleresnetfhd to spmiddleresnetfhdfocal with use_img = false. Is this the right way to do it?

    opened by konyul 7
  • UnboundLocalError: local variable 'mv_height' referenced before assignment

    Hello, thank you for your awesome work! When I tried to train with pv_rcnn_focal_lidar.yaml, this error occurred: UnboundLocalError: local variable 'mv_height' referenced before assignment

    the full error is:

    Traceback (most recent call last):
      File "train.py", line 201, in <module>
        main()
      File "train.py", line 153, in main
        train_model(
      File "/data1/qinjia/FocalsConv/OpenPCDet/tools/train_utils/train_utils.py", line 111, in train_model
        accumulated_iter = train_one_epoch(
      File "/data1/qinjia/FocalsConv/OpenPCDet/tools/train_utils/train_utils.py", line 25, in train_one_epoch
        batch = next(dataloader_iter)
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
        data = self._next_data()
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
        return self._process_data(data)
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
        data.reraise()
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
        raise self.exc_type(msg)
    UnboundLocalError: Caught UnboundLocalError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/qinjia/Software/anaconda3/envs/FocalConv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/data1/qinjia/FocalsConv/OpenPCDet/tools/../pcdet/datasets/kitti/kitti_dataset.py", line 425, in __getitem__
        data_dict = self.prepare_data(data_dict=input_dict)
      File "/data1/qinjia/FocalsConv/OpenPCDet/tools/../pcdet/datasets/dataset.py", line 130, in prepare_data
        data_dict = self.data_augmentor.forward(
      File "/data1/qinjia/FocalsConv/OpenPCDet/tools/../pcdet/datasets/augmentor/data_augmentor.py", line 243, in forward
        data_dict = cur_augmentor(data_dict=data_dict)
      File "/data1/qinjia/FocalsConv/OpenPCDet/tools/../pcdet/datasets/augmentor/database_sampler.py", line 403, in __call__
        mv_height = mv_height[valid_mask]
    UnboundLocalError: local variable 'mv_height' referenced before assignment

    I set USE_ROAD_PLANE: False in pv_rcnn_focal_lidar.yaml because my custom dataset doesn't have road-plane data, but when I then tried KITTI data the same error still occurred.

    I googled this error; it seems the variable is referenced before it is defined. I checked mv_height in /FocalsConv/OpenPCDet/tools/../pcdet/datasets/augmentor/database_sampler.py, line 403, in __call__: mv_height = mv_height[valid_mask]

    But I don't understand what is wrong with this. If you can help me, I'd really appreciate it. Thanks!

    opened by Jane-QinJ 3
  • How to compute the params and runtime (inference time)?

    Dear author, first of all, thanks for your great work. After reading your paper, I really want to know how to calculate the params and the runtime of adding Focals Conv to Voxel R-CNN, as mentioned in your experiments. I want to try it, but there is little information on the Internet, and after searching Google I am still confused. Could you share the code to accomplish it? I would really appreciate your help. Thank you in advance.
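    (Editorial aside: a rough sketch for the parameter count, assuming an OpenPCDet-style checkpoint that stores its weights under the 'model_state' key; runtime is usually measured separately by timing the model forward over the val set with torch.cuda.synchronize() around it.)

    # Count parameters stored in a checkpoint (the 'model_state' key is an assumption; adjust if needed).
    python -c "
    import torch
    ckpt = torch.load('path/to/checkpoint.pth', map_location='cpu')
    state = ckpt.get('model_state', ckpt)
    print('params (M):', sum(v.numel() for v in state.values() if torch.is_tensor(v)) / 1e6)
    "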

    opened by Jane-QinJ 2
  • KeyError: 'road_plane' in database_sampler.py

    Hi, thanks for your excellent work! When I generated the pkl file using your code and started to train, I got this KeyError in database_sampler.py, and through debugging I found there is no road_plane in data_dict. Could you help me solve this? Thanks a lot!

    opened by YangHan-Morningstar 2
  • Error in database_sampler.py

    Hi, thanks for your excellent work! When using your code to train, I found a strange piece of code in database_sampler.py. Here infos is a list containing all information of the samples, and each index should be an integer, but cur_class is a label string, such as Car or Cycle! I'm confused by it; could you help me? Thanks a lot!

    opened by YangHan-Morningstar 2
  • How long does it take to train multi-modal FocalsConv on the nuScenes dataset?

    Hello @yukang2017! I am trying to train multi-modal FocalsConv on the 1/4 nuScenes training set with four 2080Ti GPUs. According to the log, it is expected to take about six to seven days. I am not familiar with the nuScenes dataset, and I wonder if there is something wrong with my environment. How long does it take you to train multi-modal FocalsConv on the 1/4 nuScenes split and on the whole dataset?

    opened by rkotimi 2
  • Data augmentation in Section 3.3

    It's an interesting work. I have a question about the following words in Section 3.3 of the paper: "For ground-truth sampling, we copy the corresponding 2D objects onto images. Rather than using an additional segmentation model or mask annotations [57], we directly crop objects in bounding boxes for simplification." However, copying objects from other scenes into the current scene produces overlaps in the RGB image. Copy-pasting objects from other scenes is effective in point clouds, where overlapping objects can be removed through collision tests. How do you deal with overlapping objects in RGB images?

    opened by CBY-9527 1
  • Occlusion between different classes in multi-modal GT-Sampling

    Hello @yukang2017! I notice that your FocalsConv has been supported by OpenPCDet. Congratulations!

    After reading your code implementation of multi-modal GT-Sampling, I think you only handle occlusion among objects belonging to the same class on the image, which may cause severe occlusion between different classes. Is this a potential bug? Could it lead to suboptimal performance?

    opened by rkotimi 1
  • add web demo/model to Hugging Face

    Hi, would you be interested in adding FocalsConv to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models, datasets, and Spaces (web demos) can be added to a user account or an organization, similar to GitHub.

    Example from other organizations: Keras: https://huggingface.co/keras-io Microsoft: https://huggingface.co/microsoft Facebook: https://huggingface.co/facebook

    Example spaces with repos: github: https://github.com/salesforce/BLIP Spaces: https://huggingface.co/spaces/salesforce/BLIP

    github: https://github.com/facebookresearch/omnivore Spaces: https://huggingface.co/spaces/akhaliq/omnivore

    and here are guides for adding spaces/models/datasets to your org

    How to add a Space: https://huggingface.co/blog/gradio-spaces how to add models: https://huggingface.co/docs/hub/adding-a-model uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

    Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

    opened by AK391 2