Progressive Coordinate Transforms for Monocular 3D Object Detection

Overview

This repository is the official implementation of PCT.

Introduction

In this paper, we propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations for monocular 3D object detection. Specifically, a localization boosting mechanism with a confidence-aware loss is introduced to progressively refine the localization prediction. In addition, semantic image representations are exploited to compensate for the use of patch proposals. Despite being lightweight and simple, our strategy establishes a new state of the art among monocular 3D detectors on the competitive KITTI benchmark. Moreover, PCT generalizes well to most coordinate-based 3D detection frameworks.
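The confidence-aware loss is only named above, not defined. To make the general idea concrete, the sketch below uses a common uncertainty-weighted L1 loss (the negative log-likelihood of a Laplace distribution, up to a constant). This is a generic, swapped-in technique and not necessarily the loss used in PCT. Alongside each regressed coordinate the model is assumed to predict a log-scale `log_sigma`: residuals with high predicted uncertainty are down-weighted, while the `+ log_sigma` term keeps the model from declaring everything uncertain. The helper name `confidence_aware_l1` is hypothetical.

```python
import math
from typing import Sequence


def confidence_aware_l1(pred: Sequence[float], target: Sequence[float],
                        log_sigma: Sequence[float]) -> float:
    """Mean uncertainty-weighted L1 loss (Laplace NLL up to a constant).

    Hypothetical helper for illustration; not taken from the PCT code.
    """
    if not (len(pred) == len(target) == len(log_sigma)):
        raise ValueError("pred, target and log_sigma must have equal length")
    total = 0.0
    for p, t, s in zip(pred, target, log_sigma):
        # A large predicted uncertainty s shrinks the weight of the residual...
        total += abs(p - t) * math.exp(-s)
        # ...but is penalised directly, so claiming uncertainty is not free.
        total += s
    return total / len(pred)


# With zero log-uncertainty the loss reduces to a plain mean L1 error.
print(confidence_aware_l1([1.0, 2.0], [1.5, 2.0], [0.0, 0.0]))  # 0.25
```

In a real training loop this would be written with PyTorch tensors; plain Python is used here only to keep the example self-contained.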

[Figure: PCT architecture]

Requirements

Installation

Download this repository (tested with Python 3.7, PyTorch 1.3.1 and Ubuntu 16.04.7). Other dependencies, such as cv2, yaml and tqdm, can be installed with:

cd #root
pip install -r requirements
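Before running anything, a quick environment check can save time. The sketch below is an assumption, not part of the repository: it prints the Python version and lists which of the dependencies named above cannot be found. Note that `cv2` and `yaml` are the import names of the `opencv-python` and `PyYAML` packages.

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this interpreter."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])  # the README was tested on 3.7
    # torch, cv2, yaml and tqdm are the dependencies mentioned in the README.
    missing = missing_modules(["torch", "cv2", "yaml", "tqdm"])
    print("missing:", ", ".join(missing) if missing else "none")
```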

Then, you need to compile the evaluation script:

cd #root/tools/kitti_eval
sh compile.sh

Prepare your data

First, download the KITTI dataset and organize the data as follows (* marks an empty directory that will hold data generated in subsequent steps):


#ROOT
  |data
    |KITTI
      |2d_detections
      |ImageSets
      |pickle_files *
      |object
        |training
          |calib
          |image_2
          |label_2
          |depth *
          |pseudo_lidar (optional for Pseudo-LiDAR)*
          |velodyne (optional for FPointNet)
        |testing
          |calib
          |image_2
          |depth *
          |pseudo_lidar (optional for Pseudo-LiDAR)*
          |velodyne (optional for FPointNet)
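The empty (*) directories in the layout above can be created in one step. This is a minimal sketch, assuming it is called with the path of `#ROOT`; it only creates the directories that later steps fill and does not download any KITTI data. The Pseudo-LiDAR directories are optional and skipped by default. The function name `make_generated_dirs` is hypothetical.

```python
from pathlib import Path
from typing import List

# Directories marked with * in the tree: they start empty and are filled
# by the later preparation steps.
GENERATED_DIRS = [
    "data/KITTI/pickle_files",
    "data/KITTI/object/training/depth",
    "data/KITTI/object/testing/depth",
]
# Optional directories, only needed for Pseudo-LiDAR.
PSEUDO_LIDAR_DIRS = [
    "data/KITTI/object/training/pseudo_lidar",
    "data/KITTI/object/testing/pseudo_lidar",
]


def make_generated_dirs(root: str, pseudo_lidar: bool = False) -> List[Path]:
    """Create the empty (*) directories under `root` and return their paths."""
    rels = GENERATED_DIRS + (PSEUDO_LIDAR_DIRS if pseudo_lidar else [])
    created = []
    for rel in rels:
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    return created
```

For example, `make_generated_dirs("/path/to/ROOT", pseudo_lidar=True)` creates all five directories; calling it again is harmless.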

Second, prepare your depth maps and put them in data/KITTI/object/training/depth. For convenience, we also provide estimated depth maps, generated with the pretrained models released by DORN and Pseudo-LiDAR:

Monocular (DORN): trainval (~1.6G), test (~1.6G)
Stereo (PSMNet): trainval (~2.5G)
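Since every image needs a depth map, a quick consistency check is worthwhile before generating the training data. This sketch assumes each depth map shares its image's file stem (for example `000001.png` and `000001.<ext>`) and makes no assumption about the depth file extension; adjust it if the provided archives use a different naming scheme. The helper name `images_without_depth` is hypothetical.

```python
from pathlib import Path
from typing import List


def images_without_depth(image_dir: str, depth_dir: str) -> List[str]:
    """List image stems in `image_dir` with no same-stem file in `depth_dir`.

    Assumes depth maps are named after their image (000001.png -> 000001.*).
    """
    depth_stems = {p.stem for p in Path(depth_dir).iterdir() if p.is_file()}
    return sorted(
        p.stem
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.stem not in depth_stems
    )
```

Called as `images_without_depth("data/KITTI/object/training/image_2", "data/KITTI/object/training/depth")`, it should return an empty list once all depth maps are in place.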

Then, generate 2D image features for the 2D bounding boxes and put them in data/KITTI/pickle_files/org. Our 2D detector is trained following the 2D detector in RTM3D; you can also use your own 2D detector for training and inference.

Finally, generate the training data using the provided scripts:

cd #root/tools/data_prepare
python patch_data_prepare_val.py --gen_train --gen_val --gen_val_detection --car_only
mv *.pickle ../../data/KITTI/pickle_files
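On systems without `mv`, or to confirm the generated files are readable, the last step can also be done from Python. This is a sketch, not a repository script: the structure of the generated pickles is not documented here, so it only reports each file's top-level type and length. Only unpickle files you generated yourself, and note that loading them may require the same libraries (for example NumPy) used to create them. The name `move_and_inspect` is hypothetical.

```python
import pickle
import shutil
from pathlib import Path
from typing import Dict


def move_and_inspect(src_dir: str, dst_dir: str) -> Dict[str, str]:
    """Move *.pickle files from src_dir into dst_dir (like `mv *.pickle dst`)
    and summarise each file's top-level object. Only unpickle trusted files."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    summary = {}
    for src in sorted(Path(src_dir).glob("*.pickle")):
        target = dst / src.name
        shutil.move(str(src), str(target))
        with open(target, "rb") as f:
            obj = pickle.load(f)
        size = f", len={len(obj)}" if hasattr(obj, "__len__") else ""
        summary[src.name] = type(obj).__name__ + size
    return summary
```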

Prepare Waymo dataset

We also provide Waymo Usage instructions for monocular 3D detection on the Waymo dataset.

Training

Move to the experiment directory and train the model (you also need to modify the paths of the pickle files in the config file):

 cd #root
 cd experiments/pct
 python ../../tools/train_val.py --config config_val.yaml
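The pickle paths in `config_val.yaml` must be edited by hand. Because the config's key names are not listed here, the sketch below avoids guessing them: it walks the parsed config and rewrites every string value ending in `.pickle` so that it points into a new directory, keeping the file name. The function name `repoint_pickles` is hypothetical, and the approach assumes no other string values in the config end in `.pickle`.

```python
import os
from typing import Any


def repoint_pickles(node: Any, pickle_dir: str) -> Any:
    """Return a copy of a parsed config in which every string ending with
    '.pickle' is moved into `pickle_dir` (file name kept)."""
    if isinstance(node, dict):
        return {k: repoint_pickles(v, pickle_dir) for k, v in node.items()}
    if isinstance(node, list):
        return [repoint_pickles(v, pickle_dir) for v in node]
    if isinstance(node, str) and node.endswith(".pickle"):
        return os.path.join(pickle_dir, os.path.basename(node))
    return node
```

To apply it, load the config with PyYAML's `yaml.safe_load`, pass the result through `repoint_pickles`, and write it back with `yaml.safe_dump`. PyYAML does not preserve comments, so keep a copy of the original file.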

Evaluation

Generate the results using the trained model:

 python ../../tools/train_val.py --config config_val.yaml --e
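The KITTI evaluation tools read detections in the standard KITTI text format: one object per line with the 15 label fields (type, truncation, occlusion, alpha, 2D box, 3D dimensions, 3D location, rotation_y) followed by a confidence score. The parser below is a sketch for inspecting the generated files in `./output`; it is not part of the repository, and the class and field names are our own.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class KittiDetection:
    cls: str
    truncated: float
    occluded: int
    alpha: float                              # observation angle
    bbox: Tuple[float, float, float, float]   # 2D box: left, top, right, bottom (px)
    dims: Tuple[float, float, float]          # height, width, length (m)
    loc: Tuple[float, float, float]           # x, y, z in camera coordinates (m)
    rotation_y: float                         # yaw around the camera Y axis
    score: float                              # detection confidence


def parse_kitti_line(line: str) -> KittiDetection:
    """Parse one 16-field KITTI detection line."""
    f = line.split()
    if len(f) != 16:
        raise ValueError(f"expected 16 fields, got {len(f)}")
    v = [float(x) for x in f[1:]]
    return KittiDetection(
        cls=f[0],
        truncated=v[0],
        occluded=int(v[1]),
        alpha=v[2],
        bbox=(v[3], v[4], v[5], v[6]),
        dims=(v[7], v[8], v[9]),
        loc=(v[10], v[11], v[12]),
        rotation_y=v[13],
        score=v[14],
    )
```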

and evaluate the generated results using:

../../tools/kitti_eval/evaluate_object_3d_offline_ap11 ../../data/KITTI/object/training/label_2 ./output

or

../../tools/kitti_eval/evaluate_object_3d_offline_ap40 ../../data/KITTI/object/training/label_2 ./output
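The two evaluation binaries differ in how they sample the precision-recall curve. AP|R11 (`_ap11`) averages the interpolated precision at the 11 recall levels 0, 0.1, ..., 1.0, while AP|R40 (`_ap40`) uses the 40 levels 1/40, 2/40, ..., 1 and drops recall 0, which otherwise gives a free contribution from the highest-precision detection. The sketch below illustrates only this sampling step on a toy curve; it is not the KITTI C++ code, which additionally handles IoU matching, difficulty filtering and don't-care regions. The function name `interpolated_ap` is hypothetical.

```python
from typing import Sequence


def interpolated_ap(recalls: Sequence[float], precisions: Sequence[float],
                    num_points: int) -> float:
    """Interpolated AP sampled at `num_points` recall levels.

    num_points=11 -> recall levels 0, 0.1, ..., 1.0   (AP|R11)
    num_points=40 -> recall levels 1/40, ..., 1.0     (AP|R40)
    """
    if num_points == 11:
        levels = [i / 10 for i in range(11)]
    elif num_points == 40:
        levels = [i / 40 for i in range(1, 41)]
    else:
        raise ValueError("num_points must be 11 or 40")
    total = 0.0
    for r in levels:
        # Interpolated precision: best precision at any recall >= r (0 if none).
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        total += max(candidates, default=0.0)
    return total / len(levels)
```

For a detector that reaches recall 0.5 at precision 1.0, AP|R11 gives 6/11 (about 0.545) whereas AP|R40 gives exactly 0.5, because the recall-0 sample inflates the 11-point version.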

Because data preparation is tedious, we also provide the generated results for evaluation. Unzip output.zip and then run either of the evaluation commands above. The results are:

Models           AP3D@mod.       AP3D@easy       AP3D@hard
PatchNet + PCT   27.53 / 34.65   38.39 / 47.16   24.44 / 28.47

Acknowledgements

This code benefits from the excellent PatchNet work and uses the off-the-shelf models provided by DORN and RTM3D.

Citation

@article{wang2021pct,
  title={Progressive Coordinate Transforms for Monocular 3D Object Detection},
  author={Li Wang and Li Zhang and Yi Zhu and Zhi Zhang and Tong He and Mu Li and Xiangyang Xue},
  journal={arXiv preprint arXiv:2108.05793},
  year={2021}
}

Contact

For questions regarding PCT-3D, feel free to post here or contact the authors directly.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.
