Embracing Single Stride 3D Object Detector with Sparse Transformer

Related tags

Deep LearningSST
Overview

SST: Single-stride Sparse Transformer

This is the official implementation of paper:

Embracing Single Stride 3D Object Detector with Sparse Transformer

Authors: Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

Paper Link (Check again on Monday)

Introduction and Highlights

  • SST is a single-stride network, which maintains original feature resolution from the beginning to the end of the network. Due to the characterisric of single stride, SST achieves exciting performances on small object detection (Pedestrian, Cyclist).
  • For simplicity, except for backbone, SST is almost the same with the basic PointPillars in MMDetection3D. With such a basic setting, SST achieves state-of-the-art performance in Pedestrian and Cyclist and outperforms PointPillars more than 10 AP only at a cost of 1.5x latency.
  • SST consists of 6 Regional Sparse Attention (SRA) blocks, which deal with the sparse voxel set. It's similar to Submanifold Sparse Convolution (SSC), but much more powerful than SSC. It's locality and sparsity guarantee the efficiency in the single stride setting.
  • The SRA can also be used in many other task to process sparse point clouds. Our implementation of SRA only relies on the pure Python APIs in PyTorch without engineering efforts as taken in the CUDA implementation of sparse convolution.
  • Large room for further improvements. For example, second stage, anchor-free head, IoU scores and advanced techniques from ViT, etc.

Usage

PyTorch >= 1.9 is highly recommended for a better support of the checkpoint technique.

Our immplementation is based on MMDetection3D, so just follow their getting_started and simply run the script: run.sh. Then you will get a basic results of SST after 5~7 hours (depends on your devices).

We only provide the single-stage model here, as for our two-stage models, please follow LiDAR-RCNN. It's also a good choice to apply other powerful second stage detectors to our single-stage SST.

Main results

Single-stage Model (based on PointPillars) on Waymo validation split

#Sweeps Veh_L1 Ped_L1 Cyc_L1
SST_1f 1 73.57 80.01 70.72
SST_3f 3 75.16 83.24 75.96

Note that we train the 3 classes together, so the performance above is a little bit lower than that reported in our paper.

TODO

  • Build SRA block with similar API as Sparse Convolution for more convenient usage.

Acknowlegement

This project is based on the following codebases.

Owner
TuSimple
The Future of Trucking
TuSimple
Entity-Based Knowledge Conflicts in Question Answering.

Entity-Based Knowledge Conflicts in Question Answering Run Instructions | Paper | Citation | License This repository provides the Substitution Framewo

Apple 35 Oct 19, 2022
Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion

CSF Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion Tips: For testing: CUDA_VISIBLE_DEVICES=0 python main.py For trai

Han Xu 14 Oct 31, 2022
MINERVA: An out-of-the-box GUI tool for offline deep reinforcement learning

MINERVA is an out-of-the-box GUI tool for offline deep reinforcement learning, designed for everyone including non-programmers to do reinforcement learning as a tool.

Takuma Seno 80 Nov 06, 2022
The PyTorch implementation of DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision.

DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision The PyTorch implementation of DiscoBox: Weakly Supe

Shiyi Lan 1 Oct 23, 2021
Enhancing Knowledge Tracing via Adversarial Training

Enhancing Knowledge Tracing via Adversarial Training This repository contains source code for the paper "Enhancing Knowledge Tracing via Adversarial T

Xiaopeng Guo 14 Oct 24, 2022
"Learning Free Gait Transition for Quadruped Robots vis Phase-Guided Controller"

PhaseGuidedControl The current version is developed based on the old version of RaiSim series, and possibly requires further modification. It will be

X-Mechanics 12 Oct 21, 2022
A very impractical 3D rendering engine that runs in the python terminal.

Terminal-3D-Render A very impractical 3D rendering engine that runs in the python terminal. do NOT try to run this program using the standard python I

23 Dec 31, 2022
Official Pytorch implementation of 'GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network' (NeurIPS 2020)

Official implementation of GOCor This is the official implementation of our paper : GOCor: Bringing Globally Optimized Correspondence Volumes into You

Prune Truong 71 Nov 18, 2022
A PyTorch library for Vision Transformers

VFormer A PyTorch library for Vision Transformers Getting Started Read the contributing guidelines in CONTRIBUTING.rst to learn how to start contribut

Society for Artificial Intelligence and Deep Learning 142 Nov 28, 2022
Source code for the paper "PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction" in ACL2021

PLOME:Pre-training with Misspelled Knowledge for Chinese Spelling Correction (ACL2021) This repository provides the code and data of the work in ACL20

197 Nov 26, 2022
LIMEcraft: Handcrafted superpixel selectionand inspection for Visual eXplanations

LIMEcraft LIMEcraft: Handcrafted superpixel selectionand inspection for Visual eXplanations The LIMEcraft algorithm is an explanatory method based on

MI^2 DataLab 4 Aug 01, 2022
AI pipelines for Nvidia Jetson Platform

Jetson Multicamera Pipelines Easy-to-use realtime CV/AI pipelines for Nvidia Jetson Platform. This project: Builds a typical multi-camera pipeline, i.

NVIDIA AI IOT 96 Dec 23, 2022
Code for paper " AdderNet: Do We Really Need Multiplications in Deep Learning?"

AdderNet: Do We Really Need Multiplications in Deep Learning? This code is a demo of CVPR 2020 paper AdderNet: Do We Really Need Multiplications in De

HUAWEI Noah's Ark Lab 915 Jan 01, 2023
How to Predict Stock Prices Easily Demo

How-to-Predict-Stock-Prices-Easily-Demo How to Predict Stock Prices Easily - Intro to Deep Learning #7 by Siraj Raval on Youtube ##Overview This is th

Siraj Raval 752 Nov 16, 2022
This script runs neural style transfer against the provided content image.

Neural Style Transfer Content Style Output Description: This script runs neural style transfer against the provided content image. The content image m

Martynas Subonis 0 Nov 25, 2021
The code for the NeurIPS 2021 paper "A Unified View of cGANs with and without Classifiers".

Energy-based Conditional Generative Adversarial Network (ECGAN) This is the code for the NeurIPS 2021 paper "A Unified View of cGANs with and without

sianchen 22 May 28, 2022
PyTorch code for our ECCV 2018 paper "Image Super-Resolution Using Very Deep Residual Channel Attention Networks"

PyTorch code for our ECCV 2018 paper "Image Super-Resolution Using Very Deep Residual Channel Attention Networks"

Yulun Zhang 1.2k Dec 26, 2022
QR2Pass-project - A proof of concept for an alternative (passwordless) authentication system to a web server

QR2Pass This is a proof of concept for an alternative (passwordless) authenticat

4 Dec 09, 2022
Official Implementation of DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation [Arxiv] [Paper] As acquiring pixel-wise an

Lukas Hoyer 305 Dec 29, 2022
Python implementation of 3D facial mesh exaggeration using the techniques described in the paper: Computational Caricaturization of Surfaces.

Python implementation of 3D facial mesh exaggeration using the techniques described in the paper: Computational Caricaturization of Surfaces.

Wonjong Jang 8 Nov 01, 2022