Fast convergence of DETR with spatially modulated co-attention

Last update: Dec 07, 2022


Usage

SMCA DETR has no extra compiled components and its package dependencies are minimal, so the code is simple to use. The instructions below install the dependencies via conda. First, clone the repository locally:

git clone https://github.com/facebookresearch/detr.git

Then, install PyTorch 1.5+ and torchvision 0.6+:

conda install -c pytorch pytorch torchvision

Install pycocotools (for evaluation on COCO) and scipy (for training):

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

That's it; you should now be able to train and evaluate detection models.

(Optional) To work with panoptic segmentation, install panopticapi:

pip install git+https://github.com/cocodataset/panopticapi.git
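Before moving on, it can help to confirm that the packages installed above are visible to the Python interpreter you plan to train with. The following is a minimal sketch, not part of the repository; the helper name `find_missing` is hypothetical, and the module lists simply mirror the install commands above.

```python
import importlib.util

# Import names of the packages installed above; panopticapi is optional.
REQUIRED = ["torch", "torchvision", "scipy", "pycocotools"]
OPTIONAL = ["panopticapi"]


def find_missing(module_names):
    """Return the top-level module names that cannot be located in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing_required = find_missing(REQUIRED)
    missing_optional = find_missing(OPTIONAL)
    if missing_required:
        print("Missing required packages:", ", ".join(missing_required))
    else:
        print("All required packages found.")
    if missing_optional:
        print("Missing optional packages:", ", ".join(missing_optional))
```

The check only locates the modules and does not import them, so it does not verify the PyTorch or torchvision version.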

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
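The layout above can be checked before launching a long training run. This is a small sketch under the assumption that only the three listed subdirectories matter; the function name `missing_coco_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Subdirectories expected under the COCO root, as listed above.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017")


def missing_coco_dirs(coco_root):
    """Return the expected subdirectories that are absent under coco_root."""
    root = Path(coco_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Replace with the path you pass to --coco_path.
    absent = missing_coco_dirs("/path/to/coco")
    if absent:
        print("Missing directories:", ", ".join(absent))
    else:
        print("COCO directory layout looks complete.")
```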

Training

To train single-scale SMCA on a single node with 8 GPUs for 50 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 2 --lr_drop 40 --num_queries 300 --epochs 50 --dynamic_scale type3 --output_dir smca_single_scale
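The interaction between `--lr_drop 40` and `--epochs 50` can be illustrated with a short sketch. It assumes that SMCA keeps the step schedule of the DETR code it builds on (a learning rate multiplied by 0.1 once every `lr_drop` epochs) and uses a base learning rate of 1e-4 purely for illustration; neither assumption is confirmed by this README, and `step_lr` is a hypothetical helper.

```python
def step_lr(base_lr, epoch, lr_drop, gamma=0.1):
    """Learning rate at a given epoch under a step schedule.

    The rate is multiplied by gamma once every lr_drop epochs.
    """
    return base_lr * gamma ** (epoch // lr_drop)


if __name__ == "__main__":
    BASE_LR = 1e-4  # assumption: illustrative value, not taken from this README
    for epoch in (0, 39, 40, 49):
        print(epoch, step_lr(BASE_LR, epoch, lr_drop=40))
```

Under these assumptions the first 40 epochs run at the base rate and the last 10 at one tenth of it.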

A single epoch takes about 30 minutes, so 50-epoch training takes around 25 hours on a single machine with 8 V100 cards.
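The estimate above is simple arithmetic, and the same calculation can be reused for other schedules or hardware. The sketch below also computes the total batch size, assuming that `--batch_size` is counted per GPU process as in DETR; that assumption and both helper names are not stated in this README.

```python
def training_hours(epochs, minutes_per_epoch):
    """Total wall-clock training time in hours."""
    return epochs * minutes_per_epoch / 60


def effective_batch_size(num_gpus, batch_size_per_gpu):
    """Images per optimizer step, assuming --batch_size is per GPU process."""
    return num_gpus * batch_size_per_gpu


if __name__ == "__main__":
    print(training_hours(epochs=50, minutes_per_epoch=30))         # 25.0
    print(effective_batch_size(num_gpus=8, batch_size_per_gpu=2))  # 16
```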

Object Detection

Model Zoo

name	dataset	backbone	schedule (epochs)	box AP
SMCA (single scale)	MSCOCO	R50	50	41.0
SMCA-Container (single scale)	MSCOCO	Container-S-Light	50	44.2
SMCA-Container (single scale)	MSCOCO	Container-M	50	47.3
SMCA (single scale)	MSCOCO	R50	108	42.7
SMCA (single scale)	MSCOCO	R50	250	43.5
SMCA (multi scale)	MSCOCO	R50	50	43.7
SMCA (new multi scale)	MSCOCO	R50	50	44.4
SMCA	Visual Genome	R50	50	coming soon

Panoptic Segmentation

Model Zoo

name	dataset	backbone	schedule (epochs)	PQ	SQ	RQ
MASK-Former (single scale)	MSCOCO	R50	500	46.5	80.4	56.8
SMCA-MASK-Former (single scale)	MSCOCO	R50	50	46.0	80.4	56.0

The original SMCA code submitted during the ICCV review period: https://github.com/abc403/SMCA-replication

Release Steps

Single-scale SMCA
Single-scale SMCA with Container-Small
Single-scale SMCA with Container-Medium
New Multi-scale SMCA (Newly added Multi_scale_SMCA.zip, 9th Sep)
SMCA-DETR for Fast Convergence of Panoptic Segmentation

Citation

If you find this repository useful, please consider citing our work:

@article{gao2021fast,
  title={Fast convergence of detr with spatially modulated co-attention},
  author={Gao, Peng and Zheng, Minghang and Wang, Xiaogang and Dai, Jifeng and Li, Hongsheng},
  journal={arXiv preprint arXiv:2101.07448},
  year={2021}
}

@article{gao2021container,
  title={Container: Context Aggregation Network},
  author={Gao, Peng and Lu, Jiasen and Li, Hongsheng and Mottaghi, Roozbeh and Kembhavi, Aniruddha},
  journal={arXiv preprint arXiv:2106.01401},
  year={2021}
}

@article{zheng2020end,
  title={End-to-end object detection with adaptive clustering transformer},
  author={Zheng, Minghang and Gao, Peng and Wang, Xiaogang and Li, Hongsheng and Dong, Hao},
  journal={arXiv preprint arXiv:2011.09315},
  year={2020}
}

Contributors

Peng Gao, Qiu Han, Minghang Zheng

Acknowledgement

This project borrows heavily from DETR and is partially motivated by Sparse R-CNN.
