[ICLR 2022] DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

Related tags

Deep LearningDAB-DETR
Overview

DAB-DETR

This is the official pytorch implementation of our ICLR 2022 paper DAB-DETR.

Authors: Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang

News

[2022/4/14] We release the .pptx file of our DETR-like models comparison figure for those who want to draw model arch figures in paper.
[2022/4/12] We fix a bug in the file datasets/coco_eval.py. The parameter useCats of CocoEvaluator should be True by default.
[2022/4/9] Our code is available!
[2022/3/9] We build a repo Awesome Detection Transformer to present papers about transformer for detection and segmenttion. Welcome to your attention!
[2022/3/8] Our new work DINO set a new record of 63.3AP on the MS-COCO leader board.
[2022/3/8] Our new work DN-DETR has been accpted by CVPR 2022!
[2022/1/21] Our work has been accepted to ICLR 2022.

Abstract

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR. This new formulation directly uses box coordinates as queries in Transformer decoders and dynamically updates them layer-by-layer. Using box coordinates not only helps using explicit positional priors to improve the query-to-feature similarity and eliminate the slow training convergence issue in DETR, but also allows us to modulate the positional attention map using the box width and height information. Such a design makes it clear that queries in DETR can be implemented as performing soft ROI pooling layer-by-layer in a cascade manner. As a result, it leads to the best performance on MS-COCO benchmark among the DETR-like detection models under the same setting, e.g., AP 45.7% using ResNet50-DC5 as backbone trained in 50 epochs. We also conducted extensive experiments to confirm our analysis and verify the effectiveness of our methods.

Model

arch

Model Zoo

We provide our models with R50 backbone, including both DAB-DETR and DAB-Deformable-DETR (See Appendix C of our paper for more details).

name backbone box AP Log/Config/Checkpoint Where in Our Paper
0 DAB-DETR-R50 R50 42.2 Google Drive | Tsinghua Cloud Table 2
1 DAB-DETR-R50(3 pat)1 R50 42.6 Google Drive | Tsinghua Cloud Table 2
2 DAB-DETR-R50-DC5 R50 44.5 Google Drive | Tsinghua Cloud Table 2
3 DAB-DETR-R50-DC5-fixxy2 R50 44.7 Google Drive | Tsinghua Cloud Table 8. Appendix H.
4 DAB-DETR-R50-DC5(3 pat) R50 45.7 Google Drive | Tsinghua Cloud Table 2
5 DAB-Deformbale-DETR
(Deformbale Encoder Only)3
R50 46.9 Baseline for DN-DETR
6 DAB-Deformable-DETR-R504 R50 48.1 Google Drive | Tsinghua Cloud Extend Results for Table 5,
Appendix C.

Notes:

  • 1: The models with marks (3 pat) are trained with multiple pattern embeds (refer to Anchor DETR or our paper for more details.).
  • 2: The term "fixxy" means we use random initialization of anchors and do not update their parameters during training (See Appendix H of our paper for more details).
  • 3: The DAB-Deformbale-DETR(Deformbale Encoder Only) is a multiscale version of our DAB-DETR. See DN-DETR for more details.
  • 4: The result here is better than the number in our paper, as we use different losses coefficients during training. Refer to our config file for more details.

Usage

Installation

We use the great DETR project as our codebase, hence no extra dependency is needed for our DAB-DETR. For the DAB-Deformable-DETR, you need to compile the deformable attention operator manually.

We test our models under python=3.7.3,pytorch=1.9.0,cuda=11.1. Other versions might be available as well.

  1. Clone this repo
git clone https://github.com/IDEA-opensource/DAB-DETR.git
cd DAB-DETR
  1. Install Pytorch and torchvision

Follow the instrction on https://pytorch.org/get-started/locally/.

# an example:
conda install -c pytorch pytorch torchvision
  1. Install other needed packages
pip install -r requirements.txt
  1. Compiling CUDA operators
cd models/dab_deformable_detr/ops
python setup.py build install
# unit test (should see all checking is True)
python test.py
cd ../../..

Data

Please download COCO 2017 dataset and organize them as following:

COCODIR/
  ├── train2017/
  ├── val2017/
  └── annotations/
  	├── instances_train2017.json
  	└── instances_val2017.json

Run

We use the standard DAB-DETR-R50 and DAB-Deformable-DETR-R50 as examples for training and evalulation.

Eval our pretrianed models

Download our DAB-DETR-R50 model checkpoint from this link and perform the command below. You can expect to get the final AP about 42.2.

For our DAB-Deformable-DETR (download here), the final AP expected is 48.1.

# for dab_detr: 42.2 AP
python main.py -m dab_detr \
  --output_dir logs/DABDETR/R50 \
  --batch_size 1 \
  --coco_path /path/to/your/COCODIR \ # replace the args to your COCO path
  --resume /path/to/our/checkpoint \ # replace the args to your checkpoint path
  --eval

# for dab_deformable_detr: 48.1 AP
python main.py -m dab_deformable_detr \
  --output_dir logs/dab_deformable_detr/R50 \
  --batch_size 2 \
  --coco_path /path/to/your/COCODIR \ # replace the args to your COCO path
  --resume /path/to/our/checkpoint \ # replace the args to your checkpoint path
  --transformer_activation relu \
  --eval

Training your own models

Similarly, you can also train our model on a single process:

# for dab_detr
python main.py -m dab_detr \
  --output_dir logs/DABDETR/R50 \
  --batch_size 1 \
  --epochs 50 \
  --lr_drop 40 \
  --coco_path /path/to/your/COCODIR  # replace the args to your COCO path

Distributed Run

However, as the training is time consuming, we suggest to train the model on multi-device.

If you plan to train the models on a cluster with Slurm, here is an example command for training:

# for dab_detr: 42.2 AP
python run_with_submitit.py \
  --timeout 3000 \
  --job_name DABDETR \
  --coco_path /path/to/your/COCODIR \
  -m dab_detr \
  --job_dir logs/DABDETR/R50_%j \
  --batch_size 2 \
  --ngpus 8 \
  --nodes 1 \
  --epochs 50 \
  --lr_drop 40 

# for dab_deformable_detr: 48.1 AP
python run_with_submitit.py \
  --timeout 3000 \
  --job_name dab_deformable_detr \
  --coco_path /path/to/your/COCODIR \
  -m dab_deformable_detr \
  --transformer_activation relu \
  --job_dir logs/dab_deformable_detr/R50_%j \
  --batch_size 2 \
  --ngpus 8 \
  --nodes 1 \
  --epochs 50 \
  --lr_drop 40 

The final AP should be similar to ours. (42.2 for DAB-DETR and 48.1 for DAB-Deformable-DETR). Our configs and logs(see the model_zoo) could be used as references as well.

Notes:

  • The results are sensitive to the batch size. We use 16(2 images each GPU x 8 GPUs) by default.

Or run with multi-processes on a single node:

# for dab_detr: 42.2 AP
python -m torch.distributed.launch --nproc_per_node=8 \
  main.py -m dab_detr \
  --output_dir logs/DABDETR/R50 \
  --batch_size 2 \
  --epochs 50 \
  --lr_drop 40 \
  --coco_path /path/to/your/COCODIR

# for dab_deformable_detr: 48.1 AP
python -m torch.distributed.launch --nproc_per_node=8 \
  main.py -m dab_deformable_detr \
  --output_dir logs/dab_deformable_detr/R50 \
  --batch_size 2 \
  --epochs 50 \
  --lr_drop 40 \
  --transformer_activation relu \
  --coco_path /path/to/your/COCODIR

Detailed Model

arch

Comparison of DETR-like Models

The source file can be found here.

comparison

Links

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum
arxiv 2022.
[paper] [code]

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022.
[paper] [code]

License

DAB-DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Copyright (c) IDEA. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Citation

@inproceedings{
  liu2022dabdetr,
  title={{DAB}-{DETR}: Dynamic Anchor Boxes are Better Queries for {DETR}},
  author={Shilong Liu and Feng Li and Hao Zhang and Xiao Yang and Xianbiao Qi and Hang Su and Jun Zhu and Lei Zhang},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=oMI9PjOb9Jl}
}
[ICCV 2021] Relaxed Transformer Decoders for Direct Action Proposal Generation

RTD-Net (ICCV 2021) This repo holds the codes of paper: "Relaxed Transformer Decoders for Direct Action Proposal Generation", accepted in ICCV 2021. N

Multimedia Computing Group, Nanjing University 80 Nov 30, 2022
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

DiffWave DiffWave is a fast, high-quality neural vocoder and waveform synthesizer. It starts with Gaussian noise and converts it into speech via itera

LMNT 498 Jan 03, 2023
Official implementation of the paper Momentum Capsule Networks (MoCapsNet)

Momentum Capsule Network Official implementation of the paper Momentum Capsule Networks (MoCapsNet). Abstract Capsule networks are a class of neural n

8 Oct 20, 2022
CAPRI: Context-Aware Interpretable Point-of-Interest Recommendation Framework

CAPRI: Context-Aware Interpretable Point-of-Interest Recommendation Framework This repository contains a framework for Recommender Systems (RecSys), a

RecSys Lab 8 Jul 03, 2022
Betafold - AlphaFold with tunings

BetaFold We (hegelab.org) craeted this standalone AlphaFold (AlphaFold-Multimer,

2 Aug 11, 2022
Building Ellee — A GPT-3 and Computer Vision Powered Talking Robotic Teddy Bear With Human Level Conversation Intelligence

Using an object detection and facial recognition system built on MobileNetSSDV2 and Dlib and running on an NVIDIA Jetson Nano, a GPT-3 model, Google Speech Recognition, Amazon Polly and servo motors,

24 Oct 26, 2022
GuideDog is an AI/ML-based mobile app designed to assist the lives of the visually impaired, 100% voice-controlled

Guidedog Authors: Kyuhee Jo, Steven Gunarso, Jacky Wang, Raghav Sharma GuideDog is an AI/ML-based mobile app designed to assist the lives of the visua

Kyuhee Jo 5 Nov 24, 2021
Unofficial PyTorch implementation of Neural Additive Models (NAM) by Agarwal, et al.

nam-pytorch Unofficial PyTorch implementation of Neural Additive Models (NAM) by Agarwal, et al. [abs, pdf] Installation You can access nam-pytorch vi

Rishabh Anand 11 Mar 14, 2022
DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs

DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs Abstract: Image-to-image translation has recently achieved re

yaxingwang 23 Apr 14, 2022
Code and project page for ICCV 2021 paper "DisUnknown: Distilling Unknown Factors for Disentanglement Learning"

DisUnknown: Distilling Unknown Factors for Disentanglement Learning See introduction on our project page Requirements PyTorch = 1.8.0 torch.linalg.ei

Sitao Xiang 24 May 16, 2022
Blind Video Temporal Consistency via Deep Video Prior

deep-video-prior (DVP) Code for NeurIPS 2020 paper: Blind Video Temporal Consistency via Deep Video Prior PyTorch implementation | paper | project web

Chenyang LEI 272 Dec 21, 2022
Prefix-Tuning: Optimizing Continuous Prompts for Generation

Prefix Tuning Files: . ├── gpt2 # Code for GPT2 style autoregressive LM │ ├── train_e2e.py # high-level script

530 Jan 04, 2023
Isaac Gym Reinforcement Learning Environments

Isaac Gym Reinforcement Learning Environments

NVIDIA Omniverse 714 Jan 08, 2023
A Pytorch implementation of MoveNet from Google. Include training code and pre-train model.

Movenet.Pytorch Intro MoveNet is an ultra fast and accurate model that detects 17 keypoints of a body. This is A Pytorch implementation of MoveNet fro

Mr.Fire 241 Dec 26, 2022
pytorch, hand(object) detect ,yolo v5,手检测

YOLO V5 物体检测,包括手部检测。 项目介绍 手部检测 手部检测示例如下 : 视频示例: 项目配置 作者开发环境: Python 3.7 PyTorch = 1.5.1 数据集 手部检测数据集 该项目数据集采用 TV-Hand 和 COCO-Hand (COCO-Hand-Big 部分) 进

Eric.Lee 11 Dec 20, 2022
Diverse graph algorithms implemented using JGraphT library.

# 1. Installing Maven & Pandas First, please install Java (JDK11) and Python 3 if they are not already. Next, make sure that Maven (for importing J

See Woo Lee 3 Dec 17, 2022
General purpose Slater-Koster tight-binding code for electronic structure calculations

tight-binder Introduction General purpose tight-binding code for electronic structure calculations based on the Slater-Koster approximation. The code

9 Dec 15, 2022
Unofficial PyTorch implementation of TokenLearner by Google AI

tokenlearner-pytorch Unofficial PyTorch implementation of TokenLearner by Ryoo et al. from Google AI (abs, pdf) Installation You can install TokenLear

Rishabh Anand 46 Dec 20, 2022
Solver for Large-Scale Rank-One Semidefinite Relaxations

STRIDE: spectrahedral proximal gradient descent along vertices A Solver for Large-Scale Rank-One Semidefinite Relaxations About STRIDE is designed for

48 Dec 20, 2022
JORLDY an open-source Reinforcement Learning (RL) framework provided by KakaoEnterprise

Repository for Open Source Reinforcement Learning Framework JORLDY

Kakao Enterprise Corp. 330 Dec 30, 2022