Official Implementation of DE-DETR and DELA-DETR in "Towards Data-Efficient Detection Transformers"

Overview

DE-DETRs

By Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, and Dacheng Tao

This repository is an official implementation of DE-DETR and DELA-DETR in the paper Towards Data-Efficient Detection Transformers.

For the implementation of DE-CondDETR and DELA-CondDETR, please refer to DE-CondDETR.

Introduction

TL; DR. We identify the data-hungry issue of existing detection transformers and alleviate it by simply alternating how key and value sequences are constructed in the cross-attention layer, with minimum modifications to the original models. Besides, we introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.

DE-DETR

Abstract. Detection Transformers have achieved competitive performance on the sample-rich COCO dataset. However, we show most of them suffer from significant performance drops on small-size datasets, like Cityscapes. In other words, the detection transformers are generally data-hungry. To tackle this problem, we empirically analyze the factors that affect data efficiency, through a step-by-step transition from a data-efficient RCNN variant to the representative DETR. The empirical results suggest that sparse feature sampling from local image areas holds the key. Based on this observation, we alleviate the data-hungry issue of existing detection transformers by simply alternating how key and value sequences are constructed in the cross-attention layer, with minimum modifications to the original models. Besides, we introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency. Experiments show that our method can be readily applied to different detection transformers and improve their performance on both small-size and sample-rich datasets.

Label Augmentation

Main Results

The experimental results and model weights trained on Cityscapes are shown below.

Model Epochs mAP [email protected] [email protected] [email protected] [email protected] [email protected] Log & Model
DETR 300 11.7 26.5 9.3 2.6 9.2 25.6 Google Drive
DE-DETR 50 22.2 41.7 20.5 4.9 19.7 40.8 Google Drive
DELA-DETR 50 25.2 46.8 22.8 6.5 23.8 44.3 Google Drive

The experimental results and model weights trained on COCO 2017 are shown below.

Model Epochs mAP [email protected] [email protected] [email protected] [email protected] [email protected] Log & Model
DETR 50 33.6 54.6 34.2 13.2 35.7 53.5 Google Drive
DE-DETR 50 40.2 60.4 43.2 23.3 42.1 56.4 Google Drive
DELA-DETR 50 41.9 62.6 44.8 24.9 44.9 56.8 Google Drive

Note:

  1. The number of queries is increased from 100 to 300 in DELA-DETR.
  2. The performance of the model weights on Cityscapes is slightly different from that reported in the paper, because the results in the paper are the average of five repeated runs with different random seeds.

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

  • PyTorch>=1.5.0, torchvision>=0.6.0 (following instructions here)

  • Detectron2>=0.5 for RoIAlign (following instructions here)

  • Other requirements

    pip install -r requirements.txt

Usage

Dataset preparation

The COCO 2017 dataset can be downloaded from here and the Cityscapes datasets can be downloaded from here. The annotations in COCO format can be obtained from here. Afterward, please organize the datasets and annotations as following:

data
└─ cityscapes
   └─ leftImg8bit
      |─ train
      └─ val
└─ coco
   |─ annotations
   |─ train2017
   └─ val2017
└─ CocoFormatAnnos
   |─ cityscapes_train_cocostyle.json
   |─ cityscapes_val_cocostyle.json
   |─ instances_train2017_sample11828.json
   |─ instances_train2017_sample5914.json
   |─ instances_train2017_sample2365.json
   └─ instances_train2017_sample1182.json

The annotations for down-sampled COCO 2017 dataset is generated using utils/downsample_coco.py

Training

Training DELA-DETR on Cityscapes

python -m torch.distributed.launch --nproc_per_node=2 --master_port=29501 --use_env main.py --dataset_file cityscapes --coco_path data/cityscapes --batch_size 4 --model dela-detr --repeat_label 2 --nms --num_queries 300 --wandb

Training DELA-DETR on down-sampled COCO 2017, with e.g. sample_rate=0.01

python -m torch.distributed.launch --nproc_per_node=2 --master_port=29501 --use_env main.py --dataset_file cocodown --coco_path data/coco --sample_rate 0.01 --batch_size 4 --model dela-detr --repeat_label 2 --nms --num_queries 300 --wandb

Training DELA-DETR on COCO 2017

python -m torch.distributed.launch --nproc_per_node=8 --master_port=29501 --use_env main.py --dataset_file coco --coco_path data/coco --batch_size 4 --model dela-detr --repeat_label 2 --nms --num_queries 300 --wandb

Training DE-DETR on Cityscapes

python -m torch.distributed.launch --nproc_per_node=2 --master_port=29501 --use_env main.py --dataset_file cityscapes --coco_path data/cityscapes --batch_size 4 --model de-detr --wandb

Training DETR baseline

Please refer to the detr branch.

Evaluation

You can get the pretrained model (the link is in "Main Results" session), then run following command to evaluate it on the validation set:

<training command> --resume <path to pre-trained model> --eval

Acknowledgement

This project is based on DETR and Deformable DETR. Thanks for their wonderful works. See LICENSE for more details.

Citing DE-DETRs

If you find DE-DETRs useful in your research, please consider citing:

@misc{wang2022towards,
      title={Towards Data-Efficient Detection Transformers}, 
      author={Wen Wang and Jing Zhang and Yang Cao and Yongliang Shen and Dacheng Tao},
      year={2022},
      eprint={2203.09507},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
Wen Wang
Wen Wang
Method for facial emotion recognition compitition of Xunfei and Datawhale .

人脸情绪识别挑战赛-第3名-W03KFgNOc-源代码、模型以及说明文档 队名:W03KFgNOc 排名:3 正确率: 0.75564 队员:yyMoming,xkwang,RichardoMu。 比赛链接:人脸情绪识别挑战赛 文章地址:link emotion 该项目分别训练八个模型并生成csv文

6 Oct 17, 2022
Official PyTorch implementation of BlobGAN: Spatially Disentangled Scene Representations

BlobGAN: Spatially Disentangled Scene Representations Official PyTorch Implementation Paper | Project Page | Video | Interactive Demo BlobGAN.mp4 This

148 Dec 29, 2022
TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A good teacher is patient and consistent by Beyer et al.

FunMatch-Distillation TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A g

Sayak Paul 67 Dec 20, 2022
The missing CMake project initializer

cmake-init - The missing CMake project initializer Opinionated CMake project initializer to generate CMake projects that are FetchContent ready, separ

1k Jan 01, 2023
PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP. Democratize AI for everyone.

PatrickStar: Parallel Training of Large Language Models via a Chunk-based Memory Management Meeting PatrickStar Pre-Trained Models (PTM) are becoming

Tencent 633 Dec 28, 2022
New approach to benchmark VQA models

VQA Benchmarking This repository contains the web application & the python interface to evaluate VQA models. Documentation Please see the documentatio

4 Jul 25, 2022
3D2Unet: 3D Deformable Unet for Low-Light Video Enhancement (PRCV2021)

3DDUNET This is the code for 3D2Unet: 3D Deformable Unet for Low-Light Video Enhancement (PRCV2021) Conference Paper Link Dataset We use SMOID dataset

1 Jan 07, 2022
RAMA: Rapid algorithm for multicut problem

RAMA: Rapid algorithm for multicut problem Solves multicut (correlation clustering) problems orders of magnitude faster than CPU based solvers without

Paul Swoboda 60 Dec 13, 2022
minimizer-space de Bruijn graphs (mdBG) for whole genome assembly

rust-mdbg: Minimizer-space de Bruijn graphs (mdBG) for whole-genome assembly rust-mdbg is an ultra-fast minimizer-space de Bruijn graph (mdBG) impleme

Barış Ekim 148 Dec 01, 2022
YoHa - A practical hand tracking engine.

YoHa - A practical hand tracking engine.

2k Jan 06, 2023
scalingscattering

Scaling The Scattering Transform : Deep Hybrid Networks This repository contains the experiments found in the paper: https://arxiv.org/abs/1703.08961

Edouard Oyallon 78 Dec 21, 2022
TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios

TPH-YOLOv5 This repo is the implementation of "TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured

cv516Buaa 439 Dec 22, 2022
[ICML 2020] "When Does Self-Supervision Help Graph Convolutional Networks?" by Yuning You, Tianlong Chen, Zhangyang Wang, Yang Shen

When Does Self-Supervision Help Graph Convolutional Networks? PyTorch implementation for When Does Self-Supervision Help Graph Convolutional Networks?

Shen Lab at Texas A&M University 106 Nov 11, 2022
Boundary-aware Transformers for Skin Lesion Segmentation

Boundary-aware Transformers for Skin Lesion Segmentation Introduction This is an official release of the paper Boundary-aware Transformers for Skin Le

Jiacheng Wang 79 Dec 16, 2022
Automates Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning :rocket:

MLJAR Automated Machine Learning Documentation: https://supervised.mljar.com/ Source Code: https://github.com/mljar/mljar-supervised Table of Contents

MLJAR 2.4k Dec 31, 2022
Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

V-MPO Simple code to demonstrate Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) in Pyt

Nugroho Dewantoro 9 Jun 06, 2022
Heterogeneous Temporal Graph Neural Network

Heterogeneous Temporal Graph Neural Network This repository contains the datasets and source code of HTGNN. run_mag.ipynb is the training and testing

15 Dec 22, 2022
StocksMA is a package to facilitate access to financial and economic data of Moroccan stocks.

Creating easier access to the Moroccan stock market data What is StocksMA ? StocksMA is a package to facilitate access to financial and economic data

Salah Eddine LABIAD 28 Jan 04, 2023
Implementation of Kaneko et al.'s MaskCycleGAN-VC model for non-parallel voice conversion.

MaskCycleGAN-VC Unofficial PyTorch implementation of Kaneko et al.'s MaskCycleGAN-VC (2021) for non-parallel voice conversion. MaskCycleGAN-VC is the

86 Dec 25, 2022
Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch

alias-free-gan-pytorch Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) This implementation

Kim Seonghyeon 502 Jan 03, 2023