[CVPR2021 Oral] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Related tags

Deep Learningup-detr
Overview

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

This is the official PyTorch implementation and models for UP-DETR paper:

@article{dai2020up-detr,
  author  = {Zhigang Dai and Bolun Cai and Yugeng Lin and Junying Chen},
  title   = {UP-DETR: Unsupervised Pre-training for Object Detection with Transformers},
  journal = {arXiv preprint arXiv:2011.09094},
  year    = {2020},
}

In UP-DETR, we introduce a novel pretext named random query patch detection to pre-train transformers for object detection. UP-DETR inherits from DETR with the same ResNet-50 backbone, same Transformer encoder, decoder and same codebase. With unsupervised pre-training CNN, the whole UP-DETR model doesn't require any human annotations. UP-DETR achieves 43.1 AP on COCO with 300 epochs fine-tuning. The AP of open-source version is a little higher than paper report.

UP-DETR

Model Zoo

We provide pre-training UP-DETR and fine-tuning UP-DETR models on COCO, and plan to include more in future. The evaluation metric is same to DETR.

Here is the UP-DETR model pre-trained on ImageNet without labels. The CNN weight is initialized from SwAV, which is fixed during the transformer pre-training:

name backbone epochs url size md5
UP-DETR R50 (SwAV) 60 model | logs 164Mb 49f01f8b

Comparision with DETR:

name backbone (pre-train) epochs box AP url size
DETR R50 (Supervised) 500 42.0 - 159Mb
DETR R50 (SwAV) 300 42.1 - 159Mb
UP-DETR R50 (SwAV) 300 43.1 model | logs 159Mb

COCO val5k evaluation results of UP-DETR can be found in this gist.

Usage - Object Detection

There are no extra compiled components in UP-DETR and package dependencies are same to DETR. We provide instructions how to install dependencies via conda:

git clone tbd
conda install -c pytorch pytorch torchvision
conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

UP-DETR follows two steps: pre-training and fine-tuning. We present the model pre-trained on ImageNet and then fine-tuned on COCO.

Unsupervised Pre-training

Data Preparation

Download and extract ILSVRC2012 train dataset.

We expect the directory structure to be the following:

path/to/imagenet/
  n06785654/  # caterogey directory
    n06785654_16140.JPEG # images
  n04584207/  # caterogey directory
    n04584207_14322.JPEG # images

Images can be organized disorderly because our pre-training is unsupervised.

Pre-training

To pr-train UP-DETR on a single node with 8 gpus for 60 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --lr_drop 40 \
    --epochs 60 \
    --pre_norm \
    --num_patches 10 \
    --batch_size 32 \
    --feature_recon \
    --fre_cnn \
    --imagenet_path path/to/imagenet \
    --output_dir path/to/save_model

As the size of pre-training images is relative small, so we can set a large batch size.

It takes about 2 hours for a epoch, so 60 epochs pre-training takes about 5 days with 8 V100 gpus.

In our further ablation experiment, we found that object query shuffle is not helpful. So, we remove it in the open-source version.

Fine-tuning

Data Preparation

Download and extract COCO 2017 dataset train and val dataset.

The directory structure is expected as follows:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images

Fine-tuning

To fine-tune UP-DETR with 8 gpus for 300 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env detr_main.py \
    --lr_drop 200 \
    --epochs 300 \
    --lr_backbone 5e-4 \
    --pre_norm \
    --coco_path path/to/coco \
    --pretrain path/to/save_model/checkpoint.pth

The fine-tuning cost is exactly same to DETR, which takes 28 minutes with 8 V100 gpus. So, 300 epochs training takes about 6 days.

The model can also extended to panoptic segmentation, checking more details on DETR.

Notebook

We provide a notebook in colab to get the visualization result in the paper:

  • Visualization Notebook: This notebook shows how to perform query patch detection with the pre-training model (without any annotations fine-tuning).

vis

License

UP-DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Owner
dddzg
MSc student at SCUT
dddzg
Encoding Causal Macrovariables

Encoding Causal Macrovariables Data Natural climate data ('El Nino') Self-generated data ('Simulated') Experiments Detecting macrovariables through th

Benedikt Höltgen 3 Jul 31, 2022
A system for quickly generating training data with weak supervision

Programmatically Build and Manage Training Data Announcement The Snorkel team is now focusing their efforts on Snorkel Flow, an end-to-end AI applicat

Snorkel Team 5.4k Jan 02, 2023
[CVPR 2021] "Multimodal Motion Prediction with Stacked Transformers": official code implementation and project page.

mmTransformer Introduction This repo is official implementation for mmTransformer in pytorch. Currently, the core code of mmTransformer is implemented

DeciForce: Crossroads of Machine Perception and Autonomy 232 Dec 31, 2022
Locationinfo - A script helps the user to show network information such as ip address

Description This script helps the user to show network information such as ip ad

Roxcoder 1 Dec 30, 2021
ProjectOxford-ClientSDK - This repo has moved :house: Visit our website for the latest SDKs & Samples

This project has moved 🏠 We heard your feedback! This repo has been deprecated and each project has moved to a new home in a repo scoped by API and p

Microsoft 970 Nov 28, 2022
General-purpose program synthesiser

DeepSynth General-purpose program synthesiser. This is the repository for the code of the paper "Scaling Neural Program Synthesis with Distribution-ba

Nathanaël Fijalkow 24 Oct 23, 2022
A library for uncertainty quantification based on PyTorch

Torchuq [logo here] TorchUQ is an extensive library for uncertainty quantification (UQ) based on pytorch. TorchUQ currently supports 10 representation

TorchUQ 96 Dec 12, 2022
Dahua Camera and Doorbell Home Assistant Integration

Home Assistant Dahua Integration The Dahua Home Assistant integration allows you to integrate your Dahua cameras and doorbells in Home Assistant. It's

Ronnie 216 Dec 26, 2022
A script depending on VASP output for calculating Fermi-Softness.

Fermi softness calculation for Vienna Ab initio Simulation Package (VASP) Update 1.1.0: Big update: Rewrote the code. Use Bader atomic division instea

qslin 11 Nov 08, 2022
A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning

LABES This is the code for EMNLP 2020 paper "A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised L

17 Sep 28, 2022
Speech Recognition using DeepSpeech2.

deepspeech.pytorch Implementation of DeepSpeech2 for PyTorch using PyTorch Lightning. The repo supports training/testing and inference using the DeepS

Sean Naren 2k Jan 04, 2023
Ranking Models in Unlabeled New Environments (iccv21)

Ranking Models in Unlabeled New Environments Prerequisites This code uses the following libraries Python 3.7 NumPy PyTorch 1.7.0 + torchivision 0.8.1

14 Dec 17, 2021
Square Root Bundle Adjustment for Large-Scale Reconstruction

RootBA: Square Root Bundle Adjustment Project Page | Paper | Poster | Video | Code Table of Contents Citation Dependencies Installing dependencies on

Nikolaus Demmel 205 Dec 20, 2022
DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

[ICLR'21] DARTS-: Robustly Stepping out of Performance Collapse Without Indicators [openreview] Authors: Xiangxiang Chu, Xiaoxing Wang, Bo Zhang, Shun

55 Nov 01, 2022
Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation in TensorFlow 2

Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation in TensorFlow 2 Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexan

Phan Nguyen 1 Dec 16, 2021
[ICLR'21] FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

FedBN: Federated Learning on Non-IID Features via Local Batch Normalization This is the PyTorch implemention of our paper FedBN: Federated Learning on

<a href=[email protected]"> 156 Dec 15, 2022
shufflev2-yolov5:lighter, faster and easier to deploy

shufflev2-yolov5: lighter, faster and easier to deploy. Evolved from yolov5 and the size of model is only 1.7M (int8) and 3.3M (fp16). It can reach 10+ FPS on the Raspberry Pi 4B when the input size

pogg 1.5k Jan 05, 2023
SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

The SLIDE package contains the source code for reproducing the main experiments in this paper. Dataset The Datasets can be downloaded in Amazon-

Intel Labs 72 Dec 16, 2022
STARCH compuets regional extreme storm physical characteristics and moisture balance based on spatiotemporal precipitation data from reanalysis or climate model data.

STARCH (Storm Tracking And Regional CHaracterization) STARCH computes regional extreme storm physical and moisture balance characteristics based on sp

Onosama 7 Oct 20, 2022
Scaling Vision with Sparse Mixture of Experts

Scaling Vision with Sparse Mixture of Experts This repository contains the code for training and fine-tuning Sparse MoE models for vision (V-MoE) on I

Google Research 290 Dec 25, 2022