Implementation of ICCV21 paper: PnP-DETR: Towards Efficient Visual Analysis with Transformers

Last update: Dec 27, 2022


Overview

Implementation of ICCV 2021 paper: PnP-DETR: Towards Efficient Visual Analysis with Transformers (arXiv).

This repository is based on DETR.

Recently, DETR pioneered the solution of vision tasks with transformers: it directly translates the image feature map into the object detection result. Though effective, translating the full feature map can be costly due to redundant computation on areas such as the background. In this work, we encapsulate the idea of reducing spatial redundancy into a novel poll and pool (PnP) sampling module, with which we build an end-to-end PnP-DETR architecture that adaptively allocates its computation spatially to be more efficient. Concretely, the PnP module abstracts the image feature map into fine foreground object feature vectors and a small number of coarse background contextual feature vectors. The transformer models information interaction within the fine-coarse feature space and translates the features into the detection result. Moreover, the PnP-augmented model can instantly achieve various desired trade-offs between performance and computation with a single model by varying the sampled feature length, without requiring multiple models to be trained as existing methods do. It thus offers greater flexibility for deployment in diverse scenarios with varying computation constraints. We further validate the generalizability of the PnP module on panoptic segmentation and on the recent transformer-based image recognition model ViT, and show consistent efficiency gains. We believe our method makes a step toward efficient visual analysis with transformers, wherein spatial redundancy is commonly observed.
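The poll-and-pool idea can be illustrated with a toy, dependency-free Python sketch. This is not the repository's implementation: the sigmoid scorer, the `poll_ratio` parameter, the per-slot aggregation weights, and all random weights are hypothetical stand-ins for learned PyTorch modules. `poll` keeps the top-scoring fraction of feature vectors as fine features; `pool` compresses the remaining vectors into a fixed number of coarse vectors using softmax-weighted averages.

```python
import math
import random

# Toy sketch of poll-and-pool sampling. Feature vectors are plain lists of
# floats; all weights are hypothetical random stand-ins for learned parameters.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def poll(features, score_weights, poll_ratio):
    """Keep the top-scoring fraction of feature vectors as 'fine' features."""
    # Hypothetical linear scorer squashed to (0, 1) with a sigmoid.
    scores = [1.0 / (1.0 + math.exp(-dot(f, score_weights))) for f in features]
    k = max(1, round(poll_ratio * len(features)))
    order = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    fine_idx = sorted(order[:k])
    rest_idx = sorted(order[k:])
    # Scale each kept vector by its score; in a differentiable framework this
    # is one way to let the scorer receive gradients.
    fine = [[scores[i] * x for x in features[i]] for i in fine_idx]
    return fine, fine_idx, rest_idx

def pool(features, rest_idx, agg_weights):
    """Summarize the remaining vectors into len(agg_weights) 'coarse' vectors."""
    rest = [features[i] for i in rest_idx]
    if not rest:
        return []
    coarse = []
    for w in agg_weights:  # one weight vector per coarse output slot
        # Normalize the aggregation logits over all remaining vectors.
        a = softmax([dot(f, w) for f in rest])
        coarse.append([sum(a_j * f[d] for a_j, f in zip(a, rest))
                       for d in range(len(rest[0]))])
    return coarse

def pnp(features, score_weights, agg_weights, poll_ratio):
    """Return the shortened sequence: fine vectors followed by coarse ones."""
    fine, _, rest_idx = poll(features, score_weights, poll_ratio)
    coarse = pool(features, rest_idx, agg_weights)
    return fine + coarse

if __name__ == "__main__":
    rng = random.Random(0)
    dim, num_tokens, num_coarse = 8, 100, 4
    feats = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
    score_w = [rng.gauss(0, 1) for _ in range(dim)]
    agg_w = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_coarse)]
    for ratio in (0.2, 0.33, 0.5):
        print(ratio, len(pnp(feats, score_w, agg_w, ratio)))
```

Changing `poll_ratio` alters the sequence length (k fine vectors plus M coarse vectors) without touching any weights, which mirrors the single-model trade-off described above; because self-attention cost grows quadratically with sequence length, shorter sequences are cheaper to process.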

Usage

First, clone the repository locally:

git clone https://github.com/twangnh/pnp-detr

Then, install PyTorch 1.5+ and torchvision 0.6+:

conda install -c pytorch pytorch torchvision

Install pycocotools (for evaluation on COCO) and scipy (for training):

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

That's it; you should now be able to train and evaluate detection models.

(Optional) To work with panoptic segmentation, install panopticapi:

pip install git+https://github.com/cocodataset/panopticapi.git
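Before training, a quick check that the packages above are installed at the required versions can save a failed launch. The following is a hypothetical helper, not part of the repository. It reads installed distribution versions with the standard-library importlib.metadata (so it does not import PyTorch itself) and uses a deliberately simple numeric comparison rather than full PEP 440 version parsing.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from this README; None means "any version".
REQUIREMENTS = {
    "torch": "1.5",
    "torchvision": "0.6",
    "pycocotools": None,
    "scipy": None,
    "panopticapi": None,  # optional: only needed for panoptic segmentation
}

def parse_version(text):
    """'1.13.1+cu117' -> (1, 13, 1); stops at the first non-numeric field."""
    parts = []
    for field in text.split("+")[0].split("."):
        digits = ""
        for ch in field:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def meets_minimum(installed, minimum):
    return parse_version(installed) >= parse_version(minimum)

def check_environment(requirements=REQUIREMENTS):
    """Map each package name to 'ok', 'missing', or 'too old (x.y.z)'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is None or meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report

if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name:12s} {status}")
```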

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
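A small sketch for verifying this layout before launching a job. The helper name is hypothetical, and the annotation file names (instances_train2017.json, instances_val2017.json) are an assumption based on the standard COCO 2017 release and the upstream DETR data loader; adjust them if your copy differs.

```python
import sys
from pathlib import Path

# Directories from the layout above; annotation file names are an assumption.
EXPECTED_DIRS = ["annotations", "train2017", "val2017"]
EXPECTED_FILES = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
]

def missing_coco_paths(coco_path):
    """Return the expected directories and files that are absent under coco_path."""
    root = Path(coco_path)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing

if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/path/to/coco"
    problems = missing_coco_paths(target)
    print("COCO layout OK" if not problems else f"Missing: {problems}")
```

Pass the same directory you intend to give to --coco_path.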

Training

To train baseline DETR on a single node with 8 GPUs for 300 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco

Following DETR, we train PnP-DETR with AdamW, setting the learning rate to 1e-4 in the transformer and 1e-5 in the backbone. Horizontal flips, scales, and crops are used for augmentation. Images are rescaled to have a minimum size of 800 and a maximum size of 1333. The transformer is trained with a dropout of 0.1, and the whole model is trained with gradient clipping at 0.1.
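Three parts of this recipe (the aspect-ratio-preserving resize, the per-module learning rates, and gradient clipping) are sketched below in plain Python. The function names are hypothetical. The resize follows the min/max-size rule stated above; the parameter split assumes backbone parameters can be recognized by the substring "backbone" in their names; the clipping is a simplified global-norm clip on a flat list of numbers. In PyTorch, the groups would be passed to torch.optim.AdamW, and torch.nn.utils.clip_grad_norm_ would perform the clipping on real tensors.

```python
import math

def resized_hw(height, width, min_size=800, max_size=1333):
    """Output (h, w) that keeps the aspect ratio, makes the short side
    min_size, and shrinks it further if the long side would exceed max_size."""
    short, long = min(height, width), max(height, width)
    size = min_size
    if long / short * size > max_size:
        size = int(round(max_size * short / long))
    if height <= width:
        return size, int(size * width / height)
    return int(size * height / width), size

def lr_param_groups(named_params, lr=1e-4, lr_backbone=1e-5):
    """Split (name, param) pairs into two optimizer groups by name.
    Assumes backbone parameter names contain the substring 'backbone'."""
    named_params = list(named_params)  # may be a generator; iterated twice
    rest = [p for n, p in named_params if "backbone" not in n]
    backbone = [p for n, p in named_params if "backbone" in n]
    return [{"params": rest, "lr": lr}, {"params": backbone, "lr": lr_backbone}]

def clip_by_global_norm(grads, max_norm=0.1):
    """Rescale all gradients together so their global L2 norm is <= max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]

if __name__ == "__main__":
    print(resized_hw(480, 640))
    print(resized_hw(500, 1500))
```

For example, a 480x640 image is resized to 800x1066, while a very wide 500x1500 image is capped at 444x1332 so that its long side stays within 1333.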

Evaluation

To evaluate DETR R50 on COCO val5k with a single GPU, run the following, replacing xxx with the path to a trained checkpoint:

python main.py --batch_size 2 --no_aux_loss --eval --resume xxx --coco_path /path/to/coco

Multinode training

Distributed training is available via Slurm and submitit:

pip install submitit

To train the baseline DETR-6-6 model on 4 nodes for 300 epochs, run:

python run_with_submitit.py --timeout 3000 --coco_path /path/to/coco

License

DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Owner

twang
