Dynamic Head: Unifying Object Detection Heads with Attentions

Last update: Dec 21, 2022


Overview


This is the official implementation of the CVPR 2021 paper "Dynamic Head: Unifying Object Detection Heads with Attentions".

"In this paper, we present a novel dynamic head framework to unify object detection heads with attentions. By coherently combining multiple self-attention mechanisms between feature levels for scale-awareness, among spatial locations for spatial-awareness, and within output channels for task-awareness, the proposed approach significantly improves the representation ability of object detection heads without any computational overhead."

Dynamic Head: Unifying Object Detection Heads With Attentions

Xiyang Dai, Yinpeng Chen, Bin Xiao, Dongdong Chen, Mengchen Liu, Lu Yuan, Lei Zhang
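To make the attentions described in the abstract more concrete, here is a minimal pure-Python sketch of two of them. It was written for illustration and is not taken from this repository. It assumes features are nested lists indexed as F[level][position][channel], uses a hypothetical scalar affine map (w, b) as the scale-aware function f followed by a hard sigmoid, and takes the task-aware coefficients alpha1, beta1, alpha2, beta2 as given per channel instead of predicting them with a hyper-function. The spatial-aware attention, which the paper builds on deformable convolution, is left out.

```python
# Illustrative sketch only; names and parameters are hypothetical.
# Layout assumption: F[level][position][channel] as nested lists.


def hard_sigmoid(x):
    # sigma(x) = max(0, min(1, (x + 1) / 2))
    return max(0.0, min(1.0, (x + 1.0) / 2.0))


def scale_aware(F, w=1.0, b=0.0):
    """pi_L: gate each feature level by a scalar computed from its mean."""
    out = []
    for level in F:
        values = [v for pos in level for v in pos]
        mean = sum(values) / len(values)   # average over positions and channels
        gate = hard_sigmoid(w * mean + b)  # hypothetical scalar affine f
        out.append([[gate * v for v in pos] for pos in level])
    return out


def task_aware(F, alpha1, beta1, alpha2, beta2):
    """pi_C: per-channel max of two affine pieces (dynamic-ReLU style).

    The coefficients are passed in here; they are not predicted from F.
    """
    return [
        [
            [max(alpha1[c] * v + beta1[c], alpha2[c] * v + beta2[c])
             for c, v in enumerate(pos)]
            for pos in level
        ]
        for level in F
    ]


def dyhead_block_sketch(F, alpha1, beta1, alpha2, beta2, w=1.0, b=0.0):
    # Order pi_C(pi_S(pi_L(F))); the spatial-aware pi_S is omitted here.
    return task_aware(scale_aware(F, w, b), alpha1, beta1, alpha2, beta2)
```

With alpha1 = 1 and beta1 = alpha2 = beta2 = 0 the task-aware step reduces to an ordinary ReLU, which makes the sketch easy to check by hand. A level whose mean is -1 or lower is gated to zero by the hard sigmoid.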

Model Zoo


To open-source the code, we ported the implementation from our internal framework to Detectron2 and re-trained the models.

Some of the re-trained models perform better than the results reported in the original paper.

Config	Model	Backbone	Schedule	COCO mAP	Weight
cfg	FasterRCNN + DyHead	R50	1x	40.3	weight
cfg	RetinaNet + DyHead	R50	1x	39.9	weight
cfg	ATSS + DyHead	R50	1x	42.4	weight
cfg	ATSS + DyHead	Swin-Tiny	2x + ms	49.8	weight

(2x + ms: 2x schedule with multi-scale training.)

Usage

Dependencies:

Detectron2, timm

Installation:

python -m pip install -e DynamicHead
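Before training, it can help to confirm that the two listed dependencies are visible to the interpreter you will use. The following small check is a sketch, not part of the repository; the helper name missing_dependencies is hypothetical, and it only looks for the top-level modules detectron2 and timm without importing them.

```python
import importlib.util


def missing_dependencies(modules=("detectron2", "timm")):
    """Return the names in `modules` that this interpreter cannot find."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("detectron2 and timm are importable")
```

Using find_spec avoids executing the packages' import-time code, so the check stays fast even when CUDA extensions are involved.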

Train:

To train a config on a single node with 8 GPUs, run:

DETECTRON2_DATASETS=$DATASET python train_net.py --config configs/dyhead_r50_retina_fpn_1x.yaml --num-gpus 8

Test:

To evaluate a config with trained weights on a single node with 8 GPUs, run:

DETECTRON2_DATASETS=$DATASET python train_net.py --config configs/dyhead_r50_retina_fpn_1x.yaml --num-gpus 8 --eval-only MODEL.WEIGHTS $WEIGHT
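If you launch runs from Python rather than a shell, the two commands above can be assembled programmatically. This is a sketch with hypothetical helper names (dyhead_command, dyhead_env); it reproduces exactly the arguments shown above and relies on Detectron2's convention that trailing KEY VALUE pairs override config options.

```python
import os


def dyhead_command(config, num_gpus=8, weights=None):
    """Build the argv for train_net.py; passing weights switches to evaluation."""
    argv = ["python", "train_net.py", "--config", config,
            "--num-gpus", str(num_gpus)]
    if weights is not None:
        # MODEL.WEIGHTS <path> is a trailing config override.
        argv += ["--eval-only", "MODEL.WEIGHTS", weights]
    return argv


def dyhead_env(dataset_root):
    """Copy of the current environment with DETECTRON2_DATASETS set."""
    env = dict(os.environ)
    env["DETECTRON2_DATASETS"] = dataset_root
    return env
```

A run could then be started with subprocess.run(dyhead_command("configs/dyhead_r50_retina_fpn_1x.yaml"), env=dyhead_env(dataset_root), check=True), where dataset_root is a placeholder for your dataset directory.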

Citation

@InProceedings{Dai_2021_CVPR,
    author    = {Dai, Xiyang and Chen, Yinpeng and Xiao, Bin and Chen, Dongdong and Liu, Mengchen and Yuan, Lu and Zhang, Lei},
    title     = {Dynamic Head: Unifying Object Detection Heads With Attentions},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {7373-7382}
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.


Owner

Microsoft
