An official implementation of the paper Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers

Sequence Feature Alignment (SFA)

By Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-jun Zha, Yonggang Wen, and Dacheng Tao

This repository is the official implementation of the paper Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers, which was accepted to ACM Multimedia 2021.

Introduction

TL;DR. We develop SFA, a domain adaptive object detection method specialized for detection transformers. It contains a domain query-based feature alignment module and a token-wise feature alignment module for global and local feature alignment, respectively, and a bipartite matching consistency loss for improving robustness.

Abstract. Detection transformers have recently shown promising object detection results and attracted increasing attention. However, how to develop effective domain adaptation techniques to improve their cross-domain performance remains unexplored and unclear. In this paper, we delve into this topic and empirically find that direct feature distribution alignment on the CNN backbone only brings limited improvements, as it does not guarantee domain-invariant sequence features in the transformer for prediction. To address this issue, we propose a novel Sequence Feature Alignment (SFA) method that is specially designed for the adaptation of detection transformers. Technically, SFA consists of a domain query-based feature alignment (DQFA) module and a token-wise feature alignment (TDA) module. In DQFA, a novel domain query is used to aggregate and align global context from the token sequence of both domains. DQFA reduces the domain discrepancy in global feature representations and object relations when deployed in the transformer encoder and decoder, respectively. Meanwhile, TDA aligns token features in the sequence from both domains, which reduces the domain gaps in local and instance-level feature representations in the transformer encoder and decoder, respectively. Besides, a novel bipartite matching consistency loss is proposed to enhance the feature discriminability for robust object detection. Experiments on three challenging benchmarks show that SFA outperforms state-of-the-art domain adaptive object detection methods.
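To make the idea of a domain query aggregating global context from a token sequence more concrete, here is a minimal sketch in plain Python. It is not the released implementation: it assumes ordinary single-head dot-product attention, and the function names and toy values are hypothetical. The subsequent alignment of the pooled vector across domains is omitted.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_with_domain_query(domain_query: Vector, tokens: List[Vector]) -> Vector:
    """Pool a token sequence into one vector using dot-product attention.

    Assumption: a single query attends over all tokens; the real model
    may use a different attention operator.
    """
    dim = len(domain_query)
    scale = math.sqrt(dim)
    # Similarity between the query and every token in the sequence.
    scores = [sum(q * t for q, t in zip(domain_query, tok)) / scale for tok in tokens]
    weights = softmax(scores)
    # Weighted sum of tokens gives the aggregated global feature.
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]


# Toy values, purely illustrative.
query = [1.0, 0.0]
source_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_feature = aggregate_with_domain_query(query, source_tokens)
print(global_feature)
```

The output has the same dimensionality as a single token, so one vector per image can summarize the whole sequence regardless of its length.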

Main Results

The experimental results and model weights for Cityscapes to Foggy Cityscapes are shown below.

Model                  mAP   mAP@50  mAP@75  mAP@S  mAP@M  mAP@L  Log & Model
SFA-DefDETR            21.5  41.1    20.0    3.9    20.9   43.0   Google Drive
SFA-DefDETR-BoxRefine  23.9  42.6    22.5    3.8    21.6   46.7   Google Drive
SFA-DefDETR-TwoStage   24.1  42.5    22.8    3.8    22.0   48.1   Google Drive

Note:

  1. All SFA models are trained with a total batch size of 4.
  2. "DefDETR" means Deformable DETR (with R50 backbone).
  3. "BoxRefine" means Deformable DETR with iterative box refinement.
  4. "TwoStage" indicates the two-stage Deformable DETR variant.
  5. The original implementation is based on our internal codebase, so there are slight differences in the released code. For example, we only use the intermediate features output by the first encoder and decoder layers for hierarchical feature alignment, to reduce computational costs during training.

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend using Anaconda to create a conda environment:

    conda create -n sfa python=3.7 pip

    Then, activate the environment:

    conda activate sfa
  • PyTorch>=1.5.1, torchvision>=0.6.1 (following instructions here)

    For example, if your CUDA version is 9.2, you can install PyTorch and torchvision as follows:

    conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch
  • Other requirements

    pip install -r requirements/requirements.txt
  • Logging using wandb (optional)

    pip install -r requirements/optional.txt
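After installation, a quick check of the version requirements listed above can save a failed build later. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_package` are hypothetical, and the minimum versions are copied from the list above.

```python
import importlib
import importlib.util
import re
import sys
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. '1.5.1+cu92' -> (1, 5, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def check_package(name: str, minimum: str) -> str:
    """Report whether an installed package meets the minimum version."""
    if importlib.util.find_spec(name) is None:
        return f"{name}: not installed (need >= {minimum})"
    try:
        module = importlib.import_module(name)
    except Exception as error:  # e.g. missing shared libraries
        return f"{name}: import failed ({error})"
    installed = getattr(module, "__version__", "0.0.0")
    status = "OK" if parse_version(installed) >= parse_version(minimum) else "too old"
    return f"{name}: {installed} ({status}, need >= {minimum})"


python_ok = sys.version_info >= (3, 7)
print(f"python: {sys.version.split()[0]} ({'OK' if python_ok else 'too old'}, need >= 3.7)")
for package, minimum in [("torch", "1.5.1"), ("torchvision", "0.6.1")]:
    print(check_package(package, minimum))
```

The script only compares version numbers; it does not verify that the installed PyTorch build matches your CUDA toolkit.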

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (all checks should report True)
python test.py

Usage

Dataset preparation

We use the Cityscapes to Foggy Cityscapes adaptation to demonstrate dataset preparation. Other domain adaptation benchmarks can be prepared analogously. The Cityscapes and Foggy Cityscapes datasets can be downloaded from here. The annotations in COCO format can be obtained from here. Afterward, please organize the datasets and annotations as follows:

[coco_path]
└─ cityscapes
   └─ leftImg8bit
      └─ train
      └─ val
└─ foggy_cityscapes
   └─ leftImg8bit_foggy
      └─ train
      └─ val
└─ CocoFormatAnnos
   └─ cityscapes_train_cocostyle.json
   └─ cityscapes_foggy_train_cocostyle.json
   └─ cityscapes_foggy_val_cocostyle.json
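
Before launching training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the function name `find_missing` and the example root `./data` are hypothetical, while the relative paths are copied from the tree above.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree above.
EXPECTED_PATHS = [
    "cityscapes/leftImg8bit/train",
    "cityscapes/leftImg8bit/val",
    "foggy_cityscapes/leftImg8bit_foggy/train",
    "foggy_cityscapes/leftImg8bit_foggy/val",
    "CocoFormatAnnos/cityscapes_train_cocostyle.json",
    "CocoFormatAnnos/cityscapes_foggy_train_cocostyle.json",
    "CocoFormatAnnos/cityscapes_foggy_val_cocostyle.json",
]


def find_missing(coco_path: str) -> List[str]:
    """Return the expected entries that do not exist under coco_path."""
    root = Path(coco_path)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


# "./data" is a placeholder for your own [coco_path].
missing = find_missing("./data")
if missing:
    print("Missing entries:")
    for rel in missing:
        print(f"  {rel}")
else:
    print("Dataset layout looks complete.")
```

The check only tests for existence; it does not validate the image contents or the annotation files.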

Training

As an example, we provide commands for training our SFA on a single node with 4 GPUs for weather adaptation.

Training SFA-DeformableDETR

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs_da/sfa_r50_deformable_detr.sh --wandb

Training SFA-DeformableDETR-BoxRefine

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs_da/sfa_r50_deformable_detr_plus_iterative_bbox_refinement.sh --wandb

Training SFA-DeformableDETR-TwoStage

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs_da/sfa_r50_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage.sh --wandb

Training Source-only DeformableDETR

Please refer to the source branch.

Evaluation

You can get the config file and pretrained model of SFA (the link is in the "Main Results" section), then run the following command to evaluate it on the Foggy Cityscapes validation set:

<path to config file> --resume <path to pre-trained model> --eval

You can also run distributed evaluation by using ./tools/run_dist_launch.sh or ./tools/run_dist_slurm.sh.
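
As an illustration of how the placeholders in the evaluation command might be filled in, here is a small sketch that assembles the command line. It is not part of the repository. The checkpoint path is hypothetical, and the distributed form assumes the launcher takes the same arguments as in the training commands above.

```python
import shlex
from typing import Dict, List, Optional


def build_eval_command(config_script: str, checkpoint: str,
                       gpus: Optional[int] = None) -> List[str]:
    """Assemble the evaluation command; prepend the launcher if gpus is given."""
    command = [config_script, "--resume", checkpoint, "--eval"]
    if gpus is not None:
        # Assumption: same launcher usage as in the training commands.
        command = ["./tools/run_dist_launch.sh", str(gpus)] + command
    return command


def to_shell(command: List[str], env: Optional[Dict[str, str]] = None) -> str:
    """Render a command (and optional environment variables) as a shell line."""
    prefix = [f"{key}={shlex.quote(value)}" for key, value in (env or {}).items()]
    return " ".join(prefix + [shlex.quote(part) for part in command])


config = "./configs_da/sfa_r50_deformable_detr.sh"
checkpoint = "./checkpoints/sfa_r50_deformable_detr.pth"  # hypothetical path

print(to_shell(build_eval_command(config, checkpoint)))
print(to_shell(build_eval_command(config, checkpoint, gpus=4), {"GPUS_PER_NODE": "4"}))
```

Printing the command rather than executing it makes it easy to inspect before running it in a shell.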

Acknowledgement

This project is based on DETR and Deformable DETR. We thank the authors for their wonderful work. See LICENSE for more details.

Citing SFA

If you find SFA useful in your research, please consider citing:

@inproceedings{wang2021exploring,
  title={Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers},
  author={Wang, Wen and Cao, Yang and Zhang, Jing and He, Fengxiang and Zha, Zheng-Jun and Wen, Yonggang and Tao, Dacheng},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}