SWA Object Detection

This project hosts the scripts for training SWA object detectors, as presented in our paper:

@article{zhang2020swa,
  title={SWA Object Detection},
  author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
  journal={arXiv preprint arXiv:2012.12645},
  year={2020}
}

The full paper is available at: https://arxiv.org/abs/2012.12645.

Introduction

Do you want to improve your object detector by 1.0 AP without any extra inference cost or any change to the detector? Here is a recipe, and it is surprisingly simple: train your detector for an extra 12 epochs using cyclical learning rates, then average these 12 checkpoints to obtain your final detection model. This recipe is inspired by Stochastic Weight Averaging (SWA), which was proposed in [1] to improve generalization in deep neural networks. We found it to be very effective in object detection as well. In this work, we systematically investigate the effects of applying SWA to object detection and instance segmentation. Through extensive experiments, we identify a good policy for performing SWA in object detection, and we consistently achieve ~1.0 AP improvement across various popular detectors on the challenging COCO benchmark. We hope this work makes more object detection researchers aware of this technique and helps them train better object detectors.
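The recipe above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's implementation: the function names `cyclical_lr` and `average_checkpoints` are hypothetical, the cosine shape within each cycle is an assumption of this sketch, and checkpoints are modeled as plain dictionaries of float lists rather than real model weights.

```python
import math


def cyclical_lr(step, steps_per_cycle, lr_max, lr_min):
    """Learning rate at `step`; restarts at lr_max at the start of every cycle.

    Cosine decay inside a cycle is an assumption made for this sketch.
    """
    progress = (step % steps_per_cycle) / steps_per_cycle  # in [0, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def average_checkpoints(checkpoints):
    """Element-wise mean of a list of {param_name: [floats]} dictionaries."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged = {name: list(values) for name, values in checkpoints[0].items()}
    for n, ckpt in enumerate(checkpoints[1:], start=1):
        for name, values in ckpt.items():
            # Running mean: new_avg = (old_avg * n + value) / (n + 1)
            averaged[name] = [
                (a * n + v) / (n + 1) for a, v in zip(averaged[name], values)
            ]
    return averaged


# Toy example: 12 "checkpoints", one saved at the end of each extra epoch.
checkpoints = [{"w": [float(epoch), 2.0 * epoch]} for epoch in range(1, 13)]
swa_weights = average_checkpoints(checkpoints)
```

Because the averaged model has the same architecture and parameter count as each individual checkpoint, it is used at inference exactly like any single checkpoint, which is why the recipe adds no inference cost.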

SWA Object Detection: averaging multiple detection models leads to a better one.

Updates

  • 2021.01.08 Reimplemented the code; it is now more convenient, more flexible, and easier to use for both conventional training and SWA training. See Instructions.
  • 2021.01.07 Updated to MMDetection v2.8.0.
  • 2020.12.24 Release the code.

Installation

  • This project is based on MMDetection, so installation is the same as for the original MMDetection.

  • Please check get_started.md for installation. Note that you should substitute your own PyTorch and CUDA versions when installing mmcv in step 3, and clone this repo instead of MMDetection in step 4.

  • If you run into problems with pycocotools, please install it by:

    pip install "git+https://github.com/open-mmlab/cocoapi.git#subdirectory=pycocotools"
    

Usage of MMDetection

MMDetection provides a Colab tutorial, as well as full guidance for beginners on quickly running with an existing dataset and with a new dataset. There are also tutorials for finetuning models, adding new datasets, designing data pipelines, customizing models, customizing runtime settings, and useful tools.

Please refer to FAQ for frequently asked questions.

Instructions

We add an SWA training phase to the object detector training process, implement an SWA hook that helps process averaged models, and write an SWA config that makes it convenient to apply SWA training to various detectors. We also provide many config files for reproducing the results in the paper.

By including the SWA config in detector config files and setting related parameters, you can have different SWA training modes.

  1. Two-phase mode. In this mode, training begins with the traditional training phase, which runs for the configured number of epochs. After that, SWA training starts by loading the model that performed best on the validation set in the previous training phase (because swa_load_from = 'best_bbox_mAP.pth' in the SWA config).

    As shown in swa_vfnet_r50 config, the SWA config is included at line 4 and only the SWA optimizer is reset at line 118 in this script. Note that configuring parameters in local scripts will overwrite those values inherited from the SWA config.

    You can change those parameters that are included in the SWA config to use different optimizers or different learning rate schedules for the SWA training. For example, to use a different initial learning rate, say 0.02, you just need to set swa_optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) in the SWA config (global effect) or in the swa_vfnet_r50 config (local effect).

    To start the training, run:

    ./tools/dist_train.sh configs/swa/swa_vfnet_r50_fpn_1x_coco.py 8
    
    
  2. Only-SWA mode. In this mode, the traditional training is skipped and only the SWA training is performed. In general, this mode should work with a pre-trained detection model which you can download from the MMDetection model zoo.

    Have a look at the swa_mask_rcnn_r101 config. By setting only_swa_training = True and swa_load_from = mask_rcnn_pretraind_model, this script conducts only SWA training, starting from a pre-trained detection model. To start the training, run:

    ./tools/dist_train.sh configs/swa/swa_mask_rcnn_r101_fpn_2x_coco.py 8
    
    

In both modes, we have implemented the validation stage and saving functions for the SWA model, so it is easy to monitor performance and select the best SWA model.

Results and Models

For your convenience, we provide the following SWA models. These models are obtained by averaging checkpoints that are trained with cyclical learning rates for 12 epochs.

Model                                         bbox AP (val)  segm AP (val)  Download
SWA-MaskRCNN-R50-1x-0.02-0.0002-38.2-34.7     39.1, +0.9     35.5, +0.8     model | config
SWA-MaskRCNN-R101-1x-0.02-0.0002-40.0-36.1    41.0, +1.0     37.0, +0.9     model | config
SWA-MaskRCNN-R101-2x-0.02-0.0002-40.8-36.6    41.7, +0.9     37.4, +0.8     model | config
SWA-FasterRCNN-R50-1x-0.02-0.0002-37.4        38.4, +1.0     -              model | config
SWA-FasterRCNN-R101-1x-0.02-0.0002-39.4       40.3, +0.9     -              model | config
SWA-FasterRCNN-R101-2x-0.02-0.0002-39.8       40.7, +0.9     -              model | config
SWA-RetinaNet-R50-1x-0.01-0.0001-36.5         37.8, +1.3     -              model | config
SWA-RetinaNet-R101-1x-0.01-0.0001-38.5        39.7, +1.2     -              model | config
SWA-RetinaNet-R101-2x-0.01-0.0001-38.9        40.0, +1.1     -              model | config
SWA-FCOS-R50-1x-0.01-0.0001-36.6              38.0, +1.4     -              model | config
SWA-FCOS-R101-1x-0.01-0.0001-39.2             40.3, +1.1     -              model | config
SWA-FCOS-R101-2x-0.01-0.0001-39.1             40.2, +1.1     -              model | config
SWA-YOLOv3(320)-D53-273e-0.001-0.00001-27.9   28.7, +0.8     -              model | config
SWA-YOLOv3(608)-D53-273e-0.001-0.00001-33.4   34.2, +0.8     -              model | config
SWA-VFNet-R50-1x-0.01-0.0001-41.6             42.8, +1.2     -              model | config
SWA-VFNet-R101-1x-0.01-0.0001-43.0            44.3, +1.3     -              model | config
SWA-VFNet-R101-2x-0.01-0.0001-43.5            44.5, +1.0     -              model | config

Notes:

  • SWA-MaskRCNN-R50-1x-0.02-0.0002-38.2-34.7 means that this SWA model is built on the pre-trained Mask RCNN model that has a ResNet50 backbone, is trained under the 1x schedule with an initial learning rate of 0.02 and an ending learning rate of 0.0002, and achieves 38.2 bbox AP and 34.7 mask AP on COCO val2017. The SWA model achieves 39.1 bbox AP and 35.5 mask AP, which are higher than the pre-trained model by 0.9 bbox AP and 0.8 mask AP respectively. The same naming rule applies to the other object detectors.

  • In addition to these baseline detectors, SWA can also improve more powerful detectors. One example is VFNetX whose performance on the COCO val2017 is improved from 52.2 AP to 53.4 AP (+1.2 AP).

  • More detailed results including AP50 and AP75 can be found here.
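The naming rule described in the first note can be made concrete with a small parser. This is a hedged sketch: `parse_swa_model_name` is a hypothetical helper written for illustration and is not part of this repository; it only assumes the hyphen-separated layout explained above.

```python
def parse_swa_model_name(name):
    """Split a model name such as 'SWA-MaskRCNN-R50-1x-0.02-0.0002-38.2-34.7'.

    Layout assumed from the note above:
    SWA-<detector>-<backbone>-<schedule>-<initial lr>-<ending lr>-<bbox AP>[-<mask AP>]
    """
    parts = name.split("-")
    if parts[0] != "SWA" or len(parts) not in (7, 8):
        raise ValueError(f"unexpected model name: {name!r}")
    info = {
        "detector": parts[1],
        "backbone": parts[2],
        "schedule": parts[3],
        "initial_lr": float(parts[4]),
        "ending_lr": float(parts[5]),
        "baseline_bbox_ap": float(parts[6]),
        # Only instance segmentation models carry a baseline mask AP.
        "baseline_mask_ap": float(parts[7]) if len(parts) == 8 else None,
    }
    return info


info = parse_swa_model_name("SWA-MaskRCNN-R50-1x-0.02-0.0002-38.2-34.7")

# Gain of the SWA model (39.1 bbox AP in the table) over its baseline.
bbox_gain = round(39.1 - info["baseline_bbox_ap"], 1)
```

Here `bbox_gain` reproduces the +0.9 shown next to the SWA bbox AP in the table.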

Contributing

Any pull requests or issues are welcome.

Citation

Please consider citing our paper in your publications if the project helps your research. BibTeX reference is as follows:

@article{zhang2020swa,
  title={SWA Object Detection},
  author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
  journal={arXiv preprint arXiv:2012.12645},
  year={2020}
}

Acknowledgment

Many thanks to Dr Marlies Hankel and MASSIVE HPC for providing precious GPU computation resources!

We would also like to thank the MMDetection team for producing this great object detection toolbox.

License

This project is released under the Apache 2.0 license.

References

[1] Averaging Weights Leads to Wider Optima and Better Generalization; Pavel Izmailov, Dmitry Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson; Uncertainty in Artificial Intelligence (UAI), 2018
