A PyTorch version of You Only Look at One-level Feature object detector

Overview

PyTorch_YOLOF

A PyTorch version of You Only Look at One-level Feature object detector.

The input image must be resized to have their shorter side being 800 and their longer side less or equal to 1333.

During reproducing the YOLOF, I found many tricks used in YOLOF but the baseline RetinaNet dosen't use those tricks. For example, YOLOF takes advantage of RandomShift, CTR_CLAMP, large learning rate, big batchsize(like 64), negative prediction threshold. Is it really fair that YOLOF use these tricks to compare with RetinaNet?

In a other word, whether the YOLOF can still work without those tricks?

Requirements

  • We recommend you to use Anaconda to create a conda environment:
conda create -n yolof python=3.6
  • Then, activate the environment:
conda activate yolof
  • Requirements:
pip install -r requirements.txt 

PyTorch >= 1.1.0 and Torchvision >= 0.3.0

Visualize positive sample

You can run following command to visualize positiva sample:

python train.py \
        -d voc \
        --batch_size 2 \
        --root path/to/your/dataset \
        --vis_targets

My Ablation Studies

image mask

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: IoU Top4 (Different from the official matcher that uses top4 of L1 distance.)
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip

We ignore the loss of samples who are not in image.

Method AP AP50 AP75 APs APm APl
w/o mask 28.3 46.7 28.9 13.4 33.4 39.9
w mask 28.4 46.9 29.1 13.5 33.5 39.1

L1 Top4

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip
  • with image mask

IoU topk: We choose the topK of IoU between anchor boxes and labels as the positive samples.

L1 topk: We choose the topK of L1 distance between anchor boxes and labels as the positive samples.

Method AP AP50 AP75 APs APm APl
IoU Top4 28.4 46.9 29.1 13.5 33.5 39.1
L1 Top4 28.6 46.9 29.4 13.8 34.0 39.0

RandomShift Augmentation

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip
  • with image mask

YOLOF takes advantage of RandomShift augmentation which is not used in RetinaNet.

Method AP AP50 AP75 APs APm APl
w/o RandomShift 28.6 46.9 29.4 13.8 34.0 39.0
w/ RandomShift 29.0 47.3 29.8 14.2 34.2 38.9

Fix a bug in dataloader

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

I fixed a bug in dataloader. Specifically, I set the shuffle in dataloader as False ...

Method AP AP50 AP75 APs APm APl
bug 29.0 47.3 29.8 14.2 34.2 38.9
no bug 30.1 49.0 31.0 15.2 36.3 39.8

Ignore samples

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

We ignore those negative samples whose IoU with labels are higher the ignore threshold (igt).

Method AP AP50 AP75 APs APm APl
no igt 30.1 49.0 31.0 15.2 36.3 39.8
igt=0.7

Decode boxes

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

Method-1: ctr_x = x_anchor + t_x, ctr_y = y_anchor + t_y

Method-2: ctr_x = x_anchor + t_x * w_anchor, ctr_y = y_anchor + t_y * h_anchor

The Method-2 is following the operation used in YOLOF.

Method AP AP50 AP75 APs APm APl
Method-1
Method-2

Train

sh train.sh

You can change the configurations of train.sh.

If you just want to check which anchor box is assigned to the positive sample, you can run:

python train.py --cuda -d voc --batch_size 8 --vis_targets

According to your own situation, you can make necessary adjustments to the above run commands

Test

python test.py -d [select a dataset: voc or coco] \
               --cuda \
               -v [select a model] \
               --weight [ Please input the path to model dir. ] \
               --img_size 800 \
               --root path/to/dataset/ \
               --show

You can run the above command to visualize the detection results on the dataset.

You might also like...
Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling
Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling

⚠️ ‎‎‎ A more recent and actively-maintained version of this code is available in ivadomed Stacked Hourglass Network with a Multi-level Attention Mech

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks
implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks

YOLOR implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks To reproduce the results in the paper, please us

 You Only 👀 One Sequence
You Only 👀 One Sequence

You Only 👀 One Sequence TL;DR: We study the transferability of the vanilla ViT pre-trained on mid-sized ImageNet-1k to the more challenging COCO obje

Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021

LoFTR: Detector-Free Local Feature Matching with Transformers Project Page | Paper LoFTR: Detector-Free Local Feature Matching with Transformers Jiami

LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021

LoFTR-with-train-script LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021 (with train script --- unofficial ---). About Megadepth

A Pytorch Implementation of [Source data‐free domain adaptation of object detector through domain

A Pytorch Implementation of Source data‐free domain adaptation of object detector through domain‐specific perturbation Please follow Faster R-CNN and

A Pytorch Implementation of Domain adaptation of object detector using scissor-like networks

A Pytorch Implementation of Domain adaptation of object detector using scissor-like networks Please follow Faster R-CNN and DAF to complete the enviro

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch
Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

Comments
  • fix typo

    fix typo

    When I run the eval process on VOC dataset, an error occurs:

    Traceback (most recent call last):
      File "eval.py", line 126, in <module>
        voc_test(model, data_dir, device, transform)
      File "eval.py", line 42, in voc_test
        display=True)
    TypeError: __init__() got an unexpected keyword argument 'data_root'
    

    I discovered that this was due to a typo and simply fixed it. Everything is going well now.

    opened by guohanli 1
  • 标签生成函数写得有问题

    标签生成函数写得有问题

    源码中的标签生成逻辑是: 1.利用预测框与gt的l1距离筛选出topk个锚点,再利用锚点与gt的l1距离筛选出topk个锚点,将之作为预选正例锚点。 2.将预选正例锚点依据iou与gt匹配,滤除与锚点iou小于0.15的预选正例锚点 3.将gt与预测框iou<=0.7的预测框对应锚点设置为负例锚点 (而您只用了锚点,没有预选,也没用预测框)

    opened by Mr-Z-NewStar 11
Owner
Jianhua Yang
I love anime!!I love ACG!! The universe is so big,I want to fly and wander.
Jianhua Yang
The official codes of "Semi-supervised Models are Strong Unsupervised Domain Adaptation Learners".

SSL models are Strong UDA learners Introduction This is the official code of paper "Semi-supervised Models are Strong Unsupervised Domain Adaptation L

Yabin Zhang 26 Dec 26, 2022
Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning “螺旋桨”生物计算工具集

English | 简体中文 Latest News 2021.10.25 Paper "Docking-based Virtual Screening with Multi-Task Learning" is accepted by BIBM 2021. 2021.07.29 PaddleHeli

633 Jan 04, 2023
Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

UPDeT Official Implementation of UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers (ICLR 2021 spotlight) The

hhhusiyi 96 Dec 22, 2022
Fastshap: A fast, approximate shap kernel

fastshap: A fast, approximate shap kernel fastshap was designed to be: Fast Calculating shap values can take an extremely long time. fastshap utilizes

Samuel Wilson 22 Sep 24, 2022
Recursive Bayesian Networks

Recursive Bayesian Networks This repository contains the code to reproduce the results from the NeurIPS 2021 paper Lieck R, Rohrmeier M (2021) Recursi

Robert Lieck 11 Oct 18, 2022
Additional code for Stable-baselines3 to load and upload models from the Hub.

Hugging Face x Stable-baselines3 A library to load and upload Stable-baselines3 models from the Hub. Installation With pip Examples [Todo: add colab t

Hugging Face 34 Dec 10, 2022
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

Disentangled Representation Learning for Text-Video Retrieval This is a PyTorch implementation of the paper Disentangled Representation Learning for T

Qiang Wang 49 Dec 18, 2022
ScaleNet: A Shallow Architecture for Scale Estimation

ScaleNet: A Shallow Architecture for Scale Estimation Repository for the code of ScaleNet paper: "ScaleNet: A Shallow Architecture for Scale Estimatio

Axel Barroso 34 Nov 09, 2022
Code of our paper "Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning"

CCOP Code of our paper Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning Requirement Install OpenSelfSup Install Detectron2

Chenhongyi Yang 21 Dec 13, 2022
Python library for science observations from the James Webb Space Telescope

JWST Calibration Pipeline JWST requires Python 3.7 or above and a C compiler for dependencies. Linux and MacOS platforms are tested and supported. Win

Space Telescope Science Institute 386 Dec 30, 2022
Let Python optimize the best stop loss and take profits for your TradingView strategy.

TradingView Machine Learning TradeView is a free and open source Trading View bot written in Python. It is designed to support all major exchanges. It

Robert Roman 473 Jan 09, 2023
Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Tom-R.T.Kvalvaag 2 Dec 17, 2021
Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting.

Non-AR Spatial-Temporal Transformer Introduction Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series For

Chen Kai 66 Nov 28, 2022
Code for AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network (ICCV 2021).

AA-RMVSNet Code for AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network (ICCV 2021) in PyTorch. paper link: arXiv | CVF Change Log Ju

Qingtian Zhu 97 Dec 30, 2022
Implementation of paper "DCS-Net: Deep Complex Subtractive Neural Network for Monaural Speech Enhancement"

DCS-Net This is the implementation of "DCS-Net: Deep Complex Subtractive Neural Network for Monaural Speech Enhancement" Steps to run the model Edit V

Jack Walters 10 Apr 04, 2022
Adversarial Texture Optimization from RGB-D Scans (CVPR 2020).

AdversarialTexture Adversarial Texture Optimization from RGB-D Scans (CVPR 2020). Scanning Data Download Please refer to data directory for details. B

Jingwei Huang 153 Nov 28, 2022
Dynamic vae - Dynamic VAE algorithm is used for anomaly detection of battery data

Dynamic VAE frame Automatic feature extraction can be achieved by probability di

10 Oct 07, 2022
Neural HMMs are all you need (for high-quality attention-free TTS)

Neural HMMs are all you need (for high-quality attention-free TTS) Shivam Mehta, Éva Székely, Jonas Beskow, and Gustav Eje Henter This is the official

Shivam Mehta 0 Oct 28, 2022
Roger Labbe 13k Dec 29, 2022
An Open-Source Toolkit for Prompt-Learning.

An Open-Source Framework for Prompt-learning. Overview • Installation • How To Use • Docs • Paper • Citation • What's New? Nov 2021: Now we have relea

THUNLP 2.3k Jan 07, 2023