A PyTorch version of You Only Look at One-level Feature object detector

Last update: Dec 30, 2022

PyTorch_YOLOF

Input images must be resized so that the shorter side is 800 and the longer side is less than or equal to 1333.
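The resize rule above can be sketched in a few lines. This is only an illustration of the arithmetic, not the code used in this repository; the function name `resized_shape` and its parameters are hypothetical.

```python
def resized_shape(height, width, min_size=800, max_size=1333):
    """Return (new_height, new_width), preserving the aspect ratio."""
    # Scale so that the shorter side reaches min_size.
    scale = min_size / min(height, width)
    # If that pushes the longer side past max_size, shrink the scale instead.
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    return round(height * scale), round(width * scale)


print(resized_shape(480, 640))   # shorter side becomes 800
print(resized_shape(400, 1000))  # longer side is capped at 1333
```

For very elongated images the cap on the longer side wins, so the shorter side ends up below 800.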

While reproducing YOLOF, I found that it uses many tricks that the baseline RetinaNet doesn't. For example, YOLOF takes advantage of RandomShift, CTR_CLAMP, a large learning rate, a big batch size (like 64), and a negative prediction threshold. Is it really fair for YOLOF to use these tricks when it is compared with RetinaNet?

In other words, can YOLOF still work without those tricks?

Requirements

We recommend using Anaconda to create a conda environment:

conda create -n yolof python=3.6

Then, activate the environment:

conda activate yolof

Then, install the requirements:

pip install -r requirements.txt

PyTorch >= 1.1.0 and Torchvision >= 0.3.0

Visualize positive sample

You can run the following command to visualize the positive samples:

python train.py \
        -d voc \
        --batch_size 2 \
        --root path/to/your/dataset \
        --vis_targets

My Ablation Studies

image mask

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: IoU Top4 (Different from the official matcher that uses top4 of L1 distance.)
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip
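As a rough illustration of the schedule listed above (base lr 0.01, 12 epochs, decay at epochs 8 and 11), the learning rate per epoch could be computed as follows. The decay factor of 0.1 and the zero-based epoch counting are assumptions, since the list does not state them, and `lr_at_epoch` is a hypothetical helper rather than a function from this repository.

```python
def lr_at_epoch(epoch, base_lr=0.01, milestones=(8, 11), gamma=0.1):
    """Step schedule: multiply the lr by gamma at every milestone reached."""
    # gamma=0.1 is an assumed decay factor, not taken from the source.
    decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** decays


for epoch in range(12):
    print(epoch, lr_at_epoch(epoch))
```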

We ignore the loss of samples that are not inside the image.
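One way to read the image mask is that feature-map locations falling in the padded area of a batched image are excluded from the loss. The sketch below follows that reading; it is an assumption about the idea, not the repository's implementation. The stride of 32 and the helper names are hypothetical.

```python
def valid_location_mask(img_h, img_w, pad_h, pad_w, stride=32):
    """Mark feature-map cells whose center lies inside the unpadded image."""
    rows, cols = pad_h // stride, pad_w // stride
    mask = []
    for r in range(rows):
        cy = (r + 0.5) * stride  # cell center in input pixels
        mask.append([cy < img_h and (c + 0.5) * stride < img_w
                     for c in range(cols)])
    return mask


def masked_mean_loss(losses, mask):
    """Average the per-location losses over valid locations only."""
    kept = [l for row_l, row_m in zip(losses, mask)
            for l, m in zip(row_l, row_m) if m]
    return sum(kept) / max(len(kept), 1)


mask = valid_location_mask(img_h=64, img_w=96, pad_h=128, pad_w=128)
losses = [[1.0] * 4 for _ in range(4)]
losses[3][3] = 100.0  # lies in the padding, so it is ignored
print(masked_mean_loss(losses, mask))
```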

Method	AP	AP50	AP75	APs	APm	APl
w/o mask	28.3	46.7	28.9	13.4	33.4	39.9
w mask	28.4	46.9	29.1	13.5	33.5	39.1

L1 Top4

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip
with image mask

IoU topk: For each label, we choose the K anchor boxes with the highest IoU as the positive samples.

L1 topk: For each label, we choose the K anchor boxes with the smallest L1 distance as the positive samples.
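The two selection rules can be compared with a small pure-Python sketch. It assumes boxes in (x1, y1, x2, y2) form and measures the L1 distance directly on those four coordinates, which may differ from the box parameterization used in this repository or in the official matcher. All function names here are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def l1_distance(a, b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - q) for p, q in zip(a, b))


def topk_anchors(anchors, gt_box, k=4, metric="iou"):
    """Return the indices of the k anchors that best match gt_box."""
    idx = range(len(anchors))
    if metric == "iou":
        # Higher IoU is better, so sort in descending order.
        order = sorted(idx, key=lambda i: -iou(anchors[i], gt_box))
    else:
        # Smaller L1 distance is better, so sort in ascending order.
        order = sorted(idx, key=lambda i: l1_distance(anchors[i], gt_box))
    return order[:k]


anchors = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30),
           (1, 1, 11, 11), (0, 0, 12, 12)]
gt = (0, 0, 10, 10)
print(topk_anchors(anchors, gt, k=2, metric="iou"))
print(topk_anchors(anchors, gt, k=2, metric="l1"))
```

On this toy input the two metrics agree on the best anchor but pick a different second one, which is the kind of difference the ablation measures.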

Method	AP	AP50	AP75	APs	APm	APl
IoU Top4	28.4	46.9	29.1	13.5	33.5	39.1
L1 Top4	28.6	46.9	29.4	13.8	34.0	39.0

RandomShift Augmentation

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: L1 Top4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip
with image mask

YOLOF takes advantage of the RandomShift augmentation, which is not used in RetinaNet.
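A minimal sketch of what a random shift does to the box labels is shown below. It only translates and clips boxes; the matching translation of the image pixels is omitted. The maximum shift of 32 pixels and the function names are assumptions for illustration, not values taken from this repository.

```python
import random


def shift_boxes(boxes, dx, dy, img_w, img_h):
    """Translate (x1, y1, x2, y2) boxes and clip them to the image."""
    shifted = []
    for x1, y1, x2, y2 in boxes:
        nx1 = min(max(x1 + dx, 0), img_w)
        ny1 = min(max(y1 + dy, 0), img_h)
        nx2 = min(max(x2 + dx, 0), img_w)
        ny2 = min(max(y2 + dy, 0), img_h)
        # Drop boxes that were pushed completely out of the image.
        if nx2 > nx1 and ny2 > ny1:
            shifted.append((nx1, ny1, nx2, ny2))
    return shifted


def random_shift(boxes, img_w, img_h, max_shift=32, rng=random):
    """Draw one random offset and apply it to every box."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return (dx, dy), shift_boxes(boxes, dx, dy, img_w, img_h)


offset, new_boxes = random_shift([(10, 10, 50, 50)], img_w=100, img_h=100)
print(offset, new_boxes)
```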

Method	AP	AP50	AP75	APs	APm	APl
w/o RandomShift	28.6	46.9	29.4	13.8	34.0	39.0
w/ RandomShift	29.0	47.3	29.8	14.2	34.2	38.9

Fix a bug in dataloader

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: L1 Top4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip + RandomShift
with image mask

I fixed a bug in the dataloader. Specifically, I set the shuffle in the dataloader as False ...

Method	AP	AP50	AP75	APs	APm	APl
bug	29.0	47.3	29.8	14.2	34.2	38.9
no bug	30.1	49.0	31.0	15.2	36.3	39.8

Ignore samples

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: L1 Top4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip + RandomShift
with image mask

We ignore negative samples whose IoU with the labels is higher than the ignore threshold (igt).
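The ignore rule can be sketched as a mask over the negative samples. Which box each negative sample is represented by (its anchor or its predicted box) is left open here, and the function names are hypothetical; the threshold default of 0.7 matches the igt value listed in the table.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def ignore_mask(neg_boxes, gt_boxes, igt=0.7):
    """True where a negative sample should be left out of the loss."""
    mask = []
    for box in neg_boxes:
        # Best overlap of this sample with any label.
        best = max((box_iou(box, gt) for gt in gt_boxes), default=0.0)
        mask.append(best > igt)
    return mask


gts = [(0, 0, 10, 10)]
negatives = [(0, 0, 10, 11), (1, 1, 11, 11), (20, 20, 30, 30)]
print(ignore_mask(negatives, gts))
```

Samples flagged True contribute neither as positives nor as negatives; the remaining negatives are trained as background as usual.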

Method	AP	AP50	AP75	APs	APm	APl
no igt	30.1	49.0	31.0	15.2	36.3	39.8
igt=0.7

Decode boxes

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: L1 Top4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip + RandomShift
with image mask

Method-1: ctr_x = x_anchor + t_x, ctr_y = y_anchor + t_y

Method-2: ctr_x = x_anchor + t_x * w_anchor, ctr_y = y_anchor + t_y * h_anchor

Method-2 follows the operation used in YOLOF.
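The two center-decoding formulas above differ only in whether the predicted offset is scaled by the anchor size. A small sketch, with a hypothetical function name and an anchor given as (x_anchor, y_anchor, w_anchor, h_anchor):

```python
def decode_center(anchor, t_x, t_y, scale_by_anchor):
    """Decode a box center from an anchor and the predicted offsets."""
    x_a, y_a, w_a, h_a = anchor
    if scale_by_anchor:
        # Method-2: offsets are expressed in units of the anchor size.
        return x_a + t_x * w_a, y_a + t_y * h_a
    # Method-1: offsets are added directly.
    return x_a + t_x, y_a + t_y


anchor = (100.0, 100.0, 32.0, 64.0)
print(decode_center(anchor, 0.5, -0.25, scale_by_anchor=False))
print(decode_center(anchor, 0.5, -0.25, scale_by_anchor=True))
```

With Method-2 the same offset moves the center further for larger anchors, so the regression targets are scale-normalized.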

Method	AP	AP50	AP75	APs	APm	APl
Method-1
Method-2

Train

sh train.sh

You can change the configuration in train.sh.

If you just want to check which anchor box is assigned to the positive sample, you can run:

python train.py --cuda -d voc --batch_size 8 --vis_targets

Adjust the commands above as needed for your own setup.

Test

python test.py -d [select a dataset: voc or coco] \
               --cuda \
               -v [select a model] \
               --weight [ Please input the path to model dir. ] \
               --img_size 800 \
               --root path/to/dataset/ \
               --show

You can run the above command to visualize the detection results on the dataset.

Comments

fix typo

When I run the eval process on the VOC dataset, an error occurs:

Traceback (most recent call last):
  File "eval.py", line 126, in <module>
    voc_test(model, data_dir, device, transform)
  File "eval.py", line 42, in voc_test
    display=True)
TypeError: __init__() got an unexpected keyword argument 'data_root'

I discovered that this was due to a typo and simply fixed it. Everything is going well now.

opened by guohanli 1

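There is a problem with the label-generation function

The label-generation logic in the original source code is: 1. Use the L1 distance between the predicted boxes and the gt to select topk anchors, then use the L1 distance between the anchors and the gt to select topk anchors, and take these as the candidate positive anchors. 2. Match the candidate positive anchors to the gt by IoU, and filter out the candidate positive anchors whose anchor IoU is less than 0.15. 3. Set the anchors corresponding to predicted boxes whose IoU with the gt is <= 0.7 as negative anchors. (You only used the anchors, with no candidate selection and without the predicted boxes.)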

opened by Mr-Z-NewStar 11

Releases(YOLOF-weight)

YOLOF-weight(Mar 20, 2022)

Owner

Jianhua Yang
