End-to-End Object Detection with Fully Convolutional Network

Overview

This project provides a PyTorch implementation of "End-to-End Object Detection with Fully Convolutional Network".

The experiments in the paper were conducted on an internal framework, so we reimplemented them on cvpods and report the details below.

Requirements

Get Started

  • Install cvpods locally (requires CUDA to compile)
python3 -m pip install 'git+https://github.com/Megvii-BaseDetection/cvpods.git'
# (add --user if you don't have permission)

# Or, to install it from a local clone:
git clone https://github.com/Megvii-BaseDetection/cvpods.git
python3 -m pip install -e cvpods

# Or,
pip install -r requirements.txt
python3 setup.py build develop
  • Prepare datasets
cd /path/to/cvpods
cd datasets
ln -s /path/to/your/coco/dataset coco
  • Train & Test
git clone https://github.com/Megvii-BaseDetection/DeFCN.git
cd DeFCN/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms  # for example

# Train
pods_train --num-gpus 8

# Test (the MODEL.WEIGHTS and OUTPUT_DIR overrides are optional)
pods_test --num-gpus 8 \
    MODEL.WEIGHTS /path/to/your/save_dir/ckpt.pth \
    OUTPUT_DIR /path/to/your/save_dir

# Multi-node training
## sudo apt install net-tools  # provides ifconfig
pods_train --num-gpus 8 --num-machines N --machine-rank 0/1/.../N-1 --dist-url "tcp://MASTER_IP:port"
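
The multi-node command uses placeholders (N, 0/1/.../N-1, MASTER_IP, port) that have to be filled in separately on each machine. The following is a minimal sketch of how the per-machine commands could be generated. It assumes, as is conventional for this style of launcher, that every machine runs the same command with its own rank and that MASTER_IP is the address of the rank-0 machine. The helper name `multi_node_commands`, the IP address, and the port number are hypothetical and only serve the illustration; the flags are the ones shown above.

```python
def multi_node_commands(num_machines, master_ip, port, gpus_per_machine=8):
    """Return one pods_train command per machine, for ranks 0..num_machines-1.

    Hypothetical helper: it only formats the flags documented in this README.
    """
    dist_url = f"tcp://{master_ip}:{port}"
    return [
        f"pods_train --num-gpus {gpus_per_machine} "
        f"--num-machines {num_machines} --machine-rank {rank} "
        f'--dist-url "{dist_url}"'
        for rank in range(num_machines)
    ]


if __name__ == "__main__":
    # Illustrative values only: two machines, made-up address and port.
    for command in multi_node_commands(2, "10.0.0.1", 29500):
        print(command)
```

Each printed line is meant to be run on a different machine; only the value of --machine-rank differs between them.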

Results on COCO2017 val set

model              assignment   with NMS  lr sched.  mAP   mAR   download
FCOS               one-to-many  Yes       3x + ms    41.4  59.1  weight | log
FCOS baseline      one-to-many  Yes       3x + ms    40.9  58.4  weight | log
Anchor             one-to-one   No        3x + ms    37.1  60.5  weight | log
Center             one-to-one   No        3x + ms    35.2  61.0  weight | log
Foreground Loss    one-to-one   No        3x + ms    38.7  62.2  weight | log
POTO               one-to-one   No        3x + ms    39.2  61.7  weight | log
POTO + 3DMF        one-to-one   No        3x + ms    40.6  61.6  weight | log
POTO + 3DMF + Aux  mixture*     No        3x + ms    41.4  61.5  weight | log

* We adopt a one-to-one assignment in POTO and a one-to-many assignment in the auxiliary loss, respectively.

  • The paper uses a 2x + ms schedule; we adopt a 3x + ms schedule here to achieve higher performance.
  • It is normal to observe ~0.3 AP of noise in POTO.
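
The "with NMS" column states whether a model relies on non-maximum suppression as post-processing to remove duplicate boxes. For readers unfamiliar with that step, here is a minimal pure-Python sketch of greedy NMS. It is not the implementation used in this repo or in cvpods; the function names, the box format (x1, y1, x2, y2), and the 0.6 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.6):
    """Return the indices of the kept boxes, highest score first.

    A box is dropped if it overlaps an already kept, higher-scoring box
    by more than iou_threshold (an illustrative value).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    # Box 1 overlaps box 0 with IoU 81/119 (about 0.68) and is suppressed.
    print(greedy_nms(boxes, scores))  # prints [0, 2]
```

Models trained with a one-to-one assignment are listed with "No" in that column because they are evaluated without this step.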

Results on CrowdHuman val set

model              assignment   with NMS  lr sched.  AP50  mMR   recall  download
FCOS               one-to-many  Yes       30k iters  86.1  54.9  94.2    weight | log
ATSS               one-to-many  Yes       30k iters  87.2  49.7  94.0    weight | log
POTO               one-to-one   No        30k iters  88.5  52.2  96.3    weight | log
POTO + 3DMF        one-to-one   No        30k iters  88.8  51.0  96.6    weight | log
POTO + 3DMF + Aux  mixture*     No        30k iters  89.1  48.9  96.5    weight | log

* We adopt a one-to-one assignment in POTO and a one-to-many assignment in the auxiliary loss, respectively.

  • It is normal to observe ~0.3 AP of noise in POTO and ~1.0 mMR of noise in all methods.

Ablations on COCO2017 val set

model              assignment   with NMS  lr sched.  mAP   mAR   note
POTO               one-to-one   No        6x + ms    40.0  61.9
POTO               one-to-one   No        9x + ms    40.2  62.3
POTO               one-to-one   No        3x + ms    39.2  61.1  replace Hungarian algorithm by argmax
POTO + 3DMF        one-to-one   No        3x + ms    40.9  62.0  remove GN in 3DMF
POTO + 3DMF + Aux  mixture*     No        3x + ms    41.5  61.5  remove GN in 3DMF

* We adopt a one-to-one assignment in POTO and a one-to-many assignment in the auxiliary loss, respectively.

  • For one-to-one assignment, more training iterations lead to higher performance.
  • The argmax (also known as top-1) operation is indeed an approximate solution to bipartite matching in dense prediction methods.
  • It seems harmless to remove GN in 3DMF, which also leads to higher inference speed.
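
To make the argmax remark concrete, the sketch below compares an exact one-to-one matching with the per-ground-truth argmax on two tiny quality matrices. It is an illustration under stated assumptions, not the repo's assignment code: the function names and the numbers are made up, and the exact matching is found by exhaustive search rather than by the Hungarian algorithm, which is only practical here because the matrices are tiny.

```python
from itertools import permutations


def optimal_one_to_one(quality):
    """Exact bipartite matching by exhaustive search.

    quality[g][p] is the score of assigning prediction p to ground truth g.
    Returns one distinct prediction index per ground truth, chosen to
    maximize the total quality. Assumes at least as many predictions
    as ground truths.
    """
    num_gt, num_pred = len(quality), len(quality[0])
    best = max(
        permutations(range(num_pred), num_gt),
        key=lambda preds: sum(quality[g][p] for g, p in enumerate(preds)),
    )
    return list(best)


def argmax_top1(quality):
    """Each ground truth independently picks its highest-quality prediction."""
    return [max(range(len(row)), key=row.__getitem__) for row in quality]


if __name__ == "__main__":
    # Well-separated objects: argmax and the exact matching agree.
    separated = [[0.9, 0.2, 0.1, 0.0],
                 [0.1, 0.3, 0.8, 0.2]]
    print(optimal_one_to_one(separated), argmax_top1(separated))

    # Both objects prefer prediction 0: argmax assigns it twice,
    # while the exact matching keeps the assignment one-to-one.
    crowded = [[0.90, 0.8, 0.1],
               [0.85, 0.2, 0.1]]
    print(optimal_one_to_one(crowded), argmax_top1(crowded))
```

With many predictions per image, different ground truths rarely compete for the same prediction, which is one way to read the small gap between the two variants in the table. For realistic sizes, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would replace the exhaustive search.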

Acknowledgement

This repo is developed based on cvpods. Please check cvpods for more details and features.

License

This repo is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Citing

If you use this work in your research or wish to refer to the baseline results published here, please use the following BibTeX entry:

@article{wang2020end,
  title   =  {End-to-End Object Detection with Fully Convolutional Network},
  author  =  {Wang, Jianfeng and Song, Lin and Li, Zeming and Sun, Hongbin and Sun, Jian and Zheng, Nanning},
  journal =  {arXiv preprint arXiv:2012.03544},
  year    =  {2020}
}

Contributing to the project

Pull requests and issues about the implementation are welcome. For issues with the library itself (e.g. installation, environments), please refer to cvpods.

Owner
BaseDetection Team of Megvii