BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

Overview

BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

By Likun Cai, Zhi Zhang, Yi Zhu, Li Zhang, Mu Li, Xiangyang Xue.

This repo is the official implementation of BigDetection. It is based on mmdetection and CBNetV2.

Introduction

We construct a new large-scale benchmark termed BigDetection. Our goal is to simply leverage the training data from existing datasets (LVIS, OpenImages and Object365) with carefully designed principles, and curate a larger dataset for improved detector pre-training. BigDetection dataset has 600 object categories and contains 3.4M training images with 36M object bounding boxes. We show some important statistics of BigDetection in the following figure.

Left: Number of images per category of BigDetection. Right: Number of instances in different object sizes.

Results and Models

BigDetection Validation

We show the evaluation results on BigDetection Validation. We hope BigDetection could serve as a new challenging benchmark for evaluating next-level object detection methods.

Method mAP (bigdet val) Links
YOLOv3 9.7 model/config
Deformable DETR 13.1 model/config
Faster R-CNN (C4)* 18.9 model
Faster R-CNN (FPN)* 19.4 model
CenterNet2* 23.1 model
Cascade R-CNN* 24.1 model
CBNetV2-Swin-Base 35.1 model/config

COCO Validation

We show the finetuning performance on COCO minival/test-dev. Results show that BigDetection pre-training provides significant benefits for different detector architectures. We achieve 59.8 mAP on COCO test-dev with a single model.

Method mAP (coco minival/test-dev) Links
YOLOv3 30.5/- config
Deformable DETR 39.9/- model/config
Faster R-CNN (C4)* 38.8/- model
Faster R-CNN (FPN)* 40.5/- model
CenterNet2* 45.3/- model
Cascade R-CNN* 45.1/- model
CBNetV2-Swin-Base 59.1/59.5 model/config
CBNetV2-Swin-Base (TTA) 59.5/59.8 config

Data Efficiency

We followed STAC and SoftTeacher to evaluate on COCO for different partial annotation settings.

Method mAP (1%) mAP (2%) mAP (5%) mAP (10%)
Baseline 9.8 14.3 21.2 26.2
STAC 14.0 18.3 24.4 28.6
SoftTeacher (ICCV 21) 20.5 26.5 30.7 34.0
Ours 25.3 28.1 31.9 34.1
model model model model

Notes

  • The models following * are implemented on another detection codebase Detectron2. Here we provide the pretrained checkpoints. The results can be reproduced following the installation of CenterNet2 codebase.
  • Most of models are trained for 8X schedule on BigDetection.
  • Most of pretrained models are finetuned for 1X schedule on COCO.
  • TTA denotes test time augmentation.
  • Pre-trained models of Swin Transformer can be downloaded from Swin Transformer for ImageNet Classification.

Getting Started

Requirements

  • Ubuntu 16.04
  • CUDA 10.2

Installation

# Create conda environment
conda create -n bigdet python=3.7 -y
conda activate bigdet

# Install Pytorch
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=10.2 -c pytorch

# Install mmcv
pip install mmcv-full==1.3.9 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html

# Clone and install
git clone https://github.com/amazon-research/bigdetection.git
cd bigdetection
pip install -r requirements/build.txt
pip install -v -e .

# Install Apex (optinal)
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Data Preparation

Our BigDetection involves 3 datasets and train/val data can be downloaded from their official website (Objects365, OpenImages v6, LVIS v1.0). All datasets should be placed under $bigdetection/data/ as below. The synsets (total 600 class names) of BigDetection dataset can be downloaded here: bigdetection_synsets. Contact us with [email protected] to get access to our pre-processed annotation files.

bigdetection/data
└── BigDetection
    ├── annotations
    │   ├── bigdet_obj_train.json
    │   ├── bigdet_oid_train.json
    │   ├── bigdet_lvis_train.json
    │   ├── bigdet_val.json
    │   └── cas_weights.json
    ├── train
    │   ├── Objects365
    │   ├── OpenImages
    │   └── LVIS
    └── val

Training

To train a detector with pre-trained models, run:

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options load_from=<PRETRAIN_MODEL>

Pre-training

To pre-train a CBNetV2 with a Swin-Base backbone on BigDetection using 8 GPUs, run: (PRETRAIN_MODEL should be pre-trained checkpoint of Base-Swin-Transformer: model)

tools/dist_train.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_bigdet.py 8 \
    --cfg-options load_from=<PRETRAIN_MODEL>

To pre-train a Deformable-DETR with a ResNet-50 backbone on BigDetection, run:

tools/dist_train.sh configs/BigDetection/deformable_detr/deformable_detr_r50_16x2_8x_bigdet.py 8

Fine-tuning

To fine-tune a BigDetection pre-trained CBNetV2 (with Swin-Base backbone) on COCO, run: (PRETRAIN_MODEL should be BigDetection pre-trained checkpoint of CBNetV2: model)

tools/dist_train.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco.py 8 \
    --cfg-options load_from=<PRETRAIN_MODEL>

Inference

To evaluate a detector with pre-trained checkpoints, run:

tools/dist_test.sh <CONFIG_FILE> <CHECKPOINT> <GPU_NUM> --eval bbox

BigDetection evaluation

To evaluate pre-trained CBNetV2 on BigDetection validation, run:

tools/dist_test.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_bigdet.py \
    <BIGDET_PRETRAIN_CHECKPOINT> 8 --eval bbox

COCO evaluation

To evaluate COCO-finetuned CBNetV2 on COCO validation, run:

# without test-time-augmentation
tools/dist_test.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco.py \
    <COCO_FINETUNE_CHECKPOINT> 8 --eval bbox mask

# with test-time-augmentation
tools/dist_test.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco_tta.py \
    <COCO_FINETUNE_CHECKPOINT> 8 --eval bbox mask

Other configuration based on Detectron2 can be found at detectron2-probject.

Citation

If you use our dataset or pretrained models in your research, please kindly consider to cite the following paper.

@article{bigdetection2022,
  title={BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training},
  author={Likun Cai and Zhi Zhang and Yi Zhu and Li Zhang and Mu Li and Xiangyang Xue},
  journal={arXiv preprint arXiv:2203.13249},
  year={2022}
}

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Acknowledgement

We thank the authors releasing mmdetection and CBNetv2 for object detection research community.

Repository providing a wide range of self-supervised pretrained models for computer vision tasks.

Hierarchical Pretraining: Research Repository This is a research repository for reproducing the results from the project "Self-supervised pretraining

Colorado Reed 53 Nov 09, 2022
Implementation of Kronecker Attention in Pytorch

Kronecker Attention Pytorch Implementation of Kronecker Attention in Pytorch. Results look less than stellar, but if someone found some context where

Phil Wang 16 May 06, 2022
PyTorchVideo is a deeplearning library with a focus on video understanding work

PyTorchVideo is a deeplearning library with a focus on video understanding work. PytorchVideo provides resusable, modular and efficient components needed to accelerate the video understanding researc

Facebook Research 2.7k Jan 07, 2023
Introduction to CPM

CPM CPM is an open-source program on large-scale pre-trained models, which is conducted by Beijing Academy of Artificial Intelligence and Tsinghua Uni

Tsinghua AI 136 Dec 23, 2022
AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in Immigration-Related Web News Comments Using Transformers and Statistical Models

AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in Immigration-Related Web News Comments Using Transformers and Statistical Models Description

Angel de Paula 0 Jun 08, 2022
Image Super-Resolution Using Very Deep Residual Channel Attention Networks

Image Super-Resolution Using Very Deep Residual Channel Attention Networks

kongdebug 14 Oct 14, 2022
This Artificial Intelligence program can take a black and white/grayscale image and generate a realistic or plausible colorized version of the same picture.

Colorizer The point of this project is to write a program capable of taking a black and white / grayscale image, and generating a realistic or plausib

Maitri Shah 1 Jan 06, 2022
DeepSpamReview: Detection of Fake Reviews on Online Review Platforms using Deep Learning Architectures. Summer Internship project at CoreView Systems.

Detection of Fake Reviews on Online Review Platforms using Deep Learning Architectures Dataset: https://s3.amazonaws.com/fast-ai-nlp/yelp_review_polar

Ashish Salunkhe 37 Dec 17, 2022
Instant Real-Time Example-Based Style Transfer to Facial Videos

FaceBlit: Instant Real-Time Example-Based Style Transfer to Facial Videos The official implementation of FaceBlit: Instant Real-Time Example-Based Sty

Aneta Texler 131 Dec 19, 2022
In this project we combine techniques from neural voice cloning and musical instrument synthesis to achieve good results from as little as 16 seconds of target data.

Neural Instrument Cloning In this project we combine techniques from neural voice cloning and musical instrument synthesis to achieve good results fro

Erland 127 Dec 23, 2022
MaskTrackRCNN for video instance segmentation based on mmdetection

MaskTrackRCNN for video instance segmentation Introduction This repo serves as the official code release of the MaskTrackRCNN model for video instance

411 Jan 05, 2023
Solving SMPL/MANO parameters from keypoint coordinates.

Minimal-IK A simple and naive inverse kinematics solver for MANO hand model, SMPL body model, and SMPL-H body+hand model. Briefly, given joint coordin

Yuxiao Zhou 305 Dec 30, 2022
Efficient face emotion recognition in photos and videos

This repository contains code of face emotion recognition that was developed in the RSF (Russian Science Foundation) project no. 20-71-10010 (Efficien

Andrey Savchenko 239 Jan 04, 2023
Deep learned, hardware-accelerated 3D object pose estimation

Isaac ROS Pose Estimation Overview This repository provides NVIDIA GPU-accelerated packages for 3D object pose estimation. Using a deep learned pose e

NVIDIA Isaac ROS 41 Dec 18, 2022
Point detection through multi-instance deep heatmap regression for sutures in endoscopy

Suture detection PyTorch This repo contains the reference implementation of suture detection model in PyTorch for the paper Point detection through mu

artificial intelligence in the area of cardiovascular healthcare 3 Jul 16, 2022
🎁 3,000,000+ Unsplash images made available for research and machine learning

The Unsplash Dataset The Unsplash Dataset is made up of over 250,000+ contributing global photographers and data sourced from hundreds of millions of

Unsplash 2k Jan 03, 2023
This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transformers.

TransMix: Attend to Mix for Vision Transformers This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transf

Jie-Neng Chen 130 Jan 01, 2023
Pointer-generator - Code for the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Networks

Note: this code is no longer actively maintained. However, feel free to use the Issues section to discuss the code with other users. Some users have u

Abi See 2.1k Jan 04, 2023
PyTorch implementation of Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Simple PyTorch Implementation of "Grokking" Implementation of Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets Usage Running

Teddy Koker 15 Sep 29, 2022
This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks.

slue-toolkit We introduce Spoken Language Understanding Evaluation (SLUE) benchmark. This toolkit provides codes to download and pre-process the SLUE

ASAPP Research 39 Sep 21, 2022