Receptive Field Block Net for Accurate and Fast Object Detection, ECCV 2018

Overview

Receptive Field Block Net for Accurate and Fast Object Detection

By Songtao Liu, Di Huang, Yunhong Wang

Updatas (2021/07/23): YOLOX is here!, stronger YOLO with ONNX, TensorRT, ncnn, and OpenVino supported!!

Updates: we propose a new method to get 42.4 mAP at 45 FPS on COCO, code is available here

Introduction

Inspired by the structure of Receptive Fields (RFs) in human visual systems, we propose a novel RF Block (RFB) module, which takes the relationship between the size and eccentricity of RFs into account, to enhance the discriminability and robustness of features. We further assemble the RFB module to the top of SSD with a lightweight CNN model, constructing the RFB Net detector. You can use the code to train/evaluate the RFB Net for object detection. For more details, please refer to our ECCV paper.

   

VOC2007 Test

System mAP FPS (Titan X Maxwell)
Faster R-CNN (VGG16) 73.2 7
YOLOv2 (Darknet-19) 78.6 40
R-FCN (ResNet-101) 80.5 9
SSD300* (VGG16) 77.2 46
SSD512* (VGG16) 79.8 19
RFBNet300 (VGG16) 80.7 83
RFBNet512 (VGG16) 82.2 38

COCO

System test-dev mAP Time (Titan X Maxwell)
Faster R-CNN++ (ResNet-101) 34.9 3.36s
YOLOv2 (Darknet-19) 21.6 25ms
SSD300* (VGG16) 25.1 22ms
SSD512* (VGG16) 28.8 53ms
RetinaNet500 (ResNet-101-FPN) 34.4 90ms
RFBNet300 (VGG16) 30.3 15ms
RFBNet512 (VGG16) 33.8 30ms
RFBNet512-E (VGG16) 34.4 33ms

MobileNet

System COCO minival mAP #parameters
SSD MobileNet 19.3 6.8M
RFB MobileNet 20.7 7.4M

Citing RFB Net

Please cite our paper in your publications if it helps your research:

@InProceedings{Liu_2018_ECCV,
author = {Liu, Songtao and Huang, Di and Wang, andYunhong},
title = {Receptive Field Block Net for Accurate and Fast Object Detection},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}

Contents

  1. Installation
  2. Datasets
  3. Training
  4. Evaluation
  5. Models

Installation

  • Install PyTorch-0.4.0 by selecting your environment on the website and running the appropriate command.
  • Clone this repository. This repository is mainly based on ssd.pytorch and Chainer-ssd, a huge thank to them.
    • Note: We currently only support PyTorch-0.4.0 and Python 3+.
  • Compile the nms and coco tools:
./make.sh

Note: Check you GPU architecture support in utils/build.py, line 131. Default is:

'nvcc': ['-arch=sm_52',
  • Then download the dataset by following the instructions below and install opencv.
conda install opencv

Note: For training, we currently support VOC and COCO.

Datasets

To make things easy, we provide simple VOC and COCO dataset loader that inherits torch.utils.data.Dataset making it fully compatible with the torchvision.datasets API.

VOC Dataset

Download VOC2007 trainval & test
# specify a directory for dataset to be downloaded into, else default is ~/data/
sh data/scripts/VOC2007.sh # <directory>
Download VOC2012 trainval
# specify a directory for dataset to be downloaded into, else default is ~/data/
sh data/scripts/VOC2012.sh # <directory>

COCO Dataset

Install the MS COCO dataset at /path/to/coco from official website, default is ~/data/COCO. Following the instructions to prepare minival2014 and valminusminival2014 annotations. All label files (.json) should be under the COCO/annotations/ folder. It should have this basic structure

$COCO/
$COCO/cache/
$COCO/annotations/
$COCO/images/
$COCO/images/test2015/
$COCO/images/train2014/
$COCO/images/val2014/

UPDATE: The current COCO dataset has released new train2017 and val2017 sets which are just new splits of the same image sets.

Training

mkdir weights
cd weights
wget https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth
  • To train RFBNet using the train script simply specify the parameters listed in train_RFB.py as a flag or manually change them.
python train_RFB.py -d VOC -v RFB_vgg -s 300 
  • Note:
    • -d: choose datasets, VOC or COCO.
    • -v: choose backbone version, RFB_VGG, RFB_E_VGG or RFB_mobile.
    • -s: image size, 300 or 512.
    • You can pick-up training from a checkpoint by specifying the path as one of the training parameters (again, see train_RFB.py for options)
    • If you want to reproduce the results in the paper, the VOC model should be trained about 240 epoches while the COCO version need 130 epoches.

Evaluation

To evaluate a trained network:

python test_RFB.py -d VOC -v RFB_vgg -s 300 --trained_model /path/to/model/weights

By default, it will directly output the mAP results on VOC2007 test or COCO minival2014. For VOC2012 test and COCO test-dev results, you can manually change the datasets in the test_RFB.py file, then save the detection results and submitted to the server.

Models

Owner
Liu Songtao
我萧峰大好男儿~ Factos👍👀​
Liu Songtao
Evaluating Cross-lingual Sentence Representations

XNLI: The Cross-Lingual NLI Corpus XNLI is an evaluation corpus for language transfer and cross-lingual sentence classification in 15 languages. New:

Meta Research 395 Dec 19, 2022
A deep learning framework for historical document image analysis

DIVA-DAF Description A deep learning framework for historical document image analysis. How to run Install dependencies # clone project git clone https

9 Aug 04, 2022
Some simple programs built in Python: webcam with cv2 that detects eyes and face, with grayscale filter

Programas en Python Algunos programas simples creados en Python: 📹 Webcam con c

Madirex 1 Feb 15, 2022
Towards uncontrained hand-object reconstruction from RGB videos

Towards uncontrained hand-object reconstruction from RGB videos Yana Hasson, Gül Varol, Ivan Laptev and Cordelia Schmid Project page Paper Table of Co

Yana 69 Dec 27, 2022
PyTorch implementations of the paper: "DR.VIC: Decomposition and Reasoning for Video Individual Counting, CVPR, 2022"

DRNet for Video Indvidual Counting (CVPR 2022) Introduction This is the official PyTorch implementation of paper: DR.VIC: Decomposition and Reasoning

tao han 35 Nov 22, 2022
Training data extraction on GPT-2

Training data extraction from GPT-2 This repository contains code for extracting training data from GPT-2, following the approach outlined in the foll

Florian Tramer 62 Dec 07, 2022
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022

Advanced Image Manipulation Lab @ Samsung AI Center Moscow 4.7k Dec 31, 2022
Offcial implementation of "A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Prediction, ICCV-2021".

HF2-VAD Offcial implementation of "A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Predictio

76 Dec 21, 2022
Edge Restoration Quality Assessment

ERQA - Edge Restoration Quality Assessment ERQA - a full-reference quality metric designed to analyze how good image and video restoration methods (SR

MSU Video Group 27 Dec 17, 2022
My implementation of transformers related papers for computer vision in pytorch

vision_transformers This is my personnal repo to implement new transofrmers based and other computer vision DL models I am currenlty working without a

samsja 1 Nov 10, 2021
BRepNet: A topological message passing system for solid models

BRepNet: A topological message passing system for solid models This repository contains the an implementation of BRepNet: A topological message passin

Autodesk AI Lab 42 Dec 30, 2022
Python with OpenCV - MediaPip Framework Hand Detection

Python HandDetection Python with OpenCV - MediaPip Framework Hand Detection Explore the docs » Contact Me About The Project It is a Computer vision pa

2 Jan 07, 2022
Open-Set Recognition: A Good Closed-Set Classifier is All You Need

Open-Set Recognition: A Good Closed-Set Classifier is All You Need Code for our paper: "Open-Set Recognition: A Good Closed-Set Classifier is All You

194 Jan 03, 2023
Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models

Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models Abstract Many applications of generative models rely on the marginali

Stanford Intelligent Systems Laboratory 9 Jun 06, 2022
Code for our EMNLP 2021 paper "Learning Kernel-Smoothed Machine Translation with Retrieved Examples"

KSTER Code for our EMNLP 2021 paper "Learning Kernel-Smoothed Machine Translation with Retrieved Examples" [paper]. Usage Download the processed datas

jiangqn 23 Nov 24, 2022
A generalist algorithm for cell and nucleus segmentation.

Cellpose | A generalist algorithm for cell and nucleus segmentation. Cellpose was written by Carsen Stringer and Marius Pachitariu. To learn about Cel

MouseLand 733 Dec 29, 2022
Image-Adaptive YOLO for Object Detection in Adverse Weather Conditions

Image-Adaptive YOLO for Object Detection in Adverse Weather Conditions Accepted by AAAI 2022 [arxiv] Wenyu Liu, Gaofeng Ren, Runsheng Yu, Shi Guo, Jia

liuwenyu 245 Dec 16, 2022
OpenPCDet Toolbox for LiDAR-based 3D Object Detection.

OpenPCDet OpenPCDet is a clear, simple, self-contained open source project for LiDAR-based 3D object detection. It is also the official code release o

OpenMMLab 3.2k Dec 31, 2022
Official Pytorch implementation for video neural representation (NeRV)

NeRV: Neural Representations for Videos (NeurIPS 2021) Project Page | Paper | UVG Data Hao Chen, Bo He, Hanyu Wang, Yixuan Ren, Ser-Nam Lim, Abhinav S

hao 214 Dec 28, 2022
BABEL: Bodies, Action and Behavior with English Labels [CVPR 2021]

BABEL is a large dataset with language labels describing the actions being performed in mocap sequences. BABEL labels about 43 hours of mocap sequences from AMASS [1] with action labels.

113 Dec 28, 2022