PromptDet: Expand Your Detector Vocabulary with Uncurated Images

Last update: Dec 20, 2022

Introduction

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations. To achieve that, we make the following four contributions: (i) in pursuit of generalisation, we propose a two-stage open-vocabulary object detector that categorises each box proposal with a classifier generated from the text encoder of a pre-trained visual-language model; (ii) to pair the visual latent space (from RPN box proposals) with that of the pre-trained text encoder, we propose regional prompt learning, which optimises a couple of learnable prompt vectors so that the textual embedding space fits visually object-centric images; (iii) to scale up the learning procedure towards detecting a wider spectrum of objects, we exploit available online resources, iteratively updating the prompts and later self-training the proposed detector with pseudo labels generated on a large corpus of noisy, uncurated web images. The self-trained detector, termed PromptDet, significantly improves detection performance on categories for which manual annotations are unavailable or hard to obtain, e.g. rare categories. Finally, (iv) to validate the necessity of our proposed components, we conduct extensive experiments on the challenging LVIS and MS-COCO datasets, showing superior performance over existing approaches with fewer additional training images and zero manual annotations whatsoever.
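The classification step in (i), the prompt learning in (ii) and the pseudo-labelling in (iii) can be illustrated with a toy, dependency-free sketch. It is not the PromptDet implementation: the pre-trained text encoder is replaced by a hypothetical mean-pooling stand-in (`toy_text_encoder`), embeddings are 4-dimensional, and the temperature value and the "highest-scoring proposal" rule in `select_pseudo_box` are illustrative assumptions.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def toy_text_encoder(token_embeddings):
    # Hypothetical stand-in for the frozen pre-trained text encoder:
    # mean-pools the token embeddings, then L2-normalises the result.
    pooled = [sum(col) / len(token_embeddings) for col in zip(*token_embeddings)]
    return l2_normalize(pooled)


def build_classifier(prompt_vectors, category_tokens):
    # Prompt learning (sketch): the same learnable prompt vectors are prepended
    # to every category's word embeddings before text encoding. In prompt
    # learning only these vectors would be optimised; the encoder stays frozen.
    return {
        name: toy_text_encoder(prompt_vectors + tokens)
        for name, tokens in category_tokens.items()
    }


def classify_region(region_embedding, classifier, temperature=0.01):
    # Cosine similarity between a box-proposal embedding and every category
    # embedding, converted to probabilities with a temperature-scaled softmax.
    r = l2_normalize(region_embedding)
    names = list(classifier)
    logits = [sum(a * b for a, b in zip(r, classifier[n])) / temperature for n in names]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}


def select_pseudo_box(proposals, category):
    # Hypothetical pseudo-labelling rule for a web image retrieved with the
    # query `category`: keep the proposal scoring highest for that category.
    return max(proposals, key=lambda p: p["scores"][category])


# Toy 4-d example: two prompt vectors, one word token per category.
prompts = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.1, 0.0, 0.0]]
words = {"cat": [[1.0, 0.0, 0.0, 0.0]], "dog": [[0.0, 0.0, 1.0, 0.0]]}
classifier = build_classifier(prompts, words)
probs = classify_region([0.9, 0.1, 0.0, 0.0], classifier)

proposals = [
    {"box": (0, 0, 10, 10), "scores": {"cat": 0.2}},
    {"box": (5, 5, 20, 20), "scores": {"cat": 0.7}},
]
pseudo = select_pseudo_box(proposals, "cat")
```

Because the classifier is just a set of text embeddings, adding a new category only requires encoding its name with the prompts; no detector weights change.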

Training framework

Prerequisites

MMDetection version 2.16.0.
Please see get_started.md for installation and the basic usage of MMDetection.

Inference

./tools/dist_test.sh configs/promptdet/promptdet_mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.py work_dirs/promptdet_mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.pth 4 --eval bbox segm
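For scripting, the same evaluation call can be assembled programmatically. A minimal sketch, assuming the argument order of MMDetection 2.x's `tools/dist_test.sh` (config, checkpoint, number of GPUs, then flags forwarded to `tools/test.py`); `build_dist_test_cmd` is a hypothetical helper, not part of MMDetection or PromptDet.

```python
import shlex


def build_dist_test_cmd(config, checkpoint, gpus, eval_metrics=("bbox", "segm")):
    # Positional order follows tools/dist_test.sh: CONFIG CHECKPOINT GPUS,
    # then any extra flags (here --eval) are forwarded to tools/test.py.
    return [
        "./tools/dist_test.sh",
        config,
        checkpoint,
        str(gpus),
        "--eval",
        *eval_metrics,
    ]


cmd = build_dist_test_cmd(
    "configs/promptdet/promptdet_mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.py",
    "work_dirs/promptdet_mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.pth",
    gpus=4,
)
print(shlex.join(cmd))
```

From the repository root, the list can be passed directly to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell-quoting issues.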

Train

To be updated.

Models

For your convenience, we provide the following trained models (PromptDet) with mask AP.

Model	Epochs	Scale Jitter	Input Size	AP_novel	AP_c	AP_f	AP	Config	Download
PromptDet_R_50_FPN_1x	12	640~800	800x800	19.0	18.5	25.8	21.4	config	google / baidu
PromptDet_R_50_FPN_6x	72	100~1280	800x800	21.4	23.3	29.3	25.3	config	google / baidu

[0] All results are obtained with a single model and without any test-time augmentation such as multi-scale testing or flipping.
[1] See the config files in configs/promptdet/ for more details.
[2] Extraction code for Baidu Netdisk: promptdet.

Acknowledgement

Thanks to the MMDetection team for the wonderful open-source project!

Citation

If you find PromptDet useful in your research, please consider citing:

@article{feng2022promptdet,
    title={PromptDet: Expand Your Detector Vocabulary with Uncurated Images},
    author={Feng, Chengjian and Zhong, Yujie and Jie, Zequn and Chu, Xiangxiang and Ren, Haibing and Wei, Xiaolin and Xie, Weidi and Ma, Lin},
    journal={arXiv preprint arXiv:2203.16513},
    year={2022}
}
