Mosaic of Object-centric Images as Scene-centric Images (MosaicOS) for long-tailed object detection and instance segmentation.

Last update: Oct 12, 2022

MosaicOS

Introduction

Many objects do not appear frequently enough in complex scenes (e.g., certain handbags in living rooms) to train an accurate object detector, but they often appear by themselves (e.g., in product images). Yet these object-centric images are not effectively leveraged to improve object detection in scene-centric images.

We propose Mosaic of Object-centric images as Scene-centric images (MosaicOS), a simple and novel framework that is surprisingly effective at tackling the challenges of long-tailed object detection. Our approach has three key components: (i) pseudo scene-centric image construction from object-centric images to mitigate domain differences, (ii) high-quality bounding box imputation using the object-centric images’ class labels, and (iii) a multistage training procedure. See our paper for further details:

MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection. In IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

by Cheng Zhang*, Tai-Yu Pan*, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao.

Mosaics

The script mosaic.py generates mosaic images and annotations from an annotation file in COCO format (see the COCO data format documentation for more information). The following command uses 4 processes to generate 2x2 mosaic images from the COCO training set in OUTPUT_DIR/images/, together with the annotation file OUTPUT_DIR/annotation.json. --shuffle shuffles the order of the images before they are synthesized, and --drop-last drops the last few images if there are fewer than nrow * ncol of them left. --demo 10 plots 10 synthesized images with their annotated boxes in OUTPUT_DIR/demo/ for visualization.

 python mosaic.py --coco-file datasets/coco/annotations/instances_train2017.json --img-dir datasets/coco --output-dir output_mosaics --num-proc 4 --nrow 2 --ncol 2 --shuffle --drop-last --demo 10

*Note: In our work, we synthesize mosaics from object-centric images with pseudo bounding boxes to fine-tune the pre-trained detector.
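To make the mosaic geometry concrete, here is a minimal sketch of how images could be grouped and how COCO-style boxes ([x, y, width, height]) could be mapped into mosaic coordinates. It is not the code in mosaic.py: the function names group_images and place_boxes, the tile dictionary layout, and the choice to resize every image to fill its grid cell are all assumptions made for illustration.

```python
import random


def group_images(image_ids, nrow, ncol, shuffle=False, drop_last=False, seed=0):
    """Split image ids into groups of nrow * ncol, one group per mosaic.

    Hypothetical helper mirroring the meaning of --shuffle and --drop-last.
    """
    ids = list(image_ids)
    if shuffle:
        random.Random(seed).shuffle(ids)
    size = nrow * ncol
    groups = [ids[i:i + size] for i in range(0, len(ids), size)]
    # Drop the trailing group if it cannot fill a whole grid.
    if drop_last and groups and len(groups[-1]) < size:
        groups.pop()
    return groups


def place_boxes(tiles, nrow, ncol, cell_w, cell_h):
    """Map per-image COCO boxes [x, y, w, h] into mosaic coordinates.

    tiles: list of dicts with "width", "height", and "boxes", in row-major order.
    Assumption: each image is resized to exactly fill its cell_w x cell_h cell,
    so boxes are scaled by the same factors and then shifted by the cell origin.
    """
    placed = []
    for idx, tile in enumerate(tiles[:nrow * ncol]):
        row, col = divmod(idx, ncol)
        sx = cell_w / tile["width"]
        sy = cell_h / tile["height"]
        for x, y, w, h in tile["boxes"]:
            placed.append([col * cell_w + x * sx, row * cell_h + y * sy, w * sx, h * sy])
    return placed


if __name__ == "__main__":
    tile = {"width": 100, "height": 50, "boxes": [[10, 10, 20, 20]]}
    for box in place_boxes([tile] * 4, nrow=2, ncol=2, cell_w=200, cell_h=100):
        print(box)
```

Resizing each image to fill its cell keeps the box arithmetic to one scale and one shift per tile. A real implementation might instead preserve the aspect ratio and pad, which would change the scale factors and offsets.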

Pre-trained models

Our implementation is based on Detectron2. All models are trained on the LVIS training set with Repeated Factor Sampling (RFS).
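For readers unfamiliar with RFS, the sketch below shows the idea as described in the LVIS paper: a category c that appears in a fraction f_c of the training images gets a repeat factor r_c = max(1, sqrt(t / f_c)), and an image is repeated according to the largest r_c among its categories. This is an illustration only, not the training code used here (Detectron2 provides its own sampler). The function name repeat_factors and the default threshold t = 0.001 are assumptions.

```python
import math
from collections import Counter


def repeat_factors(image_categories, threshold=0.001):
    """Per-image repeat factors following the RFS formula.

    image_categories: list of sets, the category ids present in each image.
    threshold: the frequency t below which a category is oversampled (assumed value).
    """
    num_images = len(image_categories)
    # Count, for each category, the number of images that contain it.
    counts = Counter(c for cats in image_categories for c in set(cats))
    # Category-level factor: r_c = max(1, sqrt(t / f_c)), with f_c = count / num_images.
    cat_rep = {
        c: max(1.0, math.sqrt(threshold / (n / num_images)))
        for c, n in counts.items()
    }
    # Image-level factor: the largest factor among the image's categories
    # (1.0 for an image with no annotated category).
    return [max((cat_rep[c] for c in cats), default=1.0) for cats in image_categories]


if __name__ == "__main__":
    # Category 0 is frequent (every image); category 1 is rare (one image).
    images = [{0}] * 15 + [{0, 1}]
    print(repeat_factors(images, threshold=0.25))
```

Factors are real-valued, so a sampler has to turn them into whole repetitions per epoch, for example by stochastic rounding. That step is omitted here.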

LVIS v0.5 validation set

Object detection

Backbone	Method	APb	APbr	APbc	APbf	Download
R50-FPN	Faster R-CNN	23.4	13.0	22.6	28.4	model
R50-FPN	MosaicOS	25.0	20.2	23.9	28.3	model

Instance segmentation

Backbone	Method	AP	APr	APc	APf	APb	Download
R50-FPN	Mask R-CNN	24.4	16.0	24.0	28.3	23.6	model
R50-FPN	MosaicOS	26.3	19.7	26.6	28.5	25.8	model

LVIS v1.0 validation set

Object detection

Backbone	Method	APb	APbr	APbc	APbf	Download
R50-FPN	Faster R-CNN	22.0	10.6	20.1	29.2	model
R50-FPN	MosaicOS	23.9	15.5	22.4	29.3	model

Instance segmentation

Backbone	Method	AP	APr	APc	APf	APb	Download
R50-FPN	Mask R-CNN	22.6	12.3	21.3	28.6	23.3	model
R50-FPN	MosaicOS	24.5	18.2	23.0	28.8	25.1	model
R101-FPN	Mask R-CNN	24.8	15.2	23.7	30.3	25.5	model
R101-FPN	MosaicOS	26.7	20.5	25.8	30.5	27.4	model
X101-FPN	Mask R-CNN	26.7	17.6	25.6	31.9	27.4	model
X101-FPN	MosaicOS	28.3	21.8	27.2	32.4	28.9	model

Citation

If you find this work useful, please cite it using the following BibTeX entry.

@inproceedings{zhang2021mosaicos,
  title={{MosaicOS}: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection},
  author={Zhang, Cheng and Pan, Tai-Yu and Li, Yandong and Hu, Hexiang and Xuan, Dong and Changpinyo, Soravit and Gong, Boqing and Chao, Wei-Lun},
  booktitle={ICCV},
  year={2021}
}

Questions

Feel free to email us if you have any questions.

Cheng Zhang ([email protected]), Tai-Yu Pan ([email protected]), Wei-Lun Harry Chao ([email protected])
