Video Instance Segmentation with a Propose-Reduce Paradigm (ICCV 2021)

Last update: Nov 23, 2022

Overview

Propose-Reduce VIS

This repo contains the official implementation for the paper:

Video Instance Segmentation with a Propose-Reduce Paradigm

Huaijia Lin*, Ruizheng Wu*, Shu Liu, Jiangbo Lu, Jiaya Jia

ICCV 2021 | Paper

Installation

Please refer to INSTALL.md.

Demo

You can compute the VIS results for your own videos.

(1) Download the pretrained weights.
(2) Put example videos in 'demo/inputs'. Two input types are supported: directories of frames or .mp4 files (see example for details).
(3) Run the following script; the results are written to 'demo/outputs'.

sh demo.sh
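
Before running the demo, it can help to check that everything in the input folder is one of the two supported input types. The following is a minimal sketch, not part of the repo: the function name `classify_demo_inputs` and the set of accepted frame extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: frames are stored as image files with one of these extensions.
FRAME_SUFFIXES = {".jpg", ".jpeg", ".png"}


def classify_demo_inputs(input_dir):
    """Split the entries of input_dir into frame directories and .mp4 files.

    Returns three sorted lists of names: frame directories, .mp4 files,
    and entries that match neither input type.
    """
    frame_dirs, mp4_files, skipped = [], [], []
    for entry in sorted(Path(input_dir).iterdir()):
        if entry.is_dir():
            # A directory counts as a video only if it holds at least one frame.
            has_frames = any(
                p.suffix.lower() in FRAME_SUFFIXES for p in entry.iterdir()
            )
            (frame_dirs if has_frames else skipped).append(entry.name)
        elif entry.suffix.lower() == ".mp4":
            mp4_files.append(entry.name)
        else:
            skipped.append(entry.name)
    return frame_dirs, mp4_files, skipped


if __name__ == "__main__":
    input_dir = Path("demo/inputs")
    if input_dir.is_dir():
        dirs, videos, skipped = classify_demo_inputs(input_dir)
        print("frame directories:", dirs)
        print(".mp4 files:", videos)
        print("skipped:", skipped)
```

Anything reported under "skipped" is worth inspecting by hand before launching the demo script.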

Data Preparation

(1) Download the videos and JSON annotation files of the validation set from YouTube-VIS 2019.

(2) Download the videos and JSON annotation files of the validation set from YouTube-VIS 2021.

(3) Symlink the corresponding dataset folders and JSON files into the data folder.

mkdir data

data
├── valset_ytv19 --> /path/to/ytv2019/vos/valid/JPEGImages/ 
├── valid_ytv19.json --> /path/to/ytv2019/vis/valid.json
├── valset_ytv21 --> /path/to/ytv2021/vis/valid/JPEGImages/ 
├── valid_ytv21.json --> /path/to/ytv2021/vis/valid/instances.json
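
The layout above can also be created programmatically. This is a minimal sketch under stated assumptions: the helper `link_datasets` is hypothetical (it is not part of the repo), and the source paths are the same placeholders as in the layout, so they must be replaced with the real dataset locations.

```python
import os
from pathlib import Path

# Link name inside data/ -> source location. The sources are the placeholder
# paths from the layout above; replace them with your real dataset paths.
LINKS = {
    "valset_ytv19": "/path/to/ytv2019/vos/valid/JPEGImages",
    "valid_ytv19.json": "/path/to/ytv2019/vis/valid.json",
    "valset_ytv21": "/path/to/ytv2021/vis/valid/JPEGImages",
    "valid_ytv21.json": "/path/to/ytv2021/vis/valid/instances.json",
}


def link_datasets(links, data_dir="data"):
    """Create data_dir and symlink every existing source into it.

    Missing sources and already existing targets are skipped with a message.
    Returns the list of link names that were created.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name, source in links.items():
        source = Path(source)
        target = data_dir / name
        if not source.exists():
            print(f"skip {name}: {source} does not exist")
            continue
        if target.is_symlink() or target.exists():
            print(f"skip {name}: {target} already exists")
            continue
        os.symlink(source.resolve(), target, target_is_directory=source.is_dir())
        created.append(name)
    return created


# Example (run from the repository root after editing LINKS):
# link_datasets(LINKS)
```

Skipping instead of overwriting keeps the helper safe to re-run after only one of the two datasets has been downloaded.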

Results

We provide results and the corresponding scripts for several pretrained models with different backbones. The numbers differ slightly from those in the paper because we made minor modifications to the inference code.

Download the pretrained models and put them in the pretrained folder.

mkdir pretrained

Dataset	Method	Backbone	CA Reduce	AP	AR@10	Download
YouTube-VIS 2019	Seq Mask R-CNN	ResNet-50		40.8	49.9	model | scripts
YouTube-VIS 2019	Seq Mask R-CNN	ResNet-50	✓	42.5	56.8	scripts
YouTube-VIS 2019	Seq Mask R-CNN	ResNet-101		43.8	52.7	model | scripts
YouTube-VIS 2019	Seq Mask R-CNN	ResNet-101	✓	45.2	59.0	scripts
YouTube-VIS 2019	Seq Mask R-CNN	ResNeXt-101		47.6	56.7	model | scripts
YouTube-VIS 2019	Seq Mask R-CNN	ResNeXt-101	✓	48.8	62.2	scripts

YouTube-VIS 2021	Seq Mask R-CNN	ResNet-50		39.6	47.5	model | scripts
YouTube-VIS 2021	Seq Mask R-CNN	ResNet-50	✓	41.7	54.9	scripts
YouTube-VIS 2021	Seq Mask R-CNN	ResNeXt-101		45.6	52.9	model | scripts
YouTube-VIS 2021	Seq Mask R-CNN	ResNeXt-101	✓	47.2	57.6	scripts
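
To read the effect of CA Reduce off the table, the rows with and without the checkmark can be compared per dataset and backbone. The snippet below is only an illustrative sketch: the numbers are copied from the table, while the dictionary layout and the name `ca_reduce_gains` are assumptions made for this example.

```python
# (dataset, backbone) -> {CA Reduce enabled: (AP, recall column)},
# with all values copied from the table above.
RESULTS = {
    ("YouTube-VIS 2019", "ResNet-50"): {False: (40.8, 49.9), True: (42.5, 56.8)},
    ("YouTube-VIS 2019", "ResNet-101"): {False: (43.8, 52.7), True: (45.2, 59.0)},
    ("YouTube-VIS 2019", "ResNeXt-101"): {False: (47.6, 56.7), True: (48.8, 62.2)},
    ("YouTube-VIS 2021", "ResNet-50"): {False: (39.6, 47.5), True: (41.7, 54.9)},
    ("YouTube-VIS 2021", "ResNeXt-101"): {False: (45.6, 52.9), True: (47.2, 57.6)},
}


def ca_reduce_gains(results):
    """Return the gain of each metric from enabling CA Reduce, rounded to 0.1."""
    gains = {}
    for key, rows in results.items():
        (ap_off, ar_off), (ap_on, ar_on) = rows[False], rows[True]
        gains[key] = (round(ap_on - ap_off, 1), round(ar_on - ar_off, 1))
    return gains


if __name__ == "__main__":
    for (dataset, backbone), (d_ap, d_ar) in ca_reduce_gains(RESULTS).items():
        print(f"{dataset:17s} {backbone:12s} AP +{d_ap:.1f}  recall +{d_ar:.1f}")
```

In every listed configuration both metrics improve when CA Reduce is enabled, with the larger change in the recall column.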

Evaluation

YouTube-VIS 2019: A JSON file will be saved in the '../Results_ytv19' folder. Zip it and upload it to the CodaLab server.

YouTube-VIS 2021: A JSON file will be saved in the '../Results_ytv21' folder. Zip it and upload it to the CodaLab server.
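
The zipping step can be scripted. This is a minimal sketch using Python's standard `zipfile` module; the helper name `zip_result` is hypothetical, and the name of the JSON file written by the inference scripts is not stated here, so `results.json` in the usage comment is an assumption to adjust.

```python
import zipfile
from pathlib import Path


def zip_result(json_path, zip_path=None):
    """Pack a single result JSON file into a zip archive.

    The file is stored at the top level of the archive (no parent folders).
    If zip_path is omitted, the archive is written next to the JSON file
    with the same stem and a .zip suffix.
    """
    json_path = Path(json_path)
    if zip_path is None:
        zip_path = json_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname drops the directory part so the archive contains only the file.
        zf.write(json_path, arcname=json_path.name)
    return Path(zip_path)


# Example (the file name is an assumption; check the output folder first):
# zip_result("../Results_ytv19/results.json")
```

Check the server's submission instructions for the expected file name inside the archive before uploading.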

TODOs

Results on YouTube-VIS 2021
Results on DAVIS-UVOS
Category-Aware Sequence Reduction (CA Reduce)
Training Codes

Citation

If you find this work useful in your research, please cite:

@inproceedings{lin2021video,
  title={Video Instance Segmentation with a Propose-Reduce Paradigm},
  author={Lin, Huaijia and Wu, Ruizheng and Liu, Shu and Lu, Jiangbo and Jia, Jiaya},
  booktitle={IEEE International Conference on Computer Vision (ICCV)},
  year={2021}
}

Contact

If you have any questions regarding the repo, please feel free to contact me ([email protected]) or create an issue.

Acknowledgments

This repo is based on MMDetection, MaskTrackRCNN, STM, MMCV and COCOAPI.

Owner

DV Lab
