MaskTrackRCNN for video instance segmentation based on mmdetection

Overview

MaskTrackRCNN for video instance segmentation

Introduction

This repo serves as the official code release of the MaskTrackRCNN model for video instance segmentation described in the tech report:

@article{ Yang2019vis,
  author = {Linjie Yang and Yuchen Fan and Ning Xu},  
  title = {Video instance segmentation},
  journal = {CoRR},
  volume = {abs/1905.04804},
  year = {2019},
  url = {https://arxiv.org/abs/1905.04804}
}

In this work, a new task video instance segmentation is presented. Video instance segmentation extends the image instance segmentation task from the image domain to the video domain. The new problem aims at simultaneous detection, segmentation and tracking of object instances in videos. YouTubeVIS, a new dataset tailored for this task is collected based on the current largest video object segmentation dataset YouTubeVOS. Sample annotations of a video clip can be seen below. We also proposed an algorithm to jointly detect, segment, and track object instances in a video, named MaskTrackRCNN. A tracking head is added to the original MaskRCNN model to match objects across frames. An overview of the algorithm is shown below.

Installation

This repo is built based on mmdetection commit hash f3a939f. Please refer to INSTALL.md to install the library. You also need to install a customized COCO API for YouTubeVIS dataset. You can use following commands to create conda env with all dependencies.

conda create -n MaskTrackRCNN -y
conda activate MaskTrackRCNN
conda install -c pytorch pytorch=0.4.1 torchvision cuda92 -y
conda install -c conda-forge cudatoolkit-dev=9.2 opencv -y
conda install cython -y
pip install git+https://github.com/youtubevos/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"
bash compile.sh
pip install .

You may also need to follow #1 to load MSCOCO pretrained models.

Model training and evaluation

Our model is based on MaskRCNN-resnet50-FPN. The model is trained end-to-end on YouTubeVIS based on a MSCOCO pretrained checkpoint (link).

Training

  1. Download YouTubeVIS from here.
  2. Symlink the train/validation dataset to $MMDETECTION/data folder. Put COCO-style annotations under $MMDETECTION/data/annotations.
mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── train
│   ├── val
│   ├── annotations
│   │   ├── instances_train_sub.json
│   │   ├── instances_val_sub.json
  1. Run python3 tools/train.py configs/masktrack_rcnn_r50_fpn_1x_youtubevos.py to train the model. For reference to arguments such as learning rate and model parameters, please refer to configs/masktrack_rcnn_r50_fpn_1x_youtubevos.py

Evaluation

Our pretrained model is available for download at Google Drive. Run the following command to evaluate the model on YouTubeVIS.

python3 tools/test_video.py configs/masktrack_rcnn_r50_fpn_1x_youtubevos.py [MODEL_PATH] --out [OUTPUT_PATH] --eval segm

A json file containing the predicted result will be generated as OUTPUT_PATH.json. YouTubeVIS currently only allows evaluation on the codalab server. Please upload the generated result to codalab server to see actual performances.

License

This project is released under the Apache 2.0 license.

Contact

If you have any questions regarding the repo, please contact Linjie Yang ([email protected]) or create an issue.

A PyTorch implementation of unsupervised SimCSE

A PyTorch implementation of unsupervised SimCSE

99 Dec 23, 2022
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach

Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach This is the repo to host the dataset TextSeg and code for TexRNe

SHI Lab 174 Dec 19, 2022
SysWhispers Shellcode Loader

Shhhloader Shhhloader is a SysWhispers Shellcode Loader that is currently a Work in Progress. It takes raw shellcode as input and compiles a C++ stub

icyguider 630 Jan 03, 2023
The implementation of "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement"

SF-Net for fullband SE This is the repo of the manuscript "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Ban

Guochen Yu 36 Dec 02, 2022
EigenGAN Tensorflow, EigenGAN: Layer-Wise Eigen-Learning for GANs

Gender Bangs Body Side Pose (Yaw) Lighting Smile Face Shape Lipstick Color Painting Style Pose (Yaw) Pose (Pitch) Zoom & Rotate Flush & Eye Color Mout

Zhenliang He 321 Dec 01, 2022
Implementation of CVPR'21: RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction

RfD-Net [Project Page] [Paper] [Video] RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction Yinyu Nie, Ji Hou, Xiaoguang Han, Matthi

Yinyu Nie 162 Jan 06, 2023
PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP. Democratize AI for everyone.

PatrickStar: Parallel Training of Large Language Models via a Chunk-based Memory Management Meeting PatrickStar Pre-Trained Models (PTM) are becoming

Tencent 633 Dec 28, 2022
Localized representation learning from Vision and Text (LoVT)

Localized Vision-Text Pre-Training Contrastive learning has proven effective for pre- training image models on unlabeled data and achieved great resul

Philip Müller 10 Dec 07, 2022
The source code of CVPR17 'Generative Face Completion'.

GenerativeFaceCompletion Matcaffe implementation of our CVPR17 paper on face completion. In each panel from left to right: original face, masked input

Yijun Li 313 Oct 18, 2022
PyTorch implementation of Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy

Anomaly Transformer in PyTorch This is an implementation of Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy. This pape

spencerbraun 160 Dec 19, 2022
It's A ML based Web Site build with python and Django to find the breed of the dog

ML-Based-Dog-Breed-Identifier This is a Django Based Web Site To Identify the Breed of which your DOG belogs All You Need To Do is to Follow These Ste

Sanskar Dwivedi 2 Oct 12, 2022
SIEM Logstash parsing for more than hundred technologies

LogIndexer Pipeline Logstash Parsing Configurations for Elastisearch SIEM and OpenDistro for Elasticsearch SIEM Why this project exists The overhead o

146 Dec 29, 2022
[ICCV21] Code for RetrievalFuse: Neural 3D Scene Reconstruction with a Database

RetrievalFuse Paper | Project Page | Video RetrievalFuse: Neural 3D Scene Reconstruction with a Database Yawar Siddiqui, Justus Thies, Fangchang Ma, Q

Yawar Nihal Siddiqui 75 Dec 22, 2022
RLDS stands for Reinforcement Learning Datasets

RLDS RLDS stands for Reinforcement Learning Datasets and it is an ecosystem of tools to store, retrieve and manipulate episodic data in the context of

Google Research 135 Jan 01, 2023
Easily benchmark PyTorch model FLOPs, latency, throughput, max allocated memory and energy consumption

⏱ pytorch-benchmark Easily benchmark model inference FLOPs, latency, throughput, max allocated memory and energy consumption Install pip install pytor

Lukas Hedegaard 21 Dec 22, 2022
A series of convenience functions to make basic image processing operations such as translation, rotation, resizing, skeletonization, and displaying Matplotlib images easier with OpenCV and Python.

imutils A series of convenience functions to make basic image processing functions such as translation, rotation, resizing, skeletonization, and displ

Adrian Rosebrock 4.3k Jan 08, 2023
Github project for Attention-guided Temporal Coherent Video Object Matting.

Attention-guided Temporal Coherent Video Object Matting This is the Github project for our paper Attention-guided Temporal Coherent Video Object Matti

71 Dec 19, 2022
Classification Modeling: Probability of Default

Credit Risk Modeling in Python Introduction: If you've ever applied for a credit card or loan, you know that financial firms process your information

Aktham Momani 2 Nov 07, 2022
This Deep Learning Model Predicts that from which disease you are suffering.

Deep-Learning-Project This Deep Learning Model Predicts that from which disease you are suffering. This Project Covers the Topics of Deep Learning Int

Jai Viral Doshi 0 Jan 20, 2022
🏆 The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission)

AI City 2021: Connecting Language and Vision for Natural Language-Based Vehicle Retrieval 🏆 The 1st Place Submission to AICity Challenge 2021 Natural

82 Dec 29, 2022