Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation

Last update: Apr 01, 2022


By Qiang Zhou*, Zilong Huang*, Lichao Huang, Han Shen, Yongchao Gong, Chang Huang, Wenyu Liu, Xinggang Wang. (* denotes equal contribution.)

This code mainly implements PTSNet for the DAVIS 2017 dataset. For more details, please refer to our paper.

Architecture

Overview of our proposed PTSNet for video object segmentation. OPN generates proposals for the objects of interest, and OTN determines which of the proposals is the best. Finally, DRSN performs the final pixel-level tracking (segmentation) task. Note that in our implementation we couple OPN and OTN into a single network and keep DRSN separate for engineering reasons.
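The cascade described above can be pictured with a short sketch. It only illustrates the control flow (propose, pick the best proposal, then segment) and is not this repository's code: `run_cascade`, the three callables and the `(x1, y1, x2, y2)` box format are all hypothetical stand-ins for OPN, OTN and DRSN.

```python
def run_cascade(frame, propose, score, segment):
    """Illustrative cascade: propose -> choose best proposal -> segment.

    propose(frame)      -> list of candidate boxes   (stand-in for OPN)
    score(frame, box)   -> float, higher is better   (stand-in for OTN)
    segment(frame, box) -> segmentation result       (stand-in for DRSN)
    All three callables are hypothetical placeholders.
    """
    proposals = propose(frame)
    if not proposals:
        return None  # nothing to track in this frame
    best = max(proposals, key=lambda box: score(frame, box))
    return segment(frame, best)


# Toy stand-ins, used only to show how the pieces connect.
def toy_propose(frame):
    return [(0, 0, 10, 10), (5, 5, 40, 40)]


def toy_score(frame, box):
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)  # pretend that larger boxes score higher


def toy_segment(frame, box):
    return {"box": box, "mask": None}  # a real model would return a mask


print(run_cascade("frame-0", toy_propose, toy_score, toy_segment))
```

In the actual implementation the first two stages live in one coupled network, so this three-callable split is a simplification for reading purposes.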

Usage

Preparation

Install PyTorch 1.0 and the necessary libraries, such as OpenCV and PIL.

There are some native CUDA implementations (InPlace-ABN and the MaskRCNN operators) that must be compiled first.

# Before you compile, you need to figure out several things:
# - The CUDA architectures supported by your GPU. Here we use `sm_52`, `sm_61` and `sm_70` (the last one covers NVIDIA Titan V).
# - The `cuda` and `nvcc` paths on your system, which are usually `/usr/local/cuda` and `/usr/local/cuda/bin/nvcc` respectively.
# InPlace-ABN_0.4   (PyTorch 0.4)
cd model/inplace_ABN_0.4
bash build.sh
# OR you could choose the 1.0 version of inplace ABN.
# InPlace-ABN_1.0   (PyTorch 1.0)
cd model/inplace_ABN    # It is dynamically compiled when running (gcc > 4.9)

# MaskRCNN Operators (PyTorch 0.4)
cd coupled_otn_opn/tracking/maskrcnn/lib
bash make.sh
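If you are unsure which `sm_XX` value matches your GPU, the following sketch may help. It formats a compute capability such as (7, 0) in the `sm_70` style used above. The helper names `sm_flag` and `detect_sm_flag` are hypothetical and not part of this repository; `torch.cuda.get_device_capability` is the PyTorch call that reports the capability, and it is only queried when PyTorch and a CUDA device are available.

```python
def sm_flag(major, minor):
    """Format a CUDA compute capability as an sm_XX architecture name."""
    return "sm_{}{}".format(major, minor)


def detect_sm_flag():
    """Return the sm_XX name for GPU 0, or None if it cannot be queried."""
    try:
        import torch  # optional: only needed for auto-detection
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    return sm_flag(major, minor)


print(sm_flag(7, 0))      # compute capability 7.0 (e.g. Titan V) -> sm_70
print(detect_sm_flag())   # None when no CUDA-capable setup is found
```

How the resulting value is passed to `build.sh` or `make.sh` depends on those scripts, so check them before editing.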

You can train PTSNet from scratch or just evaluate our pretrained model.

To train it from scratch, you need to download:

 # DRSN: wget "https://download.pytorch.org/models/resnet50-19c8e357.pth" -O drsn/init_models/resnet50-19c8e357.pth
 # OPN: wget "https://drive.google.com/open?id=1ma1fNmEvS9dJLOIcm1FRzYofVS_t3aI3" -O coupled_otn_opn/tracking/maskrcnn/data/X-152-32x8d-IN5k.pkl
 # If you want to use our pretrained OTN:
 #   download https://drive.google.com/open?id=12bF1dRlEUZoQz3Qcr2WD3ojqNHzbCrjf and put it into `coupled_otn_opn/models/mdnet_davis_50cycle.pth`
 # Otherwise, adapt py-MDNet (https://github.com/HyeonseobNam/py-MDNet) to train OTN on DAVIS yourself.

To run the evaluation with our pretrained model, you need to download:

 # DRSN: https://drive.google.com/open?id=116yXnqX43BZ7kEgdzUhIeTSn1dbvcE2F, put it into `drsn/snapshots/drsn_yvos_10w_davis_3p5w.pth`
 # OPN: wget "https://drive.google.com/open?id=1ma1fNmEvS9dJLOIcm1FRzYofVS_t3aI3" -O coupled_otn_opn/tracking/maskrcnn/data/X-152-32x8d-IN5k.pkl
 # OTN: https://drive.google.com/open?id=12bF1dRlEUZoQz3Qcr2WD3ojqNHzbCrjf, put it into `coupled_otn_opn/models/mdnet_davis_50cycle.pth`
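The Google Drive links above are `open?id=...` page links, so a plain `wget` on them may save an HTML page rather than the weights file. Tools that download from Drive generally work from the file ID, so here is a minimal sketch for extracting it using only the standard library. The helper name `drive_file_id` is hypothetical; the URL is the OTN link listed above.

```python
from urllib.parse import parse_qs, urlparse


def drive_file_id(url):
    """Return the value of the 'id' query parameter of a Drive link."""
    ids = parse_qs(urlparse(url).query).get("id")
    if not ids:
        raise ValueError("no 'id' parameter in URL: " + url)
    return ids[0]


OTN_URL = "https://drive.google.com/open?id=12bF1dRlEUZoQz3Qcr2WD3ojqNHzbCrjf"
print(drive_file_id(OTN_URL))
```

Whichever way you download the files, check that the saved file is not a small HTML document before using it.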

Dataset

YouTube-VOS: Download from YouTube-VOS. We only need the training part (train_all_frames.zip), about 41 GB in total. Unzip it, then move and rename it to drsn/dataset/yvos.
DAVIS: Download from DAVIS. We only need the 480p version (DAVIS-2017-trainval-480p.zip). Unzip it, then move and rename it to drsn/dataset/DAVIS/trainval and coupled_otn_opn/DAVIS/trainval. You need to create the trainval subdirectory in each location to hold the dataset.

Make sure the files are arranged in the following structure:

.
├── drsn
│   ├── dataset
│   │   ├── DAVIS
│   │   │   └── trainval
│   │   │       ├── Annotations
│   │   │       ├── ImageSets
│   │   │       └── JPEGImages
│   │   └── yvos
│   │       └── train_all_frames
│   ├── init_model
│   │   └── resnet50-19c8e357.pth
│   └── snapshots
│       └── drsn_yvos_10w_davis_3p5w.pth
└── coupled_otn_opn
    ├── DAVIS
    │   └── trainval
    ├── models
    │   └── mdnet_davis_50cycle.pth
    └── tracking
        └── maskrcnn
            └── data
                └── X-152-32x8d-FPN-IN5k.pkl
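Before training, it can be useful to confirm that the dataset directories are where the tree above expects them. The sketch below is one way to do that. The names `EXPECTED_DIRS` and `missing_dirs` are hypothetical, the paths are copied from the tree, and the script is assumed to run from the repository root.

```python
from pathlib import Path

# Dataset directories taken from the structure shown above.
EXPECTED_DIRS = [
    "drsn/dataset/DAVIS/trainval/Annotations",
    "drsn/dataset/DAVIS/trainval/ImageSets",
    "drsn/dataset/DAVIS/trainval/JPEGImages",
    "drsn/dataset/yvos/train_all_frames",
    "coupled_otn_opn/DAVIS/trainval",
]


def missing_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    for rel in missing_dirs("."):
        print("missing:", rel)
```

The check covers directories only; the weight files can be verified the same way with `is_file()`.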

Train and Evaluate

First, go to the coupled_otn_opn directory and follow the README.md inside to generate the proposals. You can skip this step, since we provide generated proposals in the drsn/dataset/result_davis directory.
Second, enter drsn and see do_train_eval.sh to train and evaluate.
Finally, we also provide the result masks produced by PTSNet in result-masks-GoogleDrive. The quantitative results are measured with the official DAVIS MATLAB toolbox.

	J Mean	F Mean	G Mean
Avg	71.6	77.7	74.7
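In the DAVIS 2017 benchmark, the global score G is the average of J Mean and F Mean, and the numbers in the table are consistent with that up to rounding. A quick check (the function name `global_mean` is only illustrative):

```python
def global_mean(j_mean, f_mean):
    """Global DAVIS 2017 score: the average of J Mean and F Mean."""
    return (j_mean + f_mean) / 2.0


g = global_mean(71.6, 77.7)
print("G Mean = %.2f" % g)  # roughly 74.65, reported as 74.7 in the table
```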

Acknowledgment

The work was mainly done during an internship at Horizon Robotics.

Citing PTSNet

If you find PTSNet useful in your research, please consider citing:

@article{ptsnet2019,
        title={Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation},
        author={Zhou, Qiang and Huang, Zilong and Huang, Lichao and Han, Shen and Gong, Yongchao and Huang, Chang and Liu, Wenyu and Wang, Xinggang},
        journal={arXiv preprint arXiv:1907.01203v2},
        year={2019}
}
