Implementation of "Efficient Regional Memory Network for Video Object Segmentation" (Xie et al., CVPR 2021).

Last update: Dec 14, 2022

Related tags

Overview

RMNet

This repository contains the source code for the paper Efficient Regional Memory Network for Video Object Segmentation.

Cite this work

@inproceedings{xie2021efficient,
  title={Efficient Regional Memory Network for Video Object Segmentation},
  author={Xie, Haozhe and 
          Yao, Hongxun and 
          Zhou, Shangchen and 
          Zhang, Shengping and 
          Sun, Wenxiu},
  booktitle={CVPR},
  year={2021}
}

Datasets

We use the ECSSD, COCO, PASCAL VOC, MSRA10K, DAVIS, and YouTube-VOS datasets in our experiments, which are available below:

Pretrained Models

The pretrained models for DAVIS and YouTube-VOS are available as follows:

RMNet for DAVIS (202 MB)
RMNet for YouTube-VOS (202 MB)

Prerequisites

Clone the Code Repository

git clone https://github.com/hzxie/RMNet.git

Install Python Denpendencies

cd RMNet
pip install -r requirements.txt

Build PyTorch Extensions

NOTE: PyTorch >= 1.4, CUDA >= 9.0 and GCC >= 4.9 are required.

RMNET_HOME=`pwd`

cd $RMNET_HOME/extensions/reg_att_map_generator
python setup.py install --user

cd $RMNET_HOME/extensions/flow_affine_transformation
python setup.py install --user

Precompute the Optical Flow

For the DAVIS dataset, the optical flows are computed by FlowNet2-CSS with the model pretrained on FlyingThings3D.
For the YouTube-VOS dataset, the optical flows are computed by RAFT with the model pretrained on Sintel.

Update Settings in `config.py`

You need to update the file path of the datasets:

__C.DATASETS                                     = edict()
__C.DATASETS.DAVIS                               = edict()
__C.DATASETS.DAVIS.INDEXING_FILE_PATH            = './datasets/DAVIS.json'
__C.DATASETS.DAVIS.IMG_FILE_PATH                 = '/path/to/Datasets/DAVIS/JPEGImages/480p/%s/%05d.jpg'
__C.DATASETS.DAVIS.ANNOTATION_FILE_PATH          = '/path/to/Datasets/DAVIS/Annotations/480p/%s/%05d.png'
__C.DATASETS.DAVIS.OPTICAL_FLOW_FILE_PATH        = '/path/to/Datasets/DAVIS/OpticalFlows/480p/%s/%05d.flo'
__C.DATASETS.YOUTUBE_VOS                         = edict()
__C.DATASETS.YOUTUBE_VOS.INDEXING_FILE_PATH      = '/path/to/Datasets/YouTubeVOS/%s/meta.json'
__C.DATASETS.YOUTUBE_VOS.IMG_FILE_PATH           = '/path/to/Datasets/YouTubeVOS/%s/JPEGImages/%s/%s.jpg'
__C.DATASETS.YOUTUBE_VOS.ANNOTATION_FILE_PATH    = '/path/to/Datasets/YouTubeVOS/%s/Annotations/%s/%s.png'
__C.DATASETS.YOUTUBE_VOS.OPTICAL_FLOW_FILE_PATH  = '/path/to/Datasets/YouTubeVOS/%s/OpticalFlows/%s/%s.flo'
__C.DATASETS.PASCAL_VOC                          = edict()
__C.DATASETS.PASCAL_VOC.INDEXING_FILE_PATH       = '/path/to/Datasets/voc2012/trainval.txt'
__C.DATASETS.PASCAL_VOC.IMG_FILE_PATH            = '/path/to/Datasets/voc2012/images/%s.jpg'
__C.DATASETS.PASCAL_VOC.ANNOTATION_FILE_PATH     = '/path/to/Datasets/voc2012/masks/%s.png'
__C.DATASETS.ECSSD                               = edict()
__C.DATASETS.ECSSD.N_IMAGES                      = 1000
__C.DATASETS.ECSSD.IMG_FILE_PATH                 = '/path/to/Datasets/ecssd/images/%s.jpg'
__C.DATASETS.ECSSD.ANNOTATION_FILE_PATH          = '/path/to/Datasets/ecssd/masks/%s.png'
__C.DATASETS.MSRA10K                             = edict()
__C.DATASETS.MSRA10K.INDEXING_FILE_PATH          = './datasets/msra10k.txt'
__C.DATASETS.MSRA10K.IMG_FILE_PATH               = '/path/to/Datasets/msra10k/images/%s.jpg'
__C.DATASETS.MSRA10K.ANNOTATION_FILE_PATH        = '/path/to/Datasets/msra10k/masks/%s.png'
__C.DATASETS.MSCOCO                              = edict()
__C.DATASETS.MSCOCO.INDEXING_FILE_PATH           = './datasets/mscoco.txt'
__C.DATASETS.MSCOCO.IMG_FILE_PATH                = '/path/to/Datasets/coco2017/images/train2017/%s.jpg'
__C.DATASETS.MSCOCO.ANNOTATION_FILE_PATH         = '/path/to/Datasets/coco2017/masks/train2017/%s.png'
__C.DATASETS.ADE20K                              = edict()
__C.DATASETS.ADE20K.INDEXING_FILE_PATH           = './datasets/ade20k.txt'
__C.DATASETS.ADE20K.IMG_FILE_PATH                = '/path/to/Datasets/ADE20K_2016_07_26/images/training/%s.jpg'
__C.DATASETS.ADE20K.ANNOTATION_FILE_PATH         = '/path/to/Datasets/ADE20K_2016_07_26/images/training/%s_seg.png'

# Dataset Options: DAVIS, DAVIS_FRAMES, YOUTUBE_VOS, ECSSD, MSCOCO, PASCAL_VOC, MSRA10K, ADE20K
__C.DATASET.TRAIN_DATASET                        = ['ECSSD', 'PASCAL_VOC', 'MSRA10K', 'MSCOCO']  # Pretrain
__C.DATASET.TRAIN_DATASET                        = ['YOUTUBE_VOS', 'DAVISx5']                    # Fine-tune
__C.DATASET.TEST_DATASET                         = 'DAVIS'

# Network Options: RMNet, TinyFlowNet
__C.TRAIN.NETWORK                                = 'RMNet'

Get Started

To train RMNet, you can simply use the following command:

python3 runner.py

To test RMNet, you can use the following command:

python3 runner.py --test --weights=/path/to/pretrained/model.pth

License

This project is open sourced under MIT license.

Implementation of "Efficient Regional Memory Network for Video Object Segmentation" (Xie et al., CVPR 2021).

Related tags

Overview

RMNet

Cite this work

Datasets

Pretrained Models

Prerequisites

Clone the Code Repository

Install Python Denpendencies

Build PyTorch Extensions

Precompute the Optical Flow

Update Settings in `config.py`

Get Started

License

Owner

Haozhe Xie

CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energy Management, 2020, PikaPika team

Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution

GoodNews Everyone! Context driven entity aware captioning for news images

Neural Radiance Fields Using PyTorch

DivNoising is an unsupervised denoising method to generate diverse denoised samples for any noisy input image. This repository contains the code to reproduce the results reported in the paper https://openreview.net/pdf?id=agHLCOBM5jP

Non-Vacuous Generalisation Bounds for Shallow Neural Networks

Learnable Motion Coherence for Correspondence Pruning

Automatic tool focused on deriving metallicities of open clusters

AI创造营：Metaverse启动机之重构现世，结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人

Learning Neural Network Subspaces

Kaggle | 9th place single model solution for TGS Salt Identification Challenge

A mini lib that implements several useful functions binding to PyTorch in C++.

Code for the preprint "Well-classified Examples are Underestimated in Classification with Deep Neural Networks"

Official Pytorch implementation of 'RoI Tanh-polar Transformer Network for Face Parsing in the Wild.'

Codes to calculate solar-sensor zenith and azimuth angles directly from hyperspectral images collected by UAV. Works only for UAVs that have high resolution GNSS/IMU unit.

Syllabus del curso IIC2115 - Programación como Herramienta para la Ingeniería 2022/I

Testing and Estimation of structural breaks in Stata

Equivariant CNNs for the sphere and SO(3) implemented in PyTorch

This is a repository for a No-Code object detection inference API using the OpenVINO. It's supported on both Windows and Linux Operating systems.

Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022)

Implementation of "Efficient Regional Memory Network for Video Object Segmentation" (Xie et al., CVPR 2021).

Related tags

Overview

RMNet

Cite this work

Datasets

Pretrained Models

Prerequisites

Clone the Code Repository

Install Python Denpendencies

Build PyTorch Extensions

Precompute the Optical Flow

Update Settings in config.py

Get Started

License

Owner

Haozhe Xie

CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energy Management, 2020, PikaPika team

Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution

GoodNews Everyone! Context driven entity aware captioning for news images

Neural Radiance Fields Using PyTorch

DivNoising is an unsupervised denoising method to generate diverse denoised samples for any noisy input image. This repository contains the code to reproduce the results reported in the paper https://openreview.net/pdf?id=agHLCOBM5jP

Non-Vacuous Generalisation Bounds for Shallow Neural Networks

Learnable Motion Coherence for Correspondence Pruning

Automatic tool focused on deriving metallicities of open clusters

AI创造营 ：Metaverse启动机之重构现世，结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人

Learning Neural Network Subspaces

Kaggle | 9th place single model solution for TGS Salt Identification Challenge

A mini lib that implements several useful functions binding to PyTorch in C++.

Code for the preprint "Well-classified Examples are Underestimated in Classification with Deep Neural Networks"

Official Pytorch implementation of 'RoI Tanh-polar Transformer Network for Face Parsing in the Wild.'

Codes to calculate solar-sensor zenith and azimuth angles directly from hyperspectral images collected by UAV. Works only for UAVs that have high resolution GNSS/IMU unit.

Syllabus del curso IIC2115 - Programación como Herramienta para la Ingeniería 2022/I

Testing and Estimation of structural breaks in Stata

Equivariant CNNs for the sphere and SO(3) implemented in PyTorch

This is a repository for a No-Code object detection inference API using the OpenVINO. It's supported on both Windows and Linux Operating systems.

Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022)

Update Settings in `config.py`

AI创造营：Metaverse启动机之重构现世，结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人