[CVPR 2021] MiVOS - Scribble to Mask module

Last update: Dec 22, 2022

Overview

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

A simple network that turns scribbles into masks. It supports multi-object segmentation using soft aggregation. Don't expect state-of-the-art results from this model!
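To illustrate what soft aggregation means for multi-object output, here is a minimal sketch of one common formulation from the video object segmentation literature: each object gets an independent foreground probability, the background probability is the product of the complements, and the result is renormalised through the odds. This is an assumption about what the term refers to here, not code taken from this repository; the function name `soft_aggregate` and the `eps` clamp are hypothetical.

```python
import math

def soft_aggregate(object_probs, eps=1e-7):
    """Combine independent per-object foreground probabilities for ONE pixel.

    object_probs: list of floats in [0, 1], one per object.
    Returns [p_background, p_object_1, ..., p_object_n], which sums to 1.
    """
    # Background probability: the pixel belongs to none of the objects.
    background = math.prod(1.0 - p for p in object_probs)
    probs = [background] + list(object_probs)
    # Clamp to avoid division by zero, then convert each probability to odds.
    odds = []
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)
        odds.append(p / (1.0 - p))
    # Normalising the odds equals a softmax over the logits log(p / (1 - p)).
    total = sum(odds)
    return [o / total for o in odds]

merged = soft_aggregate([0.9, 0.2])
# Index 0 is background; index k is the k-th object.
label = max(range(len(merged)), key=merged.__getitem__)
```

In a real model the same computation would run over whole probability maps with tensor operations; the per-pixel version only shows the arithmetic.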

Overall structure and capabilities

	MiVOS	Mask-Propagation	Scribble-to-Mask
DAVIS/YouTube semi-supervised evaluation	❌	✔️	❌
DAVIS interactive evaluation	✔️	❌	❌
User interaction GUI tool	✔️	❌	❌
Dense Correspondences	❌	✔️	❌
Train propagation module	❌	✔️	❌
Train S2M (interaction) module	❌	❌	✔️
Train fusion module	✔️	❌	❌
Generate more synthetic data	✔️	❌	❌

Requirements

The package versions shown here are the ones that I used. You might not need the exact versions.

PyTorch 1.6.0
torchvision 0.7.0
opencv-contrib 4.2.0
davis-interactive (https://github.com/albertomontesg/davis-interactive)
gitpython for training
gdown for downloading pretrained models

Refer to the official PyTorch guide for installing PyTorch/torchvision. The rest can be installed by:

pip install opencv-contrib-python gitpython gdown
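As a quick sanity check after installation, a small script along these lines can report which dependencies are not importable. It is a sketch, not part of the repository. The import names (`cv2` for opencv-contrib-python, `git` for gitpython, `davisinteractive` for davis-interactive) are assumed from each package's usual convention, and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Import names are assumed from each package's usual convention,
# e.g. opencv-contrib-python is imported as cv2 and gitpython as git.
REQUIRED_MODULES = ["torch", "torchvision", "cv2", "davisinteractive", "git", "gdown"]

def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```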

Pretrained model

Download the model and put it in ./saves/. Alternatively, use the provided download_model.py.

[OneDrive Mirror]

Interactive GUI

python interactive.py --image <image>

Controls:

Left mouse button - Draw scribbles
Middle mouse button - Switch between positive and negative scribbles
Key f - Commit changes, clear scribbles
Key r - Clear everything
Key d - Switch between overlay/mask view
Key s - Save masks into a temporary output folder (./output/)

Known issues

The model almost always needs to focus on at least one object. It is very difficult to erase all existing masks from an image using scribbles.

Training

Datasets

Download and extract the LVIS training set.
Download and extract a set of static image segmentation datasets. These are already downloaded for you if you used the download_datasets.py in Mask-Propagation.

├── lvis
│   ├── lvis_v1_train.json
│   └── train2017
├── Scribble-to-Mask
└── static
    ├── BIG_small
    └── ...
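Before training, it can help to confirm that the folders sit where the tree above expects them. The following is a minimal sketch, assuming it is run from inside the Scribble-to-Mask directory so that the data root is one level up; `EXPECTED` and `missing_paths` are hypothetical names, and entries elided by "..." in the tree are not checked.

```python
from pathlib import Path

# Paths relative to the parent directory that holds Scribble-to-Mask,
# taken from the directory tree. Entries elided by "..." are not checked.
EXPECTED = [
    "lvis/lvis_v1_train.json",
    "lvis/train2017",
    "Scribble-to-Mask",
    "static/BIG_small",
]

def missing_paths(root):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

if __name__ == "__main__":
    # Assumes the current directory is Scribble-to-Mask.
    for rel in missing_paths(".."):
        print("missing:", rel)
```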

Commands

Use the deeplabv3plus_resnet50 pretrained model from the DeepLabV3Plus-Pytorch repository (https://github.com/VainF/DeepLabV3Plus-Pytorch).

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id s2m --load_deeplab <path_to_deeplab.pth>
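If you launch training from a script, or on a different number of GPUs, the command above can be assembled programmatically. This is a sketch under assumptions: `build_launch` is a hypothetical helper, and setting `--nproc_per_node` to the number of visible GPUs is inferred from the sample command (two GPUs, two processes), not stated by the authors.

```python
import os
import sys

def build_launch(gpu_ids, deeplab_path, run_id="s2m", master_port=9842):
    """Return (env, argv) mirroring the training command for the given GPUs."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    env["OMP_NUM_THREADS"] = "4"
    argv = [
        sys.executable, "-m", "torch.distributed.launch",
        "--master_port", str(master_port),
        # Assumption: one process per visible GPU, as in the sample command.
        f"--nproc_per_node={len(gpu_ids)}",
        "train.py", "--id", run_id, "--load_deeplab", deeplab_path,
    ]
    return env, argv

env, argv = build_launch([0, 1], "<path_to_deeplab.pth>")
# Show the assembled command without the interpreter path.
print(" ".join(argv[1:]))
```

To actually start training, pass `argv` and `env` to `subprocess.run(argv, env=env, check=True)` from inside the repository directory.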

Credit

Deeplab implementation and pretrained model: https://github.com/VainF/DeepLabV3Plus-Pytorch.

Citation

Please cite our paper if you find this repo useful!

@inproceedings{MiVOS_2021,
  title={Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion},
  author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2021}
}

Contact: [email protected]

Comments

AttributeError: Caught AttributeError in DataLoader worker process 0

Hello! I followed the instructions for the training command, but it threw an AttributeError. I put the static folder outside this repository as you mentioned. What confuses me is that the same datasets work for pretraining the propagation module: train.py in Mask-Propagation runs fine.

opened by xwhkkk 2

git.exc.InvalidGitRepositoryError when running train.py

Hello! I followed the instructions for the training command, but it threw a git.exc.InvalidGitRepositoryError. I used the command: CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 1842 --nproc_per_node=2 train.py --id s2m --load_deeplab ./deeplab_resnet50/best_deeplabv3plus_resnet50_voc_os16.pth, and I have 2 GPUs. Could you give me some suggestions?

opened by xwhkkk 2

About evaluation of the model

Hi,

thank you for the nice work.

I have a concern about the evaluation of the model. Because there is no validation set for picking the best model, there may be an overfitting problem. (Or what should a validation set for interactive segmentation look like? A unified standard would make it easier for everyone to compare their methods.)

Is this setting common in interactive object segmentation? I am new to interactive segmentation and hope you can address my concern. Thank you.

opened by Limingxing00 2

Question about Local Control Strategy

A simple but practical segmentation tool! I've read your paper, and it says that a local control strategy is used in S2M. However, I can't find the local control step in this code. Why isn't it provided in this tool? Does local control make a significant difference to performance?

opened by distillation-dcf 1

DeepLabv3 pre-trained models

Hello,

I wanted to mention that training S2M from scratch with the deeplabv3_resnet50 pre-trained model provided in this repo raises the following error: KeyError: 'classifier.classifier.0.convs.0.0.weight'. This means that the weights for this layer are not present in deeplabv3_resnet50. Using deeplabv3plus_resnet50 from the same repo runs without errors.

Best!

opened by UndecidedBoy 1

saving error

Hello! Thanks for sharing your code. When I run python interactive.py and try to save the masks, the following error appears.

Could you give me some suggestions?

opened by xwhkkk 3

Fix simple issues and allow for cpu only use

I had to make some changes to use the code on a CPU-only system. I also had trouble saving the mask from the interactive GUI and fixed it. Thanks for the great work.

opened by rami-alloush 3

Releases (1.0)

1.0 (Mar 14, 2021)

Pretrained model
Source code (tar.gz)
Source code (zip)
s2m.pth (152.04 MB)

Owner

Rex Cheng

@ HKUST.

GitHub Repository https://hkchengrex.github.io/MiVOS/
