Reviving Iterative Training with Mask Guidance for Interactive Segmentation

This repository provides the source code for training and testing state-of-the-art click-based interactive segmentation models with the official PyTorch implementation of the following paper:

Reviving Iterative Training with Mask Guidance for Interactive Segmentation
Konstantin Sofiiuk, Ilia Petrov, Anton Konushin
Samsung AI Center Moscow
https://arxiv.org/abs/2102.06583

Abstract: Recent works on click-based interactive segmentation have demonstrated state-of-the-art results by using various inference-time optimization schemes. These methods are considerably more computationally expensive compared to feedforward approaches, as they require performing backward passes through a network during inference and are hard to deploy on mobile frameworks that usually support only forward passes. In this paper, we extensively evaluate various design choices for interactive segmentation and discover that new state-of-the-art results can be obtained without any additional optimization schemes. Thus, we propose a simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps. It allows not only to segment an entirely new object, but also to start with an external mask and correct it. When analyzing the performance of models trained on different datasets, we observe that the choice of a training dataset greatly impacts the quality of interactive segmentation. We find that the models trained on a combination of COCO and LVIS with diverse and high-quality annotations show performance superior to all existing models.

Setting up an environment

This framework is built using Python 3.6 and relies on PyTorch 1.4.0+. The following command installs all necessary packages:

pip3 install -r requirements.txt

You can also use our Dockerfile to build a container with the configured environment.

If you want to run training or testing, you must configure the paths to the datasets in config.yml.
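
Before launching training or testing, it can help to check that the configured paths exist. The following is a minimal sketch, not part of the repository: the helper name `find_missing_paths` is hypothetical, and it assumes that path settings are string values whose keys end in `_PATH` (as with INTERACTIVE_MODELS_PATH and EXPS_PATH).

```python
from pathlib import Path


def find_missing_paths(config: dict) -> dict:
    """Return {key: path} for every *_PATH entry that does not exist on disk."""
    missing = {}
    for key, value in config.items():
        # Assumption: path settings are string values whose key ends in "_PATH".
        if key.endswith("_PATH") and isinstance(value, str):
            if not Path(value).expanduser().exists():
                missing[key] = value
    return missing
```

In practice the dictionary would come from config.yml (for example, loaded with `yaml.safe_load` from the PyYAML package), and any returned entries point at settings that still need to be fixed.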

Interactive Segmentation Demo

The GUI is based on the TkInter library and its Python bindings. You can try our interactive demo with any of the provided models. Our scripts automatically detect the architecture of the loaded model; just specify the path to the corresponding checkpoint.

Examples of the script usage:

# This command runs interactive demo with HRNet18 ITER-M model from cfg.INTERACTIVE_MODELS_PATH on GPU with id=0
# --checkpoint can be relative to cfg.INTERACTIVE_MODELS_PATH or absolute path to the checkpoint
python3 demo.py --checkpoint=hrnet18_cocolvis_itermask_3p --gpu=0

# This command runs interactive demo with HRNet18 ITER-M model from /home/demo/fBRS/weights/
# If you do not have much GPU memory, you can reduce --limit-longest-size (default=800)
python3 demo.py --checkpoint=/home/demo/fBRS/weights/hrnet18_cocolvis_itermask_3p --limit-longest-size=400

# You can try the demo in CPU only mode
python3 demo.py --checkpoint=hrnet18_cocolvis_itermask_3p --cpu
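
Running demo in docker

# activate xhost
xhost +
# replace <id-or-tag-of-built-docker-image> with the image built from the Dockerfile
docker run -v "$PWD":/tmp/ \
           -v /tmp/.X11-unix:/tmp/.X11-unix \
           -e DISPLAY=$DISPLAY <id-or-tag-of-built-docker-image> \
           python3 demo.py --checkpoint resnet34_dh128_sbd --cpu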

Controls:

| Key | Description |
| --- | --- |
| Left Mouse Button | Place a positive click |
| Right Mouse Button | Place a negative click |
| Scroll Wheel | Zoom an image in and out |
| Right Mouse Button + Move Mouse | Move an image |
| Space | Finish the current object mask |

Initializing the ITER-M models with an external segmentation mask

According to our paper, ITER-M models take an image, encoded user input, and a previous step mask as their input. Moreover, a user can initialize the model with an external mask before placing any clicks and correct this mask using the same interface. As it turns out, our models successfully handle this situation and make it possible to change the mask.

To initialize any ITER-M model with an external mask use the "Load mask" button in the menu bar.
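
The idea of feeding the previous mask alongside the encoded clicks can be sketched as follows. This is an illustration only, not the repository's code: the function names, the channel order, and the use of plain nested lists are assumptions, and the disk radius of 5 is taken from the paper's click encoding as mentioned in the discussion further down this page.

```python
def disk_map(height, width, clicks, radius=5):
    """Binary map with a filled disk of the given radius around each (row, col) click."""
    grid = [[0.0] * width for _ in range(height)]
    for cy, cx in clicks:
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                    grid[y][x] = 1.0
    return grid


def build_guidance(height, width, positive, negative, prev_mask=None, radius=5):
    """Stack [previous mask, positive clicks, negative clicks] as three channels."""
    if prev_mask is None:
        # No previous step and no external mask: start from an empty mask.
        prev_mask = [[0.0] * width for _ in range(height)]
    # Assumption: the channel order is illustrative, not the repository's layout.
    return [prev_mask,
            disk_map(height, width, positive, radius),
            disk_map(height, width, negative, radius)]
```

Passing an externally produced mask as `prev_mask` before any clicks are placed corresponds to the "Load mask" workflow: the model sees the external mask in the same channel where it would otherwise see its own previous prediction.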

Interactive segmentation options
  • ZoomIn (can be turned on/off using the checkbox)
    • Skip clicks - the number of clicks to skip before using ZoomIn.
    • Target size - ZoomIn crop is resized so its longer side matches this value (increase for large objects).
    • Expand ratio - object bbox is rescaled with this ratio before crop.
    • Fixed crop - ZoomIn crop is resized to (Target size, Target size).
  • BRS parameters (BRS type can be changed using the dropdown menu)
    • Network clicks - the number of first clicks that are included in the network's input. Subsequent clicks are processed only using BRS (NoBRS ignores this option).
    • L-BFGS-B max iterations - the maximum number of function evaluations for each step of optimization in BRS (increase for better accuracy at the cost of longer computation time for each click).
  • Visualisation parameters
    • Prediction threshold slider adjusts the threshold for binarization of probability map for the current object.
    • Alpha blending coefficient slider adjusts the intensity of all predicted masks.
    • Visualisation click radius slider adjusts the size of red and green dots depicting clicks.
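
The ZoomIn options above can be read as a small geometric computation: expand the object bounding box, clamp it to the image, and resize the crop so that its longer side matches the target size. The sketch below is a hedged illustration under those assumptions; the function name, the half-open box convention, and the default values are hypothetical and are not taken from the repository.

```python
import math


def zoom_in_crop(bbox, image_size, expand_ratio=1.4, target_size=400):
    """Expand a bbox, clamp it to the image, and compute the resize scale.

    bbox is (top, left, bottom, right) with bottom/right exclusive;
    image_size is (height, width). Default values are illustrative only.
    """
    height, width = image_size
    top, left, bottom, right = bbox
    # Rescale the box around its center by expand_ratio.
    center_r, center_c = (top + bottom) / 2, (left + right) / 2
    half_h = (bottom - top) * expand_ratio / 2
    half_w = (right - left) * expand_ratio / 2
    # Clamp the expanded box to the image bounds.
    top = max(0, math.floor(center_r - half_h))
    bottom = min(height, math.ceil(center_r + half_h))
    left = max(0, math.floor(center_c - half_w))
    right = min(width, math.ceil(center_c + half_w))
    # Scale so that the longer side of the crop equals target_size.
    scale = target_size / max(bottom - top, right - left)
    return (top, left, bottom, right), scale
```

With the "Fixed crop" option, the crop would instead be resized to (Target size, Target size) regardless of its aspect ratio, so a single scale factor no longer applies.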

Datasets

We train all our models on SBD and COCO+LVIS and evaluate them on GrabCut, Berkeley, DAVIS, SBD and PascalVOC. We also provide links to additional datasets, ADE20k and OpenImages, that are used in the ablation study.

| Dataset | Description | Download Link |
| --- | --- | --- |
| ADE20k | 22k images with 434k instances (total) | official site |
| OpenImages | 944k images with 2.6M instances (total) | official site |
| MS COCO | 118k images with 1.2M instances (train) | official site |
| LVIS v1.0 | 100k images with 1.2M instances (total) | official site |
| COCO+LVIS* | 99k images with 1.5M instances (train) | original LVIS images + our combined annotations |
| SBD | 8498 images with 20172 instances (train); 2857 images with 6671 instances (test) | official site |
| GrabCut | 50 images with one object each (test) | GrabCut.zip (11 MB) |
| Berkeley | 96 images with 100 instances (test) | Berkeley.zip (7 MB) |
| DAVIS | 345 images with one object each (test) | DAVIS.zip (43 MB) |
| Pascal VOC | 1449 images with 3417 instances (validation) | official site |
| COCO_MVal | 800 images with 800 instances (test) | COCO_MVal.zip (127 MB) |

Don't forget to change the paths to the datasets in config.yml after downloading and unpacking.

(*) To prepare COCO+LVIS, download the original LVIS v1.0, then download our pre-processed annotations (obtained by combining the COCO and LVIS datasets) and unpack them into the folder with LVIS v1.0.

Testing

Pretrained models

We provide pretrained models with different backbones for interactive segmentation.

You can find model weights and evaluation results in the tables below:

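| Train Dataset | Model | GrabCut NoC 85% | GrabCut NoC 90% | Berkeley NoC 90% | SBD NoC 85% | SBD NoC 90% | DAVIS NoC 85% | DAVIS NoC 90% | Pascal VOC NoC 85% | COCO MVal NoC 90% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SBD | HRNet18 IT-M (38.8 MB) | 1.76 | 2.04 | 3.22 | 3.39 | 5.43 | 4.94 | 6.71 | 2.51 | 4.39 |
| COCO+LVIS | HRNet18 (38.8 MB) | 1.54 | 1.70 | 2.48 | 4.26 | 6.86 | 4.79 | 6.00 | 2.59 | 3.58 |
| COCO+LVIS | HRNet18s IT-M (16.5 MB) | 1.54 | 1.68 | 2.60 | 4.04 | 6.48 | 4.70 | 5.98 | 2.57 | 3.33 |
| COCO+LVIS | HRNet18 IT-M (38.8 MB) | 1.42 | 1.54 | 2.26 | 3.80 | 6.06 | 4.36 | 5.74 | 2.28 | 2.98 |
| COCO+LVIS | HRNet32 IT-M (119 MB) | 1.46 | 1.56 | 2.10 | 3.59 | 5.71 | 4.11 | 5.34 | 2.57 | 2.97 |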

Evaluation

We provide a script to test all the presented models in all possible configurations on GrabCut, Berkeley, DAVIS, Pascal VOC, and SBD. To test a model, download its weights and put them in the ./weights folder (you can change this path in config.yml, see the INTERACTIVE_MODELS_PATH variable), then specify the path to the corresponding checkpoint. Our scripts automatically detect the architecture of the loaded model.
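
Since a checkpoint can be given either as an absolute path or as a name relative to INTERACTIVE_MODELS_PATH, the lookup can be pictured roughly as below. This is a simplified sketch and not the repository's lookup logic: `resolve_checkpoint` is a hypothetical helper, and the `.pth` suffix handling is an assumption.

```python
from pathlib import Path


def resolve_checkpoint(models_path, checkpoint):
    """Return the checkpoint file path for an absolute path or a relative name."""
    path = Path(checkpoint)
    if not path.is_absolute():
        # Relative names are looked up under the configured models folder.
        path = Path(models_path) / path
    # Assumption: checkpoints are stored as .pth files and the suffix may be omitted.
    if path.suffix != ".pth":
        path = path.with_name(path.name + ".pth")
    return path
```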

The following command runs the NoC evaluation on all test datasets (other options are displayed using '-h'):

python3 scripts/evaluate_model.py <brs-mode> --checkpoint=<checkpoint-name>

Examples of the script usage:

# This command evaluates HRNetV2-W18-C+OCR ITER-M model in NoBRS mode on all datasets.
python3 scripts/evaluate_model.py NoBRS --checkpoint=hrnet18_cocolvis_itermask_3p

# This command evaluates HRNet-W18-C-Small-v2+OCR ITER-M model in f-BRS-B mode on all datasets.
python3 scripts/evaluate_model.py f-BRS-B --checkpoint=hrnet18s_cocolvis_itermask_3p

# This command evaluates HRNetV2-W18-C+OCR ITER-M model in NoBRS mode on GrabCut and Berkeley datasets.
python3 scripts/evaluate_model.py NoBRS --checkpoint=hrnet18_cocolvis_itermask_3p --datasets=GrabCut,Berkeley
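
For readers unfamiliar with the metric, NoC 85% and NoC 90% denote the average number of clicks needed to reach 85% or 90% IoU. The sketch below shows one way to compute it from per-click IoU values; the function names are hypothetical, and the cap of 20 clicks per object is an assumption based on the common protocol in the interactive segmentation literature, not something stated in this README.

```python
def number_of_clicks(ious, threshold, max_clicks=20):
    """Clicks needed for one object to reach the IoU threshold (capped at max_clicks)."""
    for click_index, iou in enumerate(ious[:max_clicks], start=1):
        if iou >= threshold:
            return click_index
    # The threshold was never reached: count the object with the maximum budget.
    return max_clicks


def mean_noc(all_ious, threshold, max_clicks=20):
    """Average NoC over objects; all_ious holds one per-click IoU list per object."""
    counts = [number_of_clicks(ious, threshold, max_clicks) for ious in all_ious]
    return sum(counts) / len(counts)
```

For example, an object whose IoU after successive clicks is 0.5, 0.8, 0.86, 0.92 contributes 3 clicks to NoC 85% and 4 clicks to NoC 90%.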

Jupyter notebook

You can also interactively experiment with our models using the test_any_model.ipynb Jupyter notebook.

Training

We provide the scripts for training our models. You can start training with the following commands (these examples use the COCO+LVIS model scripts):

# ResNet-34 non-iterative baseline model
python3 train.py models/noniterative_baselines/r34_dh128_cocolvis.py --gpus=0 --workers=4 --exp-name=first-try

# HRNet-W18-C-Small-v2+OCR ITER-M model
python3 train.py models/iter_mask/hrnet18s_cocolvis_itermask_3p.py --gpus=0 --workers=4 --exp-name=first-try

# HRNetV2-W18-C+OCR ITER-M model
python3 train.py models/iter_mask/hrnet18_cocolvis_itermask_3p.py --gpus=0,1 --workers=6 --exp-name=first-try

# HRNetV2-W32-C+OCR ITER-M model
python3 train.py models/iter_mask/hrnet32_cocolvis_itermask_3p.py --gpus=0,1,2,3 --workers=12 --exp-name=first-try

For each experiment, a separate folder is created in ./experiments with Tensorboard logs, text logs, visualizations, and checkpoints. You can specify another path in config.yml (see the EXPS_PATH variable).

Please note that we trained ResNet-34 and HRNet-18s on 1 GPU, HRNet-18 on 2 GPUs, HRNet-32 on 4 GPUs (we used Nvidia Tesla P40 for training). To train on a different GPU you should adjust the batch size using the command line argument --batch-size or change the default value in a model script.

We used the pre-trained HRNetV2 models from the official repository. If you want to train interactive segmentation with these models, you need to download the weights and specify the paths to them in config.yml.

License

The code is released under the MIT License. It is a short, permissive software license. Basically, you can do whatever you want as long as you include the original copyright and license notice in any copy of the software/source.

Citation

If you find this work useful for your research, please cite our papers:

@article{reviving2021,
  title={Reviving Iterative Training with Mask Guidance for Interactive Segmentation},
  author={Konstantin Sofiiuk and Ilia Petrov and Anton Konushin},
  journal={arXiv preprint arXiv:2102.06583},
  year={2021}
}

@article{fbrs2020,
  title={f-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation},
  author={Konstantin Sofiiuk and Ilia Petrov and Olga Barinova and Anton Konushin},
  journal={arXiv preprint arXiv:2001.10331},
  year={2020}
}
Comments
  • Training time

    Hello,

    Great work and GitHub repository! I'm curious about the duration needed to complete training, was it a matter of days or weeks (for the Nvidia Tesla P40 GPUs you used)?

    Best!

    opened by UndecidedBoy 2
  • Little confusing on a result

    Happy to read your great work again.

    I am confused about the results in Table 1 and Table 3. In Table 1, HRNet-18 (Conv1S, Disk5) achieves 6.90 on DAVIS. In the first part of the ablation studies (Section 5.2), you state that you use HRNet with the Conv1S input scheme and encode clicks with disks of radius 5 in all other experiments. But in Table 3, the DAVIS result of HRNet-18 trained on SBD is 7.17. I don't understand the difference between them.

    opened by kleinzcy 2
  • Using prior masks

    Hello!

    Thank you for the great work. I was wondering if you could:

    1. Point me to where in your code you utilize prior masks, I'm having trouble locating it.
    2. If it's not obvious in the code describe how you go about using past model predictions? Do you precompute these or are they thresholded and saved?

    Thank you!

    opened by joshmyersdean 1
  • Resolution of image

    Hi! Thanks for sharing this amazing work. I have noticed that the pre-trained model does not work as well on high-resolution images as it does in your demo. On normal-resolution images it works as well as the shared demo. Can you please help me?

    opened by iresusharma 1
  • use ade20k dataset

    Thanks for your work. I'm trying to train a model with ADE20k, but I can't find the "annotations-object-segmentation.pkl" file referenced in ade20k.py, line 69. Can you provide this pkl file? Thank you.

    opened by Robert-Hopkins 0
  • Question: Providing points from external source - without mouse

    Hi @ksofiyuk

    I am planning on extending your framework by allowing the definition of points via a different input source (e.g., spatial mouse) that will not have a direct connection to the PC where RITM is running. Could you please point me out in the code where you sample the clicks so I can analyze how I can put my own input instead of clicks? Thanks a lot for the help!

    Best, Matteo

    opened by matteopantano 0
  • Web Demo

    Thank you for your contribution to the vision community. I just want to ask, how can I use your model for the web server? Do you have any web-based demos available? How can I use it for multi-class segmentation? I am looking for your kind response.

    opened by Ehteshamciitwah 0
  • a question about interactive segmentation

    Dear sofiiuk: We are very interested in your interactive segmentation method, but we have a question and need your help. After testing your code with Python, we found that this method can only segment one class type. Can you give us some advice on multi-class segmentation? For example, when the user clicks on different objects, it could show the class type, such as apple, car, and so on.

    Yours sincerely,
    Wang Zhipan,
    Sun Yat-sen University

    opened by wzp8023391 0
  • AttributeError: module 'albumentations.augmentations.functional' has no attribute 'resize'

    I ran into this problem when trying to train the model: AttributeError: module 'albumentations.augmentations.functional' has no attribute 'resize'. My albumentations version is 1.2.1. I've checked functional.py in albumentations, and there is no resize function. What can I do to solve this?

    opened by ermu2001 1
Owner
Visual Understanding Lab @ Samsung AI Center Moscow