Weakly- and Semi-Supervised Panoptic Segmentation (ECCV18)

Overview

Weakly- and Semi-Supervised Panoptic Segmentation

by Qizhu Li*, Anurag Arnab*, Philip H.S. Torr

This repository demonstrates the weakly supervised ground truth generation scheme presented in our paper Weakly- and Semi-Supervised Panoptic Segmentation published at ECCV 2018. The code has been cleaned-up and refactored, and should reproduce the results presented in the paper.

For details, please refer to our paper, and project page. Please check the Downloads section for all the additional data we release.

Summary

* Equal first authorship

Introduction

In our weakly-supervised panoptic segmentation experiments, our models are supervised by 1) image-level tags and 2) bounding boxes, as shown in the figure above. We used image-level tags as supervision for "stuff" classes which do not have a defined extent and cannot be described well by tight bounding boxes. For "thing" classes, we used bounding boxes as our weak supervision. This code release clarifies the implementation details of the method presented in the paper.

Iterative ground truth generation

For readers' convenience, we will give an outline of the proposed iterative ground truth generation pipeline, and provide demos for some of the key steps.

  1. We train a multi-class classifier for all classes to obtain rough localisation cues. As it is not possible to fit an entire Cityscapes image (1024x2048) into a network due to GPU memory constraints, we took 15 fixed 400x500 crops per training image, and derived their classification ground truth accordingly, which we use to train the multi-class classifier. From the trained classifier, we extract the Class Activation Maps (CAMs) using Grad-CAM, which has the advantage of being agnostic to network architecture over CAM.

    • Download the fixed image crops with image-level tags here to train your own classifier. For convenience, the pixel-level semantic label of the crops are also included, though they should not be used in training.
    • The CAMs we produced are available for download here.
  2. In parallel, we extract bounding box annotations from Cityscapes ground truth files, and then run MCG (a segment-proposal algorithm) and Grabcut (a classic foreground segmentation technique given a bounding-box prior) on the training images to generate foreground masks inside each annotated bounding box. MCG and Grabcut masks are merged following the rule that only regions where both have consensus are given the predicted label; otherwise an "ignore" label is assigned.

    • The extracted bounding boxes (saved in .mat format) can be downloaded here. Alternatively, we also provide a demo script demo_instanceTrainId_to_dets.m and a batch script batch_instanceTrainId_to_dets.m for you to make them yourself. The demo is self-contained; However, before running the batch script, make sure to
      1. Download the official Cityscapes scripts repository;

      2. Inside the above repository, navigate to cityscapesscripts/preparation and run

        python createTrainIdInstanceImgs.py

        This command requires an environment variable CITYSCAPES_DATASTET=path/to/your/cityscapes/data/folder to be set. These two steps produce the *_instanceTrainIds.png files required by our batch script;

      3. Navigate back to this repository, and place/symlink your gtFine and gtCoarse folders inside data/Cityscapes/ folder so that they are visible to our batch script.

    • Please see here for details on MCG.
    • We use the OpenCV implementation of Grabcut in our experiments.
    • The merged M&G masks we produced are available for download here.
  3. The CAMs (step 1) and M&G masks (step 2) are merged to produce the ground truth needed to kick off iterative training. To see a demo of merging, navigate to the root folder of this repo in MATLAB and run:

     demo_merge_cam_mandg;

    When post-processing network predictions of images from the Cityscapes train_extra split, make sure to use the following settings:

    opts.run_apply_bbox_prior = false;
    opts.run_check_image_level_tags = false;
    opts.save_ins = false;

    because the coarse annotation provided on the train_extra split trades off recall for precision, leading to inaccurate bounding box coordinates, and frequent occurrences of false negatives. This also applies to step 5.

    • The results from merging CAMs with M&G masks can be downloaded here.
  4. Using the generated ground truth, weakly-supervised models can be trained in the same way as a fully-supervised model. When the training loss converges, we make dense predictions using the model and also save the prediction scores.

    • An example of dense prediction made by a weakly-supervised model is included at results/pred_sem_raw/, and an example of the corresponding prediction scores is provided at results/pred_flat_feat/.
  5. The prediction and prediction scores (and optionally, the M&G masks) are used to generate the ground truth labels for next stage of iterative training. To see a demo of iterative ground truth generation, navigate to the root folder of this repo in MATLAB and run:

    demo_make_iterative_gt;

    The generated semantic and instance ground truth labels are saved at results/pred_sem_clean and results/pred_ins_clean respectively.

    Please refer to scripts/get_opts.m for the options available. To reproduce the results presented in the paper, use the default setting, and set opts.run_merge_with_mcg_and_grabcut to false after five iterations of training, as the weakly supervised model by then produces better quality segmentation of ''thing'' classes than the original M&G masks.

  6. Repeat step 4 and 5 until training loss no longer reduces.

Downloads

  1. Image crops and tags for training multi-class classifier:
  2. CAMs:
  3. Extracted Cityscapes bounding boxes (.mat format):
  4. Merged MCG&Grabcut masks:
  5. CAMs merged with MCG&Grabcut masks:

Note that due to file size limit set by BaiduYun, some of the larger files had to be split into several chunks in order to be uploaded. These files are named as filename.zip.part##, where filename is the original file name excluding the extension, and ## is a two digit part index. After you have downloaded all the parts, cd to the folder where they are saved, and use the following command to join them back together:

cat filename.zip.part* > filename.zip

The joining operation may take several minutes, depending on file size.

The above does not apply to files downloaded from Dropbox.

Reference

If you find the code helpful in your research, please cite our paper:

@InProceedings{Li_2018_ECCV,
    author = {Li, Qizhu and 
              Arnab, Anurag and 
              Torr, Philip H.S.},
    title = {Weakly- and Semi-Supervised Panoptic Segmentation},
    booktitle = {The European Conference on Computer Vision (ECCV)},
    month = {September},
    year = {2018}
}

Questions

Please contact Qizhu Li [email protected] and Anurag Arnab [email protected] for enquires, issues, and suggestions.

Owner
Qizhu Li
Capable of living on land, but prefers to stay in water.
Qizhu Li
A deep learning framework for historical document image analysis

DIVA-DAF Description A deep learning framework for historical document image analysis. How to run Install dependencies # clone project git clone https

9 Aug 04, 2022
Tiny Kinetics-400 for test

Kinetics-400迷你数据集 English | 简体中文 该数据集旨在解决的问题:参照Kinetics-400数据格式,训练基于自己数据的视频理解模型。 数据集介绍 Kinetics-400是视频领域benchmark常用数据集,详细介绍可以参考其官方网站Kinetics。整个数据集包含40

38 Jan 06, 2023
functorch is a prototype of JAX-like composable function transforms for PyTorch.

functorch is a prototype of JAX-like composable function transforms for PyTorch.

Facebook Research 1.2k Jan 09, 2023
Multi-objective gym environments for reinforcement learning.

MO-Gym: Multi-Objective Reinforcement Learning Environments Gym environments for multi-objective reinforcement learning (MORL). The environments follo

Lucas Alegre 74 Jan 03, 2023
A tensorflow implementation of an HMM layer

tensorflow_hmm Tensorflow and numpy implementations of the HMM viterbi and forward/backward algorithms. See Keras example for an example of how to use

Zach Dwiel 283 Oct 19, 2022
AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition [ArXiv] [Project Page] This repository is the official implementation of AdaMML:

International Business Machines 43 Dec 26, 2022
ML powered analytics engine for outlier detection and root cause analysis.

Website • Docs • Blog • LinkedIn • Community Slack ML powered analytics engine for outlier detection and root cause analysis ✨ What is Chaos Genius? C

Chaos Genius 523 Jan 04, 2023
Example scripts for the detection of lanes using the ultra fast lane detection model in Tensorflow Lite.

TFlite Ultra Fast Lane Detection Inference Example scripts for the detection of lanes using the ultra fast lane detection model in Tensorflow Lite. So

Ibai Gorordo 12 Aug 27, 2022
The AugNet Python module contains functions for the fast computation of image similarity.

AugNet AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation arxiv link In our work, we propose AugNet, a new deep le

Ming 74 Dec 28, 2022
Experimenting with computer vision techniques to generate annotated image datasets from gameplay recordings automatically.

Experimenting with computer vision techniques to generate annotated image datasets from gameplay recordings automatically. The collected data will then be used to train a deep neural network that can

Martin Valchev 3 Apr 24, 2022
lightweight python wrapper for vowpal wabbit

vowpal_porpoise Lightweight python wrapper for vowpal_wabbit. Why: Scalable, blazingly fast machine learning. Install Install vowpal_wabbit. Clone and

Joseph Reisinger 163 Nov 24, 2022
RoFormer_pytorch

PyTorch RoFormer 原版Tensorflow权重(https://github.com/ZhuiyiTechnology/roformer) chinese_roformer_L-12_H-768_A-12.zip (提取码:xy9x) 已经转化为PyTorch权重 chinese_r

yujun 283 Dec 12, 2022
DeepFaceLab fork which provides IPython Notebook to use DFL with Google Colab

DFL-Colab — DeepFaceLab fork for Google Colab This project provides you IPython Notebook to use DeepFaceLab with Google Colaboratory. You can create y

779 Jan 05, 2023
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Salesforce 1.3k Dec 31, 2022
Explainable Medical ImageSegmentation via GenerativeAdversarial Networks andLayer-wise Relevance Propagation

MedAI: Transparency in Medical Image Segmentation What is this repo This repo contains the code and experiments that are implemented to contribute in

Awadelrahman M. A. Ahmed 1 Nov 22, 2021
Official PyTorch implementation of BlobGAN: Spatially Disentangled Scene Representations

BlobGAN: Spatially Disentangled Scene Representations Official PyTorch Implementation Paper | Project Page | Video | Interactive Demo BlobGAN.mp4 This

148 Dec 29, 2022
Pytorch implementation of Deep Recursive Residual Network for Super Resolution (DRRN)

DRRN-pytorch This is an unofficial implementation of "Deep Recursive Residual Network for Super Resolution (DRRN)", CVPR 2017 in Pytorch. [Paper] You

yun_yang 192 Dec 12, 2022
Implementation of "Semi-supervised Domain Adaptive Structure Learning"

Semi-supervised Domain Adaptive Structure Learning - ASDA This repo contains the source code and dataset for our ASDA paper. Illustration of the propo

3 Dec 13, 2021
Exploring Classification Equilibrium in Long-Tailed Object Detection, ICCV2021

Exploring Classification Equilibrium in Long-Tailed Object Detection (LOCE, ICCV 2021) Paper Introduction The conventional detectors tend to make imba

52 Nov 21, 2022
Official PyTorch implementation of "Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics".

Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics This repository is the official PyTorch implementation of "Physics-aware Differ

USC-Melady 46 Nov 20, 2022