Zsseg.baseline - Zero-Shot Semantic Segmentation

Overview

This repo is for our paper A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model. It is based on the official repo of MaskFormer.

@article{xu2021ss,
  title={End-to-End Semi-Supervised Object Detection with Soft Teacher},
  author={Xu, Mengde and Zhang, Zheng and Hu, Han and Wang, Jianfeng and Wang, Lijuan and Wei, Fangyun and Bai, Xiang and Liu, Zicheng},
  journal={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}

Guideline

  • Enviroment

    torch==1.8.0
    torchvision==0.9.0
    detectron2==0.5 #Following https://detectron2.readthedocs.io/en/latest/tutorials/install.html to install it and some required packages
    mmcv==1.3.14

    FurtherMore, install the modified clip package.

    cd third_party/CLIP
    python -m pip install -Ue .
  • Data Preparation

    In our experiments, four datasets are used. For Cityscapes and ADE20k, follow the tutorial in MaskFormer.

  • For COCO Stuff 164k:

    • Download data from the offical dataset website and extract it like below.
      Datasets/
           coco/
                #http://images.cocodataset.org/zips/train2017.zip
                train2017/ 
                #http://images.cocodataset.org/zips/val2017.zip
                val2017/   
                #http://images.cocodataset.org/annotations/annotations_trainval2017.zip
                annotations/ 
                #http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
                stuffthingmaps/ 
    • Format the data to detecttron2 style and split it into Seen (Base) subset and Unseen (Novel) subset.
      python datasets/prepare_coco_stuff_164k_sem_seg.py datasets/coco
      
      python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/train2017_base datasets/coco/stuffthingmaps_detectron2/train2017_base_label_count.pkl
      
      python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/val2017 datasets/coco/stuffthingmaps_detectron2/val2017_label_count.pkl
  • For Pascal VOC 11k:

    • Download data from the offical dataset website and extract it like below.
    datasets/
       VOC2012/
            #http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
            JPEGImages/
            val.txt
            #http://home.bharathh.info/pubs/codes/SBD/download.html
            SegmentationClassAug/
            #https://gist.githubusercontent.com/sun11/2dbda6b31acc7c6292d14a872d0c90b7/raw/5f5a5270089239ef2f6b65b1cc55208355b5acca/trainaug.txt
            train.txt
            
    • Format the data to detecttron2 style and split it into Seen (Base) subset and Unseen (Novel) subset.
    python datasets/prepare_voc_sem_seg.py datasets/VOC2012
    
    python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/train datasets/VOC2012/annotations_detectron2/train_base_label_count.json
    
    python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/val datasets/VOC2012/annotations_detectron2/val_label_count.json
  • Training and Evaluation

    Before training and evaluation, see the tutorial in detectron2. For example, to training a zero shot semantic segmentation model on COCO Stuff:

  • Training with manually designed prompts:

    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_single_prompt_bs32_60k.yaml
    
  • Training with learned prompts:

    # Training prompts
    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_proposal_classification_learn_prompt_bs32_10k.yaml --num-gpus 8 
    # Training seg model
    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_bs32_60k.yaml --num-gpus 8 MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT ${TRAINED_PROMPTS}

    Note: the prompts training will be affected by the random seed. It is better to run it multiple times.

    For evaluation, add --eval-only flag to the traing command.

  • Trained Model

    😄 Coming soon.

💛 Code and Dataset for our EMNLP 2021 paper: "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes"

Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes Official PyTorch implementation and EmoCause evaluatio

Hyunwoo Kim 51 Jan 06, 2023
AWS provides a Python SDK, "Boto3" ,which can be used to access the AWS-account from the local.

Boto3 - The AWS SDK for Python Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to wri

Shreyas Srivastava 1 Oct 25, 2021
A GPU-optional modular synthesizer in pytorch, 16200x faster than realtime, for audio ML researchers.

torchsynth The fastest synth in the universe. Introduction torchsynth is based upon traditional modular synthesis written in pytorch. It is GPU-option

torchsynth 229 Jan 02, 2023
Spatial Sparse Convolution Library

SpConv: Spatially Sparse Convolution Library PyPI Install Downloads CPU (Linux Only) pip install spconv CUDA 10.2 pip install spconv-cu102 CUDA 11.1 p

Yan Yan 1.2k Jan 07, 2023
HiPAL: A Deep Framework for Physician Burnout Prediction Using Activity Logs in Electronic Health Records

HiPAL Code for KDD'22 Applied Data Science Track submission -- HiPAL: A Deep Framework for Physician Burnout Prediction Using Activity Logs in Electro

Hanyang Liu 4 Aug 08, 2022
Making self-supervised learning work on molecules by using their 3D geometry to pre-train GNNs. Implemented in DGL and Pytorch Geometric.

3D Infomax improves GNNs for Molecular Property Prediction Video | Paper We pre-train GNNs to understand the geometry of molecules given only their 2D

Hannes Stärk 95 Dec 30, 2022
PyTorch reimplementation of the paper Involution: Inverting the Inherence of Convolution for Visual Recognition [CVPR 2021].

Involution: Inverting the Inherence of Convolution for Visual Recognition Unofficial PyTorch reimplementation of the paper Involution: Inverting the I

Christoph Reich 100 Dec 01, 2022
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

9.6k Dec 31, 2022
DeepLab-ResNet rebuilt in TensorFlow

DeepLab-ResNet-TensorFlow This is an (re-)implementation of DeepLab-ResNet in TensorFlow for semantic image segmentation on the PASCAL VOC dataset. Fr

Vladimir 1.2k Nov 04, 2022
Research code for the paper "Variational Gibbs inference for statistical estimation from incomplete data".

Variational Gibbs inference (VGI) This repository contains the research code for Simkus, V., Rhodes, B., Gutmann, M. U., 2021. Variational Gibbs infer

Vaidotas Å imkus 1 Apr 08, 2022
Repository for publicly available deep learning models developed in Rosetta community

trRosetta2 This package contains deep learning models and related scripts used by Baker group in CASP14. Installation Linux/Mac clone the package git

81 Dec 29, 2022
A cross-lingual COVID-19 fake news dataset

CrossFake An English-Chinese COVID-19 fake&real news dataset from the ICDMW 2021 paper below: Cross-lingual COVID-19 Fake News Detection. Jiangshu Du,

Yingtong Dou 11 Dec 01, 2022
PyContinual (An Easy and Extendible Framework for Continual Learning)

PyContinual (An Easy and Extendible Framework for Continual Learning) Easy to Use You can sumply change the baseline, backbone and task, and then read

176 Jan 05, 2023
PyTorch implementation of the ideas presented in the paper Interaction Grounded Learning (IGL)

Interaction Grounded Learning This repository contains a simple PyTorch implementation of the ideas presented in the paper Interaction Grounded Learni

Arthur Juliani 4 Aug 31, 2022
Pytorch Implementation of LNSNet for Superpixel Segmentation

LNSNet Overview Official implementation of Learning the Superpixel in a Non-iterative and Lifelong Manner (CVPR'21) Learning Strategy The proposed LNS

42 Oct 11, 2022
An implementation of the Contrast Predictive Coding (CPC) method to train audio features in an unsupervised fashion.

CPC_audio This code implements the Contrast Predictive Coding algorithm on audio data, as described in the paper Unsupervised Pretraining Transfers we

Meta Research 283 Dec 30, 2022
Food Drinks and groceries Images Multi Lingual (FooDI-ML) dataset.

Food Drinks and groceries Images Multi Lingual (FooDI-ML) dataset.

41 Jan 04, 2023
GLM (General Language Model)

GLM GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language underst

THUDM 421 Jan 04, 2023
TensorFlow tutorials and best practices.

Effective TensorFlow 2 Table of Contents Part I: TensorFlow 2 Fundamentals TensorFlow 2 Basics Broadcasting the good and the ugly Take advantage of th

Vahid Kazemi 8.7k Dec 31, 2022
Deep learning models for classification of 15 common weeds in the southern U.S. cotton production systems.

CottonWeeds Deep learning models for classification of 15 common weeds in the southern U.S. cotton production systems. requirements pytorch torchsumma

Dong Chen 8 Jun 07, 2022