This repository contains the code used in the paper "Prompt-Based Multi-Modal Image Segmentation".

Last update: Dec 30, 2022

Overview

Prompt-Based Multi-Modal Image Segmentation

The system allows creating segmentation models without training, based on either:

An arbitrary text query
An image with a mask highlighting stuff or an object

Quick Start

The Quickstart.ipynb notebook provides the code for using a pre-trained CLIPSeg model. It can also be run interactively on MyBinder (please note that the VM does not use a GPU, so inference takes a few seconds).

Dependencies

This code base depends on pytorch, torchvision and clip (pip install git+https://github.com/openai/CLIP.git). Additional dependencies are hidden for double blind review.
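
As a quick sanity check before opening the notebook, the following sketch reports which of the packages listed above cannot be found in the current environment. It assumes the usual import names (torch, torchvision, clip); the helper name missing_modules is illustrative and not part of this repository.

```python
import importlib.util

# Import names for the packages listed above (pytorch is imported as "torch").
REQUIRED_MODULES = ("torch", "torchvision", "clip")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

The check only looks up whether each module can be located; it does not import the packages or verify their versions.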

Datasets

PhraseCut and PhraseCutPlus: Referring expression dataset
PFEPascalWrapper: Wrapper class for PFENet's Pascal-5i implementation
PascalZeroShot: Wrapper class for PascalZeroShot
COCOWrapper: Wrapper class for COCO.

Models

CLIPDensePredT: CLIPSeg model with transformer-based decoder.
ViTDensePredT: CLIPSeg model with transformer-based decoder.

Third Party Dependencies

Some of the datasets require third-party dependencies. Run the following commands in the third_party folder.

git clone https://github.com/cvlab-yonsei/JoEm
git clone https://github.com/Jia-Research-Lab/PFENet.git
git clone https://github.com/ChenyunWu/PhraseCutDataset.git
git clone https://github.com/juhongm999/hsnet.git
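
For a repeatable setup, the same clones could be scripted. The sketch below is one possible approach, not part of this repository: the function names, the dry_run flag and the skip-if-present behaviour are assumptions made for illustration. It expects git to be available on the PATH and is meant to be run from the repository root.

```python
import subprocess
from pathlib import Path

# Repository URLs copied from the commands listed above.
REPOS = [
    "https://github.com/cvlab-yonsei/JoEm",
    "https://github.com/Jia-Research-Lab/PFENet.git",
    "https://github.com/ChenyunWu/PhraseCutDataset.git",
    "https://github.com/juhongm999/hsnet.git",
]


def repo_dir_name(url):
    """Directory name git creates for `url`: last path part, minus '.git'."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def clone_missing(repos=REPOS, target="third_party", dry_run=False):
    """Clone each repository into `target` unless its directory already exists.

    Returns the git commands that were run (or would be run with dry_run=True).
    """
    target_path = Path(target)
    commands = []
    for url in repos:
        dest = target_path / repo_dir_name(url)
        if dest.exists():
            continue  # already cloned, nothing to do
        commands.append(["git", "clone", url, str(dest)])
    if not dry_run:
        target_path.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises if a clone fails
    return commands
```

Calling clone_missing(dry_run=True) lists the pending commands without touching the network; clone_missing() performs the clones.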

Weights

CLIPSeg-D64 (4.1MB, without CLIP weights)
CLIPSeg-D16 (1.1MB, without CLIP weights)

Training

See the experiment folder for yaml definitions of the training configurations. The training code is in experiment_setup.py.

Usage of PFENet Wrappers

To use the dataset and model wrappers for PFENet, clone the PFENet repository into the root folder:

git clone https://github.com/Jia-Research-Lab/PFENet.git

Citation

@article{lueddecke21,
    title={Prompt-Based Multi-Modal Image Segmentation},
    author={Timo Lüddecke and Alexander Ecker},
    journal={arXiv preprint arXiv:2112.10003},
    year={2021}
}

Owner

Timo Lüddecke
