Vision-Language Transformer and Query Generation for Referring Segmentation (ICCV 2021)

Last update: Dec 23, 2022

Overview

Please consider citing our paper in your publications if the project helps your research.

@inproceedings{vision-language-transformer,
  title={Vision-Language Transformer and Query Generation for Referring Segmentation},
  author={Ding, Henghui and Liu, Chang and Wang, Suchen and Jiang, Xudong},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Installation

Environment:
- Python 3.6
- tensorflow 1.15
- Other dependencies in requirements.txt
- SpaCy model for embedding:

  python -m spacy download en_vectors_web_lg

Dataset preparation
- Put the folder of COCO training set ("train2014") under data/images/.
- Download the RefCOCO dataset from here and extract it to data/. Then run the data preparation script under data/:
```
cd data
python data_process_v2.py --data_root . --output_dir data_v2 --dataset [refcoco/refcoco+/refcocog] --split [unc/umd/google] --generate_mask
```
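
The bracketed options in the command above are alternatives, and not every dataset pairs with every split. As a minimal sketch, assuming the usual RefCOCO-family pairing (refcoco and refcoco+ with unc; refcocog with umd or google), the helper below builds one command per combination. `VALID_SPLITS` and `preparation_commands` are hypothetical names for illustration and are not part of this repository.

```python
# Hypothetical helper: lists the data-preparation commands for each
# dataset/split pair. The pairing below is an assumption based on the
# standard RefCOCO-family splits, not something read from the script.
VALID_SPLITS = {
    "refcoco": ["unc"],
    "refcoco+": ["unc"],
    "refcocog": ["umd", "google"],
}


def preparation_commands(data_root=".", output_dir="data_v2"):
    """Return one argv list per (dataset, split) combination."""
    commands = []
    for dataset, splits in VALID_SPLITS.items():
        for split in splits:
            commands.append([
                "python", "data_process_v2.py",
                "--data_root", data_root,
                "--output_dir", output_dir,
                "--dataset", dataset,
                "--split", split,
                "--generate_mask",
            ])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for argv in preparation_commands():
        print(" ".join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run from any directory; each printed line is meant to be run from inside data/.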

Evaluating

Download pretrained models & config files from here.
In the config file, set:
- evaluate_model: path to the pretrained weights
- evaluate_set: path to the dataset for evaluation.
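
Before launching an evaluation it can help to confirm that both entries are set and point at something that exists. The following is a rough sketch only: it assumes the config has already been loaded into a Python dict (the file format is not assumed here), and `check_eval_config` is a hypothetical helper, not a function from this repository.

```python
import os

# The two keys come from the instructions above; everything else is hypothetical.
REQUIRED_EVAL_KEYS = ("evaluate_model", "evaluate_set")


def check_eval_config(config):
    """Return a list of human-readable problems found in `config` (a dict)."""
    problems = []
    for key in REQUIRED_EVAL_KEYS:
        value = config.get(key)
        if not value:
            problems.append("missing or empty key: " + key)
        elif not os.path.exists(str(value)):
            problems.append("path for {} does not exist: {}".format(key, value))
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration; replace with the loaded config.
    example = {"evaluate_model": "", "evaluate_set": "no/such/path"}
    for problem in check_eval_config(example):
        print(problem)
```

An empty list means both keys are present and their paths exist on disk.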

Run

python vlt.py test [PATH_TO_CONFIG_FILE]

Training

Pretrained Backbones: We use the backbone weights provided by MCN.

Note: we use the backbone whose training data excludes all images that appear in the val/test splits of RefCOCO, RefCOCO+ and RefCOCOg.

Specify the hyperparameters, dataset path and pretrained weight path in the configuration file. Please refer to the examples under /config, or to the config files of our pretrained models.

Run

python vlt.py train [PATH_TO_CONFIG_FILE]
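
Training and evaluation share the same command shape, `python vlt.py <mode> <config>`. For scripted runs, a thin wrapper along the following lines may be convenient. This is a sketch under assumptions: `vlt_command` and `run_vlt` are hypothetical names, and the wrapper is assumed to be run from the repository root so that `vlt.py` resolves.

```python
import subprocess
import sys

# Modes shown in this README; any other value is rejected by the sketch.
VALID_MODES = ("train", "test")


def vlt_command(mode, config_path):
    """Build the argv list for `python vlt.py <mode> <config_path>`."""
    if mode not in VALID_MODES:
        raise ValueError(
            "mode must be one of {}, got {!r}".format(VALID_MODES, mode)
        )
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, "vlt.py", mode, config_path]


def run_vlt(mode, config_path):
    """Run vlt.py and raise CalledProcessError if it exits with an error."""
    subprocess.run(vlt_command(mode, config_path), check=True)
```

For example, `run_vlt("train", "path/to/config")` followed by `run_vlt("test", "path/to/config")` would train and then evaluate; the config path here is a placeholder.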

Acknowledgement

We borrowed a lot of code from MCN, keras-transformer, RefCOCO API and keras-yolo3. Thanks for their excellent work!

Owner

Henghui Ding