Official codebase for ICLR oral paper Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling

Last update: Dec 23, 2022

Related tags

Overview

CLIORA

This is the official codebase for ICLR oral paper: Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling.

We introduce a new task of Unsupervised Vision-Language Grammar Induction and devise a model Contrastive Language-Image inside-Outside Recursive Autoencoder (CLIORA) to solve it. Please read our paper for more details: https://openreview.net/forum?id=N0n_QyQ5lBF.

This code follows the implementation architecture of DIORA.

Please cite our paper as follows:

@inproceedings{wan2022cliora,
  title={Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling},
  author={Wan, Bo and Han, Wenjuan and Zheng, Zilong and Tuytelaars, Tinne},
  booktitle={The International Conference on Learning Representations (ICLR)},
  year={2022},
}

Envs and Datas

Install dependencies (using Conda as a virtual environment):

conda create -n cliora python=3.8
source activate cliora
pip install -r requirements.txt

Download flickr_data and outputs and put the files as the following structure:

  cliora
  ├───cliora
  │   ├─...
  │
  ├───flickr_data
  │   ├─flickr_feat_maf
  │
  ├───outputs
      ├─flickr

We use the same object features as MAF. Download train_features_compress.hdf5, val features_compress.hdf5, test features_compress.hdf5 to flickr_data/flickr_feat_maf.

Running CLIORA

export PYTHONPATH=$(pwd):$PYTHONPATH


## Train DIORA
sh train_diora.sh

## Test DIORA
sh test_diora.sh

## Train CLOIRA based on DIORA
sh train_clora.sh

## Test CLIORA 
sh test_cliora.sh

Multi-GPU Training

Single-GPU training:

export CUDA_VISIBLE_DEVICES=0
python -m cliora/scripts/train.py
    --cuda
    ... # other args

Multi-GPU Training:

export CUDA_VISIBLE_DEVICES=0,1,2,3
export NGPUS=4
python -m torch.distributed.launch --nproc_per_node=$NGPUS cliora/scripts/train.py
    --cuda
    --multigpu
    ... # other args

Visualization

Download Flickr30K Entities Dataset and put the image folder flickr_images under flickr_data/. Add --visualize when run test_cliora.sh:

# test_cliora.sh
python cliora/scripts/parse.py
    --cuda
    --visualize
    --obj_feats
    ... # other args

Word Embedding

We provide randomly-initialized word embedding, skip-thoughts embedding and ELMo embedding. If you use ELMo embedding and specify the --elmo_cache_dir, then the context-insensitive ELMo vectors will be cached, making it much faster to load these vectors after the initial usage.

Example Usage:

word_emb=none/skip/elmo

python cliora/scripts/train.py
    --emb word_emb
    ... # other args

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Official codebase for ICLR oral paper Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling

Related tags

Overview

CLIORA

Envs and Datas

Running CLIORA

Multi-GPU Training

Visualization

Word Embedding

License

Owner

Bo Wan

Systemic Evolutionary Chemical Space Exploration for Drug Discovery

PowerGridworld: A Framework for Multi-Agent Reinforcement Learning in Power Systems

Building a real-time environment using webcam frame division in OpenCV and classify cropped images using a fine-tuned vision transformers on hybryd datasets samples for facial emotion recognition.

Redash reset for python

Simple (but Strong) Baselines for POMDPs

PyTorch implementation for View-Guided Point Cloud Completion

PyTorch code for the paper "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

MediaPipe Kullanarak İleri Seviye Bilgisayarla Görü

Its a Plant Leaf Disease Detection System based on Machine Learning.

Very Deep Convolutional Networks for Large-Scale Image Recognition

[NeurIPS 2021] Source code for the paper "Qu-ANTI-zation: Exploiting Neural Network Quantization for Achieving Adversarial Outcomes"

we propose EfficientDerain for high-efficiency single-image deraining

Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.

This is an official implementation of the CVPR2022 paper "Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots".

Multi-Stage Progressive Image Restoration

Conformer: Local Features Coupling Global Representations for Visual Recognition

Posterior predictive distributions quantify uncertainties ignored by point estimates.

PyTorch implementation for the paper Pseudo Numerical Methods for Diffusion Models on Manifolds

[ACL-IJCNLP 2021] Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning

This is the dataset for testing the robustness of various VO/VIO methods