Official Implementation of Few-shot Visual Relationship Co-localization

Last update: Oct 13, 2022

VRC

Official implementation of the Few-shot Visual Relationship Co-localization (ICCV 2021) paper

project page | paper

Requirements

Use Python >= 3.8.5. Conda is recommended: https://docs.anaconda.com/anaconda/install/linux/
Use PyTorch 1.7.0 with CUDA 10.2.
Install the other requirements from requirements.txt.

To set up the environment:

# create new env vrc
$ conda create -n vrc python=3.8.5

# activate vrc
$ conda activate vrc

# install pytorch, torchvision
$ conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=10.2 -c pytorch

# install other dependencies
$ pip install -r requirements.txt
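Before training, a quick check that the active environment matches the versions listed above can save a failed run. The following is a minimal sketch that is not part of the repository. The function names (`version_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and the script relies only on standard attributes (`sys.version_info`, `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`).

```python
import sys
from typing import List, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse strings like '3.8.5' or '1.7.0+cu102' into (3, 8, 5) / (1, 7, 0)."""
    core = version.split("+")[0]  # drop local build tags such as '+cu102'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:  # keep leading digits only, e.g. '0a0' -> '0'
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(found: Tuple[int, ...], minimum: Tuple[int, ...]) -> bool:
    """Tuple comparison is element-wise, so (3, 10, 0) >= (3, 8, 5)."""
    return found >= minimum


def check_environment() -> List[str]:
    """Return human-readable mismatches against the README requirements."""
    problems = []
    py = tuple(sys.version_info[:3])
    if not meets_minimum(py, (3, 8, 5)):
        problems.append(f"Python {'.'.join(map(str, py))} is older than 3.8.5")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if version_tuple(torch.__version__) != (1, 7, 0):
        problems.append(f"PyTorch {torch.__version__} found (README uses 1.7.0)")
    if torch.version.cuda != "10.2":
        problems.append(f"PyTorch built with CUDA {torch.version.cuda} (README uses 10.2)")
    if not torch.cuda.is_available():
        problems.append("No CUDA device is available")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment matches the README requirements.")
```

Run it inside the activated `vrc` environment; it only prints warnings and does not modify anything.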

Training

Preparing dataset

Download the Visual Genome (VG) images from https://visualgenome.org/
Extract Faster R-CNN features for the VG images using data_preparation/vrc_extract_frcnn_feats.py. Please follow the instructions here.
Download the VrR-VG dataset from http://vrr-vg.com/ or from the Google Drive link.
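Missing or empty data directories are a common cause of failed training runs. The sketch below checks that each downloaded input exists and is non-empty. The directory names in `EXPECTED_INPUTS` are placeholders assumed for illustration, not the layout the repository expects; replace them with the paths set in VR_Encoder/configs and VR_SimilarityNetwork/configs. The helper `missing_inputs` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical layout: these relative paths are placeholders, not the
# repository's real locations. Point them at wherever your configs expect data.
EXPECTED_INPUTS: Dict[str, str] = {
    "VG images": "data/vg/images",
    "Faster R-CNN features": "data/vg/frcnn_feats",
    "VrR-VG annotations": "data/vrr_vg",
}


def missing_inputs(root: Path, expected: Dict[str, str] = EXPECTED_INPUTS) -> List[str]:
    """Return the names of expected inputs whose directory is absent or empty."""
    missing = []
    for name, rel in expected.items():
        path = root / rel
        # An existing but empty directory usually means a download or
        # extraction step did not finish, so treat it as missing too.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_inputs(Path("."))
    print("Missing inputs:", ", ".join(gaps) if gaps else "none")
```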

Training VR Encoder (VTransE)

Training parameters

To check or update the training, model, and dataset parameters, see VR_Encoder/configs.

To train the VR Encoder:

$ python train_vr_encoder.py

Training VR Similarity Network (Relation Network)

Training parameters

To check or update the training, testing, model, and dataset parameters, see VR_SimilarityNetwork/configs.

To train the VR Similarity Network:

$ python SimilarityNetworkTrain.py

To train the VR Similarity Network (with concatenation as the VR encoding):

$ python ConcatplusSimilarityNetworkTrain.py

To evaluate (set the evaluation settings in test_config.yaml):

$ python FullModelTest.py

Cite

If you find this code or paper useful for your research, please consider citing it:

@InProceedings{teotiaMMM2021,
  author    = "Teotia, Revant and Mishra, Vaibhav and Maheshwari, Mayank and Mishra, Anand",
  title     = "Few-shot Visual Relationship Co-Localization",
  booktitle = "ICCV",
  year      = "2021",
}

Acknowledgements

This repo uses https://gitlab.com/meetshah1995/vqa-maskrcnn-benchmark and scripts from https://github.com/facebookresearch/mmf for Faster R-CNN feature extraction.

Code provided by https://github.com/zawlin/cvpr17_vtranse and https://github.com/yangxuntu/vrd helped in implementing the VR encoder.

Contact

For any clarification, comment, or suggestion, please create an issue or contact Revant, Vaibhav, or Mayank.
