Official code repository for the EMNLP 2021 paper

Last update: Dec 19, 2022

Related tags

Overview

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

PyTorch code for the EMNLP 2021 paper "Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization". See the arxiv paper here.

Requirements:

This code has been tested on torch==1.11.0.dev20211014 (nightly) and torchvision==0.12.0.dev20211014 (nightly)

Prepare Repository:

Download the PororoSV dataset and associated files from here and save it as ./data. Download GloVe embeddings (glove.840B.300D) from here. The default location of the embeddings is ./data/ (see ./dcsgan/miscc/config.py).

Extract Constituency Parses:

To install the Berkeley Neural Parser with SpaCy:

pip install benepar

To extract parses for PororoSV:

python parse.py --dataset pororo --data_dir <path-to-data-directory>

Extract Dense Captions:

We use the Dense Captioning Model implementation available here. Download the pretrained model as outlined in their repository. To extract dense captions for PororoSV:
python describe_pororosv.py --config_json <path-to-config> --lut_path <path-to-VG-regions-dict-lite.pkl> --model_checkpoint <path-to-model-checkpoint> --img_path <path-to-data-directory> --box_per_img 10 --batch_size 1

Training VLC-StoryGAN:

To train VLC-StoryGAN for PororoSV:
python train_gan.py --cfg ./cfg/pororo_s1_vlc.yml --data_dir <path-to-data-directory> --dataset pororo\

Unless specified, the default output root directory for all model checkpoints is ./out/

Evaluation Models:

Please see here for evaluation models for character classification-based scores, BLEU2/3 and R-Precision.

To evaluate Frechet Inception Distance (FID):
python eval_vfid --img_ref_dir <path-to-image-directory-original images> --img_gen_dir <path-to-image-directory-generated-images> --mode <mode>

More details coming soon.

Citation:

@inproceedings{maharana2021integrating,
  title={Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization},
  author={Maharana, Adyasha and Bansal, Mohit},
  booktitle={EMNLP},
  year={2021}
}

Official code repository for the EMNLP 2021 paper

Related tags

Overview

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

Requirements:

Prepare Repository:

Extract Constituency Parses:

Extract Dense Captions:

Training VLC-StoryGAN:

Evaluation Models:

Citation:

Owner

Adyasha Maharana

The code used for the free [email protected] Webinar series on Reinforcement Learning in Finance

DumpSMBShare - A script to dump files and folders remotely from a Windows SMB share

Official implementation of NeurIPS 2021 paper "Contextual Similarity Aggregation with Self-attention for Visual Re-ranking"

Cosine Annealing With Warmup

A Light CNN for Deep Face Representation with Noisy Labels

A simple implementation of Kalman filter in Multi Object Tracking

Official code for the paper "Self-Supervised Prototypical Transfer Learning for Few-Shot Classification"

Phy-Q: A Benchmark for Physical Reasoning

[CVPR 2021] A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Predicting Student Attentiveness using OpenCV

Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.

A Web API for automatic background removal using Deep Learning. App is made using Flask and deployed on Heroku.

Official implementation of "MetaSDF: Meta-learning Signed Distance Functions"

This is a collection of our NAS and Vision Transformer work.

A lightweight deep network for fast and accurate optical flow estimation.

SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral)

A collection of inference modules for fastai2

A PyTorch-centric hybrid classical-quantum machine learning framework

The pytorch implementation of SOKD (BMVC2021).