Official code repository for the EMNLP 2021 paper

Last update: Dec 19, 2022

Overview

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

PyTorch code for the EMNLP 2021 paper "Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization". See the arXiv paper here.

Requirements:

This code has been tested on torch==1.11.0.dev20211014 (nightly) and torchvision==0.12.0.dev20211014 (nightly).
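To see how a local environment compares with the tested versions, a small check along the following lines may help. It is only a sketch: the function names are ours, and the prefix match is an assumption made because nightly wheels may carry a local suffix after the version string.

```python
from importlib import metadata

# Versions the README reports as tested (nightly builds).
TESTED = {
    "torch": "1.11.0.dev20211014",
    "torchvision": "0.12.0.dev20211014",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_with_tested(tested=TESTED):
    """Map each package name to (tested, installed, matches)."""
    result = {}
    for package, wanted in tested.items():
        found = installed_version(package)
        # Assumption: a prefix match is enough, since a build may append
        # a local suffix to the version string.
        matches = found is not None and found.startswith(wanted)
        result[package] = (wanted, found, matches)
    return result


if __name__ == "__main__":
    for package, (wanted, found, ok) in compare_with_tested().items():
        status = "ok" if ok else "differs"
        print(f"{package}: tested {wanted}, installed {found} ({status})")
```

A mismatch does not necessarily mean the code will fail; it only means the combination has not been reported as tested here.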

Prepare Repository:

Download the PororoSV dataset and associated files from here and save them as ./data. Download the GloVe embeddings (glove.840B.300d) from here. The default location of the embeddings is ./data/ (see ./dcsgan/miscc/config.py).
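For a quick sanity check of the downloaded embeddings, the sketch below reads a GloVe-style text file (one token followed by its vector per line) into a dictionary. It is not the loader used in this repository; `load_glove` and its parameters are hypothetical names, and the file name you pass in (for example a glove.840B.300d text file under ./data/) is an assumption about where you saved it.

```python
import os
import tempfile


def load_glove(path, dim=300, vocab=None):
    """Read a GloVe-style text file into a dict mapping token -> list of floats."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Split from the right so that a token containing spaces
            # stays in one piece (assumed possible in large GloVe files).
            parts = line.rstrip("\n").rsplit(" ", dim)
            if len(parts) != dim + 1:
                continue  # skip lines that do not hold exactly `dim` values
            token = parts[0]
            if vocab is not None and token not in vocab:
                continue  # keep only tokens the caller asked for
            vectors[token] = [float(value) for value in parts[1:]]
    return vectors


if __name__ == "__main__":
    # Tiny 3-dimensional stand-in file; the real file has 300 values per line.
    with tempfile.TemporaryDirectory() as folder:
        sample = os.path.join(folder, "sample.txt")
        with open(sample, "w", encoding="utf-8") as handle:
            handle.write("pororo 0.1 0.2 0.3\n. . . 1.0 2.0 3.0\n")
        print(load_glove(sample, dim=3))
```

Passing a `vocab` set restricts loading to the words that actually occur in the captions, which keeps memory use small for a file of this size.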

Extract Constituency Parses:

To install the Berkeley Neural Parser with spaCy:

pip install benepar

To extract parses for PororoSV:

python parse.py --dataset pororo --data_dir <path-to-data-directory>
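Constituency parses are commonly exchanged as bracketed strings such as `(S (NP ...) (VP ...))`. The format that parse.py writes is not described here, so the following is only an illustrative reader for that general bracketed notation, with hypothetical helper names (`read_tree`, `leaves`) and a made-up example sentence.

```python
def read_tree(text):
    """Parse a bracketed constituency string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    position = 0

    def parse_node():
        nonlocal position
        if tokens[position] != "(":
            raise ValueError(f"expected '(' at token {position}")
        position += 1
        label = tokens[position]
        position += 1
        children = []
        while tokens[position] != ")":
            if tokens[position] == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[position])  # a word at a leaf
                position += 1
        position += 1  # consume the closing ")"
        return (label, children)

    tree = parse_node()
    if position != len(tokens):
        raise ValueError("trailing tokens after the root constituent")
    return tree


def leaves(tree):
    """Return the words of the sentence in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


if __name__ == "__main__":
    example = "(S (NP (NNP Pororo)) (VP (VBZ opens) (NP (DT the) (NN door))))"
    print(read_tree(example))
    print(leaves(example and read_tree(example)))
```

The nested tuples make it straightforward to walk the phrase structure of a caption, for example to collect all noun phrases.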

Extract Dense Captions:

We use the Dense Captioning Model implementation available here. Download the pretrained model as outlined in their repository. To extract dense captions for PororoSV:

python describe_pororosv.py --config_json <path-to-config> --lut_path <path-to-VG-regions-dict-lite.pkl> --model_checkpoint <path-to-model-checkpoint> --img_path <path-to-data-directory> --box_per_img 10 --batch_size 1

Training VLC-StoryGAN:

To train VLC-StoryGAN for PororoSV:

python train_gan.py --cfg ./cfg/pororo_s1_vlc.yml --data_dir <path-to-data-directory> --dataset pororo

Unless specified, the default output root directory for all model checkpoints is ./out/

Evaluation Models:

Please see here for evaluation models for character classification-based scores, BLEU2/3 and R-Precision.

To evaluate Fréchet Inception Distance (FID):

python eval_vfid --img_ref_dir <path-to-image-directory-original-images> --img_gen_dir <path-to-image-directory-generated-images> --mode <mode>
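For reference, FID is usually defined as the Fréchet distance between two Gaussians fitted to Inception features of the reference and generated images. The notation below is ours, and how the script's modes relate to this definition is not stated here.

```latex
\mathrm{FID}
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here $\mu_r, \Sigma_r$ are the mean and covariance of the features of the reference images, and $\mu_g, \Sigma_g$ those of the generated images. Lower values indicate that the two feature distributions are closer.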

More details coming soon.

Citation:

@inproceedings{maharana2021integrating,
  title={Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization},
  author={Maharana, Adyasha and Bansal, Mohit},
  booktitle={EMNLP},
  year={2021}
}

Owner

Adyasha Maharana