Official code repository for the EMNLP 2021 paper

Last update: Dec 19, 2022

Related tags

Overview

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

PyTorch code for the EMNLP 2021 paper "Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization". See the arxiv paper here.

Requirements:

This code has been tested on torch==1.11.0.dev20211014 (nightly) and torchvision==0.12.0.dev20211014 (nightly)

Prepare Repository:

Download the PororoSV dataset and associated files from here and save it as ./data. Download GloVe embeddings (glove.840B.300D) from here. The default location of the embeddings is ./data/ (see ./dcsgan/miscc/config.py).

Extract Constituency Parses:

To install the Berkeley Neural Parser with SpaCy:

pip install benepar

To extract parses for PororoSV:

python parse.py --dataset pororo --data_dir <path-to-data-directory>

Extract Dense Captions:

We use the Dense Captioning Model implementation available here. Download the pretrained model as outlined in their repository. To extract dense captions for PororoSV:
python describe_pororosv.py --config_json <path-to-config> --lut_path <path-to-VG-regions-dict-lite.pkl> --model_checkpoint <path-to-model-checkpoint> --img_path <path-to-data-directory> --box_per_img 10 --batch_size 1

Training VLC-StoryGAN:

To train VLC-StoryGAN for PororoSV:
python train_gan.py --cfg ./cfg/pororo_s1_vlc.yml --data_dir <path-to-data-directory> --dataset pororo\

Unless specified, the default output root directory for all model checkpoints is ./out/

Evaluation Models:

Please see here for evaluation models for character classification-based scores, BLEU2/3 and R-Precision.

To evaluate Frechet Inception Distance (FID):
python eval_vfid --img_ref_dir <path-to-image-directory-original images> --img_gen_dir <path-to-image-directory-generated-images> --mode <mode>

More details coming soon.

Citation:

@inproceedings{maharana2021integrating,
  title={Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization},
  author={Maharana, Adyasha and Bansal, Mohit},
  booktitle={EMNLP},
  year={2021}
}

Official code repository for the EMNLP 2021 paper

Related tags

Overview

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

Requirements:

Prepare Repository:

Extract Constituency Parses:

Extract Dense Captions:

Training VLC-StoryGAN:

Evaluation Models:

Citation:

Owner

Adyasha Maharana

Official PyTorch Implementation of HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning (NeurIPS 2021 Spotlight)

Includes PyTorch -> Keras model porting code for ConvNeXt family of models with fine-tuning and inference notebooks.

Meta graph convolutional neural network-assisted resilient swarm communications

Code of the paper "Deep Human Dynamics Prior" in ACM MM 2021.

Deep Multi-Magnification Network for multi-class tissue segmentation of whole slide images

FSL-Mate: A collection of resources for few-shot learning (FSL).

The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

Implementation of Lie Transformer, Equivariant Self-Attention, in Pytorch

This repository contains the database and code used in the paper Embedding Arithmetic for Text-driven Image Transformation

Developed an optimized algorithm which finds the most optimal path between 2 points in a 3D Maze using various AI search techniques like BFS, DFS, UCS, Greedy BFS and A*

Angular & Electron desktop UI framework. Angular components for native looking and behaving macOS desktop UI (Electron/Web)

f-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation

The toolkit to generate auto labeled datasets

Post-training Quantization for Neural Networks with Provable Guarantees

transfer attack; adversarial examples; black-box attack; unrestricted Adversarial Attacks on ImageNet; CVPR2021 天池黑盒竞赛

Vector Quantization, in Pytorch

A curated list of awesome resources combining Transformers with Neural Architecture Search

This project implements "virtual speed" from heart rate monito

Ensemble Visual-Inertial Odometry (EnVIO)

Code for Efficient Visual Pretraining with Contrastive Detection