Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Overview

This repository is the PyTorch implementation of the paper:

Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Shweta Mahajan and Stefan Roth

We additionally include evaluation code from Luo et al. in the folder GoogleConceptualCaptioning, which has been patched for compatibility.

Requirements

The code was written for Python 3.6.10 and CUDA 9.0.

Requirements:

  • torch 1.1.0
  • torchvision 0.3.0
  • nltk 3.5
  • inflect 4.1.0
  • tqdm 4.46.0
  • sklearn 0.0
  • h5py 2.10.0

To install requirements:

conda config --add channels pytorch
conda config --add channels anaconda
conda config --add channels conda-forge
conda config --add channels conda-forge/label/cf202003
conda create -n <environment_name> --file requirements.txt
conda activate <environment_name>

Preprocessed data

The dataset used in this project for assessing accuracy and diversity is COCO 2014 (m-RNN split). The full dataset is available here.

Similar to Anderson et al., we use Faster R-CNN features for the images. We additionally require the "classes" and "scores" fields detected for the image regions. The classes correspond to Visual Genome.

Download instructions

Preprocessed training data is available here as hdf5 files. The provided hdf5 files contain the following fields:

  • image_id: ID of the COCO image
  • num_boxes: Number of proposal regions detected by Faster R-CNN
  • features: ResNet-101 features of the extracted regions
  • classes: Visual Genome classes of the extracted regions
  • scores: Scores of the Visual Genome classes of the extracted regions

Note that the ["image_id","num_boxes","features"] fields are identical to Anderson et al.
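
As a quick sanity check after downloading, the field list above can be compared against what a file actually contains. The following is a minimal sketch and not part of the repository: it assumes the fields are stored as top-level entries of the HDF5 file, and the helper names missing_fields and check_h5 are hypothetical.

```python
# Field names taken from the list above.
EXPECTED_FIELDS = ("image_id", "num_boxes", "features", "classes", "scores")


def missing_fields(available, expected=EXPECTED_FIELDS):
    """Return the expected field names that are absent from `available`."""
    present = set(available)
    return [name for name in expected if name not in present]


def check_h5(path):
    """Open an HDF5 file read-only and list expected fields it lacks.

    Assumption: the fields are top-level keys of the file.
    """
    import h5py  # imported lazily so missing_fields works without h5py

    with h5py.File(path, "r") as f:
        return missing_fields(f.keys())


# Example call (requires the downloaded file):
# check_h5("coco/coco_val_2014_adaptive_withclasses.h5")
```

If the assumption about top-level keys holds, an empty list means all five fields are present.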

Create a folder named coco and download the preprocessed training and test datasets from the coco folder in the drive link above as follows (it is also possible to directly download the entire coco folder from the drive link):

  1. Download the following files for training on COCO 2014 (m-RNN split):
coco/coco_train_2014_adaptive_withclasses.h5
coco/coco_val_2014_adaptive_withclasses.h5
coco/coco_val_mRNN.txt
coco/coco_test_mRNN.txt
  2. Download the following files for training on held-out COCO (novel object captioning):
coco/coco_train_2014_noc_adaptive_withclasses.h5
coco/coco_train_extra_2014_noc_adaptive_withclasses.h5
  3. Download the following files for testing on held-out COCO (novel object captioning):
coco/coco_test_2014_noc_adaptive_withclasses.h5
  4. Download the (caption) annotation files and place them in a subdirectory coco/annotations (mirroring the Google Drive folder structure):
coco/annotations/captions_train2014.json
coco/annotations/captions_val2014.json
  5. Download the following files from the drive link into a separate folder data (outside coco). These files contain the contextual neighbours for pseudo supervision:
data/nn_final.pkl
data/nn_noc.pkl

To run the train/test scripts (described in the following), "pathToData" and "nn_dict_path" in params.json and params_noc.json need to be set to the coco and data folders created above, respectively.
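
The two entries can be edited by hand; as an alternative, here is a minimal sketch of setting them from Python. It is not part of the repository. It assumes that "pathToData" and "nn_dict_path" are top-level keys of the JSON files, the helper name update_params is hypothetical, and the paths in the example are placeholders.

```python
import json
from pathlib import Path


def update_params(params_file, updates):
    """Merge `updates` into the top-level keys of a JSON file and save it.

    Assumption: the file holds a single JSON object whose top-level keys
    include the entries to be changed.
    """
    path = Path(params_file)
    params = json.loads(path.read_text())
    params.update(updates)
    # Note: this rewrites the file with 2-space indentation.
    path.write_text(json.dumps(params, indent=2) + "\n")
    return params


# Example (placeholder paths):
# update_params("params.json", {
#     "pathToData": "/path/to/coco",
#     "nn_dict_path": "/path/to/data",
# })
```

All other keys in the file are left unchanged; only the formatting of the file may differ after saving.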

Verify Folder Structure after Download

After the download, the coco and data folders should be structured as follows:

coco
 - annotations
   - captions_train2014.json
   - captions_val2014.json
 - coco_val_mRNN.txt
 - coco_test_mRNN.txt
 - coco_train_2014_adaptive_withclasses.h5
 - coco_val_2014_adaptive_withclasses.h5
 - coco_train_2014_noc_adaptive_withclasses.h5
 - coco_train_extra_2014_noc_adaptive_withclasses.h5
 - coco_test_2014_noc_adaptive_withclasses.h5
data
 - coco_classname.txt
 - visual_genome_classes.txt
 - vocab_coco_full.pkl
 - nn_final.pkl
 - nn_noc.pkl
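
The layout above can also be checked programmatically. The following is a minimal sketch, not part of the repository; it assumes that the coco and data folders sit side by side under a common root directory, and the helper name missing_files is hypothetical.

```python
from pathlib import Path

# Expected files, copied from the folder listing above.
EXPECTED = {
    "coco": [
        "annotations/captions_train2014.json",
        "annotations/captions_val2014.json",
        "coco_val_mRNN.txt",
        "coco_test_mRNN.txt",
        "coco_train_2014_adaptive_withclasses.h5",
        "coco_val_2014_adaptive_withclasses.h5",
        "coco_train_2014_noc_adaptive_withclasses.h5",
        "coco_train_extra_2014_noc_adaptive_withclasses.h5",
        "coco_test_2014_noc_adaptive_withclasses.h5",
    ],
    "data": [
        "coco_classname.txt",
        "visual_genome_classes.txt",
        "vocab_coco_full.pkl",
        "nn_final.pkl",
        "nn_noc.pkl",
    ],
}


def missing_files(root="."):
    """Return the expected files (relative to `root`) that do not exist.

    Assumption: `root` contains both the coco and the data folder.
    """
    root = Path(root)
    return [
        f"{folder}/{name}"
        for folder, names in EXPECTED.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    for rel_path in missing_files("."):
        print("missing:", rel_path)
```

Running the script from the directory that contains coco and data prints one line per missing file and nothing if the layout is complete.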

Training

Follow these steps for training:

  1. Set the hyperparameters for training in params.json and params_noc.json.
  2. Train a model on COCO 2014 for captioning:
   	python ./scripts/train.py
  3. Train a model for diverse novel object captioning:
   	python ./scripts/train_noc.py

Please note that the data folder provides the required vocabulary.

Memory requirements

The models were trained on a single NVIDIA V100 GPU with 32 GB of memory. 16 GB is sufficient for a single training run.

Pre-trained models and evaluation

We provide pre-trained models for both captioning on COCO 2014 (m-RNN split) and novel object captioning. To evaluate them, follow these steps:

  1. Download the pre-trained models from here to the ckpts folder.

  2. For the evaluation of oracle scores and diversity, we follow Luo et al. In the folder GoogleConceptualCaptioning, download cider; then run the download scripts from the cococaption folder, followed by the evaluation script:

   	./GoogleConceptualCaptioning/cococaption/get_google_word2vec_model.sh
   	./GoogleConceptualCaptioning/cococaption/get_stanford_models.sh
   	python ./scripts/eval.py
  3. For diversity evaluation, create the required numpy file for consensus re-ranking using:
   	python ./scripts/eval_diversity.py

     For consensus re-ranking, follow the steps here. To obtain the final diversity scores, follow the instructions of DiversityMetrics: convert the numpy file to the required json format and run the script evalscripts.py.

  4. To evaluate the F1 score for novel object captioning:
   	python ./scripts/eval_noc.py

Results

Oracle evaluation on the COCO dataset

Method     B4     B3     B2     B1     CIDEr  METEOR  ROUGE  SPICE
COS-CVAE   0.633  0.739  0.842  0.942  1.893  0.450   0.770  0.339

Diversity evaluation on the COCO dataset

Method     Unique  Novel  mBLEU  Div-1  Div-2
COS-CVAE   96.3    4404   0.53   0.39   0.57

F1-score evaluation on the held-out COCO dataset

Method     bottle  bus   couch  microwave  pizza  racket  suitcase  zebra  average
COS-CVAE   35.4    83.6  53.8   63.2       86.7   69.5    46.1      81.7   65.0

Bibtex

@inproceedings{coscvae20neurips,
  title     = {Diverse Image Captioning with Context-Object Split Latent Spaces},
  author    = {Mahajan, Shweta and Roth, Stefan},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2020}
}
Owner
Visual Inference Lab @TU Darmstadt