Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Overview

This repository is the PyTorch implementation of the paper:

Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Shweta Mahajan and Stefan Roth

We additionally include evaluation code from Luo et al. in the folder GoogleConceptualCaptioning, which has been patched for compatibility.

Requirements

The code is written in Python 3.6.10 and uses CUDA 9.0.

Requirements:

  • torch 1.1.0
  • torchvision 0.3.0
  • nltk 3.5
  • inflect 4.1.0
  • tqdm 4.46.0
  • sklearn 0.0
  • h5py 2.10.0

To install requirements:

conda config --add channels pytorch
conda config --add channels anaconda
conda config --add channels conda-forge
conda config --add channels conda-forge/label/cf202003
conda create -n <environment_name> --file requirements.txt
conda activate <environment_name>
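
To sanity-check the environment, here is a minimal sketch (not part of the repository) that verifies the key packages import and prints the installed versions:

import torch
import torchvision
import nltk   # imported only to confirm installation
import h5py   # imported only to confirm installation

print("torch:", torch.__version__)              # listed requirement: 1.1.0
print("torchvision:", torchvision.__version__)  # listed requirement: 0.3.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))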

Preprocessed data

The dataset used in this project for assessing accuracy and diversity is COCO 2014 (m-RNN split). The full dataset is available here.

We use Faster R-CNN features for the images, similar to Anderson et al. We additionally require the "classes"/"scores" fields detected for the image regions. The classes correspond to Visual Genome.

Download instructions

Preprocessed training data is available here as hdf5 files. The provided hdf5 files contain the following fields:

  • image_id: ID of the COCO image
  • num_boxes: Number of proposal regions detected by Faster R-CNN
  • features: ResNet-101 features of the extracted regions
  • classes: Visual Genome classes of the extracted regions
  • scores: Scores of the Visual Genome classes of the extracted regions

Note that the ["image_id","num_boxes","features"] fields are identical to those of Anderson et al.
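
To inspect a downloaded file, here is a minimal h5py sketch (not part of the repository); it makes no assumption about the internal layout and simply lists every stored dataset with its shape and dtype:

import h5py

with h5py.File("coco/coco_val_2014_adaptive_withclasses.h5", "r") as f:
    # Walk the file and print the name, shape, and dtype of each dataset.
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)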

Create a folder named coco and download the preprocessed training and test datasets from the coco folder in the drive link above as follows (alternatively, the entire coco folder can be downloaded directly from the drive link):

  1. Download the following files for training on COCO 2014 (m-RNN split):
coco/coco_train_2014_adaptive_withclasses.h5
coco/coco_val_2014_adaptive_withclasses.h5
coco/coco_val_mRNN.txt
coco/coco_test_mRNN.txt
  2. Download the following files for training on held-out COCO (novel object captioning):
coco/coco_train_2014_noc_adaptive_withclasses.h5
coco/coco_train_extra_2014_noc_adaptive_withclasses.h5
  3. Download the following files for testing on held-out COCO (novel object captioning):
coco/coco_test_2014_noc_adaptive_withclasses.h5
  4. Download the (caption) annotation files and place them in a subdirectory coco/annotations (mirroring the Google Drive folder structure):
coco/annotations/captions_train2014.json
coco/annotations/captions_val2014.json
  5. Download the following files from the drive link into a separate folder data (outside coco). These files contain the contextual neighbours for pseudo-supervision:
data/nn_final.pkl
data/nn_noc.pkl

Before running the train/test scripts (described below), "pathToData" and "nn_dict_path" in params.json and params_noc.json need to be set to the coco and data folders created above.
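
A minimal sketch of updating the config programmatically; the key names are the ones mentioned above, while the values shown are assumptions based on the folder layout of this section:

import json

with open("params.json") as f:
    params = json.load(f)

# Assumed values following the folder layout above; adjust if the scripts
# expect a different path (e.g. the .pkl file directly for nn_dict_path).
params["pathToData"] = "coco"
params["nn_dict_path"] = "data"

with open("params.json", "w") as f:
    json.dump(params, f, indent=2)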

Verify Folder Structure after Download

The folder structure after the data download should be as follows:

coco
 - annotations
   - captions_train2014.json
   - captions_val2014.json
 - coco_val_mRNN.txt
 - coco_test_mRNN.txt
 - coco_train_2014_adaptive_withclasses.h5
 - coco_val_2014_adaptive_withclasses.h5
 - coco_train_2014_noc_adaptive_withclasses.h5
 - coco_train_extra_2014_noc_adaptive_withclasses.h5
 - coco_test_2014_noc_adaptive_withclasses.h5
data
 - coco_classname.txt
 - visual_genome_classes.txt
 - vocab_coco_full.pkl
 - nn_final.pkl
 - nn_noc.pkl
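
A small sketch (not part of the repository) to verify that every file listed above is in place:

import os

expected = [
    "coco/annotations/captions_train2014.json",
    "coco/annotations/captions_val2014.json",
    "coco/coco_val_mRNN.txt",
    "coco/coco_test_mRNN.txt",
    "coco/coco_train_2014_adaptive_withclasses.h5",
    "coco/coco_val_2014_adaptive_withclasses.h5",
    "coco/coco_train_2014_noc_adaptive_withclasses.h5",
    "coco/coco_train_extra_2014_noc_adaptive_withclasses.h5",
    "coco/coco_test_2014_noc_adaptive_withclasses.h5",
    "data/coco_classname.txt",
    "data/visual_genome_classes.txt",
    "data/vocab_coco_full.pkl",
    "data/nn_final.pkl",
    "data/nn_noc.pkl",
]

missing = [p for p in expected if not os.path.exists(p)]
print("all files present" if not missing else "missing: " + ", ".join(missing))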

Training

Please follow these instructions for training:

  1. Set the hyperparameters for training in params.json and params_noc.json.
  2. Train a model on COCO 2014 for captioning:
   	python ./scripts/train.py
  3. Train a model for diverse novel object captioning:
   	python ./scripts/train_noc.py

Please note that the data folder provides the required vocabulary.
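
To peek at the provided vocabulary, a quick sketch; note that the exact object type inside the pickle is an assumption here, so inspect it if it is not a plain mapping:

import pickle

with open("data/vocab_coco_full.pkl", "rb") as f:
    vocab = pickle.load(f)

print(type(vocab))
try:
    print("vocabulary size:", len(vocab))
except TypeError:
    pass  # custom vocabulary object without __len__; inspect its attributes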

Memory requirements

The models were trained on a single NVIDIA V100 GPU with 32 GB of memory; 16 GB is sufficient for a single training run.

Pre-trained models and evaluation

We provide pre-trained models for both captioning on COCO 2014 (m-RNN split) and novel object captioning. Please follow these steps:

  1. Download the pre-trained models from here to the ckpts folder.

  2. For the evaluation of oracle scores and diversity, we follow Luo et al. In the folder GoogleConceptualCaptioning, download cider, run the download scripts in the cococaption folder, and then run the evaluation script:

   	./GoogleConceptualCaptioning/cococaption/get_google_word2vec_model.sh
   	./GoogleConceptualCaptioning/cococaption/get_stanford_models.sh
   	python ./scripts/eval.py
  3. For diversity evaluation, create the required numpy file for consensus re-ranking using:
   	python ./scripts/eval_diversity.py

For consensus re-ranking, follow the steps here. To obtain the final diversity scores, follow the instructions of DiversityMetrics: convert the numpy file to the required JSON format and run the script evalscripts.py (see the Div-1/Div-2 sketch after this list for intuition on the metrics).

  4. To evaluate the F1 score for novel object captioning:
   	python ./scripts/eval_noc.py
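
For intuition on the diversity metrics, here is a standalone sketch of the standard Div-1/Div-2 definitions (the ratio of distinct 1-/2-grams to the total number of words across the captions sampled for one image); this is an illustration, not the DiversityMetrics code:

def distinct_ngrams_ratio(captions, n):
    """Ratio of unique n-grams to the total number of words over a caption set."""
    total_words = 0
    ngrams = set()
    for cap in captions:
        tokens = cap.lower().split()
        total_words += len(tokens)
        ngrams.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(ngrams) / max(total_words, 1)

samples = ["a man rides a horse",
           "a person riding a brown horse",
           "a man on a horse in a field"]
print("Div-1:", distinct_ngrams_ratio(samples, 1))
print("Div-2:", distinct_ngrams_ratio(samples, 2))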

Results

Oracle evaluation on the COCO dataset

Method    B4     B3     B2     B1     CIDEr  METEOR  ROUGE  SPICE
COS-CVAE  0.633  0.739  0.842  0.942  1.893  0.450   0.770  0.339

Diversity evaluation on the COCO dataset

Method    Unique  Novel  mBLEU  Div-1  Div-2
COS-CVAE  96.3    4404   0.53   0.39   0.57

F1-score evaluation on the held-out COCO dataset

Method    bottle  bus   couch  microwave  pizza  racket  suitcase  zebra  average
COS-CVAE  35.4    83.6  53.8   63.2       86.7   69.5    46.1      81.7   65.0

Bibtex

@inproceedings{coscvae20neurips,
  title     = {Diverse Image Captioning with Context-Object Split Latent Spaces},
  author    = {Mahajan, Shweta and Roth, Stefan},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2020}
}