Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020

Last update: Nov 25, 2022


Overview

Paper: Show, Edit and Tell: A Framework for Editing Image Captions (arXiv)

This repository contains the source code for Show, Edit and Tell: A Framework for Editing Image Captions, to appear at CVPR 2020.

Requirements

Python 3.6 or 3.7
PyTorch 1.2

For evaluation, you also need:

Command-line argument parsing is currently not supported; we will add support for it soon.

Pretrained Models

You can download the pretrained models from here. Place them in the eval folder.

Download and Prepare Features

In this work, we use 36 fixed bottom-up features per image. If you wish to use adaptive features (10 to 100 per image), please refer to the adaptive_features folder in this repository and follow the instructions there.

First, download the fixed features from here and unzip the file. Place the unzipped folder in the bottom-up_features folder.

Next, run:

python bottom-up_features/tsv.py

This command will create the following files:

An HDF5 file containing the bottom-up image features for the train and val splits, 36 per image, stored as an (I, 36, 2048) tensor where I is the number of images in the split.
PKL files that map the training and validation image IDs to their indices in the HDF5 dataset created above.
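As a sketch of how these two outputs fit together: the PKL file maps a COCO image ID to a row of the HDF5 tensor, so fetching one image's 36 x 2048 features is a dictionary lookup followed by a row index. We assume the pickled object is a plain dict keyed by integer image IDs; `features` can be anything indexable by row, such as an h5py dataset opened from the HDF5 file (the dataset name inside the file is not given here, so check it before use). Both function names are hypothetical.

```python
import pickle
from typing import Dict, Sequence, TypeVar

T = TypeVar("T")


def load_imgid2idx(pkl_path: str) -> Dict[int, int]:
    """Load an image-ID -> HDF5-row mapping such as val36_imgid2idx.pkl.

    Assumption: the pickle holds a plain dict keyed by COCO image IDs.
    """
    with open(pkl_path, "rb") as f:
        return pickle.load(f)


def features_for_image(features: Sequence[T], imgid2idx: Dict[int, int], image_id: int) -> T:
    """Return the feature block (36 x 2048 in the fixed setting) for one image.

    `features` only needs to support features[row], e.g. an h5py dataset
    of shape (I, 36, 2048); the row comes from the PKL mapping.
    """
    return features[imgid2idx[image_id]]
```

Keeping the lookup separate from the HDF5 access means an h5py-backed `features` only reads the single row that is requested.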

Download/Prepare Caption Data

You can either download all the related caption data files from here or create them yourself. The folder contains the following:

WORDMAP_coco: maps the words to indices
CAPUTIL: stores information about the existing captions in a dictionary organized as follows: {"COCO_image_name": {"caption": "existing caption to be edited", "encoded_previous_caption": an encoded list of the words, "previous_caption_length": a list containing the length of the caption, "image_ids": the COCO image id}}
CAPTIONS: the encoded ground-truth captions (a list of number_images x 5 lists; for example, the Karpathy split has 113,287 training images, so the training split has 566,435 lists)
CAPLENS: the lengths of the ground-truth captions (a list of number_images x 5 values)
NAMES: the COCO image names, in the same order as CAPTIONS
GENOME_DETS: the splits and image IDs used to load the images in accordance with the features file created above
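The CAPTIONS and CAPLENS entries can be illustrated with a small encoder. This is a minimal sketch, assuming WORDMAP_coco contains `<start>`, `<end>`, `<pad>` and `<unk>` tokens as in common Show, Attend and Tell implementations, and assuming a maximum caption length of 50; `encode_caption` is a hypothetical helper, not a function from this repository.

```python
from typing import Dict, List, Tuple


def encode_caption(caption: str, word_map: Dict[str, int], max_len: int = 50) -> Tuple[List[int], int]:
    """Encode one caption as word indices and return (encoded, length).

    Assumed layout: <start> w1 ... wn <end> <pad> ... padded to max_len + 2.
    Words missing from the word map become <unk>.
    """
    words = caption.lower().split()[:max_len]
    encoded = (
        [word_map["<start>"]]
        + [word_map.get(w, word_map["<unk>"]) for w in words]
        + [word_map["<end>"]]
    )
    length = len(encoded)  # counts <start> and <end>; a CAPLENS-style value
    encoded += [word_map["<pad>"]] * (max_len + 2 - length)
    return encoded, length


# Five encoded captions per image gives the list sizes quoted above.
NUM_TRAIN_IMAGES = 113287
NUM_TRAIN_CAPTIONS = NUM_TRAIN_IMAGES * 5
```

With five captions per image, `NUM_TRAIN_CAPTIONS` works out to 566,435, matching the size of the training CAPTIONS list.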

If you'd like to create the caption data yourself, download Karpathy's training, validation, and test splits. This zip file contains the captions. Place the file in the caption data folder. You should also have the PKL files created in the 'Download and Prepare Features' section: train36_imgid2idx.pkl and val36_imgid2idx.pkl.

Next, run:

python preprocess_caps.py

This writes all the files to the caption data folder.

Next, download the existing captions to be edited and organize them as a list of dictionaries, each in the following format: {"image_id": COCO_image_id, "caption": "caption to be edited", "file_name": "split\\COCO_image_name"}. For example: {"image_id": 522418, "caption": "a woman cutting a cake with a knife", "file_name": "val2014\\COCO_val2014_000000522418.jpg"}. In our work, we use the captions produced by AoANet.
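If your existing captions come as a list of `{"image_id": ..., "caption": ...}` records (a hypothetical input format; AoANet's actual output may differ), the `file_name` field can be derived from the image ID, since COCO file names zero-pad the ID to 12 digits. The sketch below assumes every image lives in the `val2014` folder, which holds for the Karpathy validation and test images; `to_existing_caption_entries` is a hypothetical helper.

```python
import json
from typing import Dict, List


def to_existing_caption_entries(results: List[Dict], split: str = "val2014") -> List[Dict]:
    """Attach the file_name field in the format described above.

    Assumptions: `results` holds {"image_id": int, "caption": str} records,
    and every image comes from the COCO folder named by `split`.
    """
    entries = []
    for r in results:
        name = f"COCO_{split}_{r['image_id']:012d}.jpg"  # 12-digit zero-padded ID
        entries.append({
            "image_id": r["image_id"],
            "caption": r["caption"],
            # One backslash in the string; json.dump escapes it as \\.
            "file_name": f"{split}\\{name}",
        })
    return entries
```

Passing the result to `json.dump` writes the separator as `\\`, which matches the example above.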

Next, run:

python preprocess_existing_caps.py

This writes all the existing caption files to the caption data folder.

Prepare/Download Sequence-Level Training Data

Download the RL data used to compute metric scores during sequence-level training from here.

Alternatively, you can prepare the data yourself by running:

python preprocess_rl.py

This writes two files, used for computing metric scores, to the data folder.
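The contents of these two files are not described here. Assuming they follow the self-critical codebase, one of them holds n-gram document frequencies over the reference captions, which CIDEr-D needs for its TF-IDF weighting (the IDF of an n-gram is roughly log(number of images / document frequency)). A minimal sketch of that computation, with hypothetical helper names:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def ngrams(words: List[str], n_max: int = 4) -> Counter:
    """Count all 1..n_max-grams in a tokenized caption."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def document_frequency(refs_per_image: List[List[str]], n_max: int = 4) -> Dict[Tuple[str, ...], float]:
    """For each n-gram, count how many images have it in at least one reference."""
    df = defaultdict(float)
    for refs in refs_per_image:
        seen = set()
        for ref in refs:
            seen.update(ngrams(ref.split(), n_max))  # adds the n-gram keys
        for ng in seen:
            df[ng] += 1  # each image contributes at most once per n-gram
    return dict(df)
```

Precomputing these counts once avoids rescanning all reference captions every time a reward is computed during training.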

Training and Validation

Cross-entropy (XE) training stage:

For training DCNet, run:

python dcnet.py

For optimizing DCNet with MSE, run:

python dcnet_with_mse.py

For training EditNet, run:

python editnet.py
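In the XE stage, captioning models are typically trained with token-level cross-entropy under teacher forcing, ignoring padded positions. The following is a minimal sketch of that loss in plain Python, assuming per-step log-probabilities are already available; we have not checked how dcnet.py and editnet.py implement it, and `masked_cross_entropy` is a hypothetical name.

```python
from typing import List


def masked_cross_entropy(log_probs: List[List[float]], targets: List[int], length: int) -> float:
    """Average negative log-likelihood over the first `length` target tokens.

    log_probs[t][w] is the model's log-probability of word w at step t,
    where step t is conditioned on the ground-truth words before t
    (teacher forcing). Steps at or beyond `length` are padding and ignored.
    """
    nll = [-log_probs[t][targets[t]] for t in range(length)]
    return sum(nll) / length
```

In a tensor framework, the same masking is usually done by excluding padded positions from the loss, so the caption lengths (as in CAPLENS) decide which steps count.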

CIDEr-D optimization stage:

For training DCNet, run:

python dcnet_rl.py

For training EditNet, run:

python editnet_rl.py
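Since the code builds on self-critical, this stage plausibly uses self-critical sequence training (SCST): the reward for a sampled caption is its CIDEr-D score minus that of the greedy-decoded caption for the same image, and the loss is the negative advantage times the sampled caption's log-probability. This is a sketch of the loss value only, assuming that greedy baseline; `scst_loss` is a hypothetical name, and in practice the computation runs on tensors so gradients flow through the log-probabilities.

```python
from typing import List


def scst_loss(sample_log_probs: List[List[float]], sample_rewards: List[float],
              baseline_rewards: List[float]) -> float:
    """Self-critical policy-gradient loss, averaged over a batch.

    sample_log_probs[b]: log-probabilities of the words of the b-th sampled caption.
    sample_rewards[b]:   CIDEr-D of that sampled caption.
    baseline_rewards[b]: CIDEr-D of the greedy-decoded caption (assumed baseline).
    """
    total = 0.0
    for log_ps, r, r_base in zip(sample_log_probs, sample_rewards, baseline_rewards):
        advantage = r - r_base  # positive when sampling beat greedy decoding
        total += -advantage * sum(log_ps)
    return total / len(sample_log_probs)
```

Using the model's own greedy output as the baseline means only samples that score better than the current test-time behaviour are reinforced.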

Evaluation

Refer to the eval folder for instructions. All the generated captions and scores from our model can be found in the outputs folder.

Training objective    BLEU-1  BLEU-4  CIDEr  SPICE
Cross-Entropy Loss    77.9    38.0    1.200  21.2
CIDEr Optimization    80.6    39.2    1.289  22.6

Citation

@InProceedings{Sammani_2020_CVPR,
author = {Sammani, Fawaz and Melas-Kyriazi, Luke},
title = {Show, Edit and Tell: A Framework for Editing Image Captions},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

References

Our code is mainly based on self-critical and Show, Attend and Tell. We thank the authors of both.


Owner

Fawaz Sammani
