GoodNews Everyone! Context driven entity aware captioning for news images

Related tags

Deep LearningGoodNews
Overview

This is the code for a CVPR 2019 paper, called GoodNews Everyone! Context driven entity aware captioning for news images. Enjoy!

Model preview:

GoodNews Model!

Huge Thanks goes to New York Times API for providing such a service for FREE!

Another Thanks to @ruotianluo for providing the captioning code.

Dependencies/Requirements:

pytorch==1.0.0
spacy==2.0.11
h5py==2.7.0
bs4==4.5.3
joblib==0.12.2
nltk==3.2.3
tqdm==4.19.5
urllib2==2.7
goose==1.0.25
urlparse
unidecode

Introduction

We took the first steps to move the captioning systems to interpretation (see the paper for more detail). To this end, we have used New York Times API to retrieve the articles, images and captions.

The structure of this repo is as follows:

  1. Getting the data
  2. Cleaning and formating the data
  3. How to train models

Get the data

You have 3 options to get the data.

Images only

If you want to download the images only and directly start working on the same dataset as ours, then download the cleaned version of the dataset without images: article+caption.json and put it to data/ folder and download the img_urls.json and put it in the get_data/get_images_only/ folder.

Then run

python get_images.py --num_thread 16

Then, you will get the images. After that move to Clean and Format Data section.

PS: I have recieved numerous emails regarding some of the images not present/broken in the img_urls.json. Which is why I decided to put the images on the drive to download in the name of open science. Download all images

Images + articles

If you would like the get the raw version of the article and captions to do your own cleaning and processing, no worries! First download the article_urls and go to folder get_data/with_article_urls/ and run

python get_data_with_urls.py --num_thread 16
python combine_dataset.py 

This will get you the raw version of the caption, articles and also the images. After that move to Clean and Format Data section.

I want more!

As you know, New York Times is huge. Their articles starts from 1881 (It is crazy!) until well today. So in case you want to get ALL the data or expand the data to more years, then first step is go to New York Times API and get an API key. All you have to do is just sign up for the API key.

Once you have the key go to folder get_data/with_api/ and run

python retrieve_all_urls.py --api-key XXXX --start_year XXX --end_year XXX 

This is for getting the article urls and then saving in the format of month-year. Once you have the all urls from the API, then you run

python get_data_api.py
python combine_dataset.py

get_data_api.py retrieves the articles, captions and images. combine_dataset.py combines yearly data into one file after removing data points if they have corrupt image, empty articles or empty captions. After that move to Clean and Format Data section.

Small Note

I also provide the links to images and their data splits (train, val, test). Even though I always use random seed to decide the split, just in case If the GODS meddles with the random seed, here is the link to a json where you can find each image and its split: img_splits.json

Clean and Format the Data

Now that we have the data, it is time to clean, preprocess and format the data.

Preprocess

When you reach this part, you must have captioning_dataset.json in your data/ folder.

Captions

This part is for cleaning the captions (tokenizing, removing non-ascii characters, etc.), splitting train, val, and test and creating anonymize captions.

In other words, we change the caption "Alber Einstein taught in Princeton in 1926" to "PERSON_ taught in ORGANIZATION_ in DATE_." Move to preprocess/ folder and run

python clean_captions.py

Resize Images

To resize the images to 256x256:

python resize.py --root XXXX --img_size 256

Articles

Get the article format that is needed for the encoding methods by running: create_article_set.py

python create_article_set.py

Format

Now to create H5 file for captions, images and articles, just need to go to scripts/ folder and run in order

python prepro_labels.py --max_length 31 --word_count_threshold 4
python prepro_images.py

We proposed 3 different article encoding method. You can download each of encoded article methods, articles_full_avg_, articles_full_wavg, articles_full_TBB.

Or you can use the code to obtain them:

python prepro_articles_avg.py
python prepro_articles_wavg.py
python prepro_articles_tbb.py

Train

Finally we are ready to train. Magical words are:

python train.py --cnn_weight [YOUR HOME DIRECTORY]/.torch/resnet152-b121ed2d.pth 

You can check the opt.py for changing a lot of the options such dimension size, different models, hyperparameters, etc.

Evaluate

After you train your models, you can get the score according commonly used metrics: Bleu, Cider, Spice, Rouge, Meteor. Be sure to specify model_path, cnn_model_path, infos_path and sen_embed_path when runing eval.py. eval.py is usually used in training but it is necessary to run it to get the insertion.

Insertion

Last but not least insert.py. After you run eval.py, it will produce you a json file with the ids and their template captions. To fill the correct named entity, you have to run insert.py:

python insert.py --output [XXX] --dump [True/False] --insertion_method ['ctx', 'att', 'rand']

PS: I have been requested to provide model's output, so I thought it would be best to share it with everyone. Model Output In this folder, you have:

test.json: Test set with raw and template version of the caption.

article.json: Article sentences which is needed in the insert.py.

w/o article folder: All the models output on template captions, without articles.

with article folder: Our models output in the paper with sentence attention(sen_att) and image attention(vis_att), provided in the json. Hope this is helpful to more of you.

Conclusion

Thank you and sorry for the bugs!

Simple cross-platform application for DaVinci surgical video frame annotation

About DaVid is a simple cross-platform GUI for annotating robotic and endoscopic surgical actions for use in deep-learning research. Features Simple a

Cyril Zakka 4 Oct 09, 2021
Imaging, analysis, and simulation software for radio interferometry

ehtim (eht-imaging) Python modules for simulating and manipulating VLBI data and producing images with regularized maximum likelihood methods. This ve

Andrew Chael 5.2k Dec 28, 2022
Embracing Single Stride 3D Object Detector with Sparse Transformer

SST: Single-stride Sparse Transformer This is the official implementation of paper: Embracing Single Stride 3D Object Detector with Sparse Transformer

TuSimple 385 Dec 28, 2022
RGB-stacking 🛑 🟩 🔷 for robotic manipulation

RGB-stacking 🛑 🟩 🔷 for robotic manipulation BLOG | PAPER | VIDEO Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes, Alex X. Lee*,

DeepMind 95 Dec 23, 2022
Super-Fast-Adversarial-Training - A PyTorch Implementation code for developing super fast adversarial training

Super-Fast-Adversarial-Training This is a PyTorch Implementation code for develo

LBK 26 Dec 02, 2022
Accurate identification of bacteriophages from metagenomic data using Transformer

PhaMer is a python library for identifying bacteriophages from metagenomic data. PhaMer is based on a Transorfer model and rely on protein-based vocab

Kenneth Shang 9 Nov 30, 2022
Distributed DataLoader For Pytorch Based On Ray

Dpex——用户无感知分布式数据预处理组件 一、前言 随着GPU与CPU的算力差距越来越大以及模型训练时的预处理Pipeline变得越来越复杂,CPU部分的数据预处理已经逐渐成为了模型训练的瓶颈所在,这导致单机的GPU配置的提升并不能带来期望的线性加速。预处理性能瓶颈的本质在于每个GPU能够使用的C

Dalong 23 Nov 02, 2022
Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Hiroshechka Y 33 Dec 26, 2022
LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation

LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation Table of Contents: Introduction Project Structure Installation Datas

Yu Wang 492 Dec 02, 2022
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

This is the Vowpal Wabbit fast online learning code. Why Vowpal Wabbit? Vowpal Wabbit is a machine learning system which pushes the frontier of machin

Vowpal Wabbit 8.1k Jan 06, 2023
Code for paper: Towards Tokenized Human Dynamics Representation

Video Tokneization Codebase for video tokenization, based on our paper Towards Tokenized Human Dynamics Representation. Prerequisites (tested under Py

Kenneth Li 20 May 31, 2022
Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

DD3D: "Is Pseudo-Lidar needed for Monocular 3D Object detection?" Install // Datasets // Experiments // Models // License // Reference Full video Offi

Toyota Research Institute - Machine Learning 364 Dec 27, 2022
The implementation of our CIKM 2021 paper titled as: "Cross-Market Product Recommendation"

FOREC: A Cross-Market Recommendation System This repository provides the implementation of our CIKM 2021 paper titled as "Cross-Market Product Recomme

Hamed Bonab 16 Sep 12, 2022
ULMFiT for Genomic Sequence Data

Genomic ULMFiT This is an implementation of ULMFiT for genomics classification using Pytorch and Fastai. The model architecture used is based on the A

Karl 276 Dec 12, 2022
Matlab Python Heuristic Battery Opt - SMOP conversion and manual conversion

SMOP is Small Matlab and Octave to Python compiler. SMOP translates matlab to py

Tom Xu 1 Jan 12, 2022
Scientific Computation Methods in C and Python (Open for Hacktoberfest 2021)

Sci - cpy README is a stub. Do expand it. Objective This repository is meant to be a ready reference for scientific computation methods. Do ⭐ it if yo

Sandip Dutta 7 Oct 12, 2022
Using fully convolutional networks for semantic segmentation with caffe for the cityscapes dataset

Using fully convolutional networks for semantic segmentation (Shelhamer et al.) with caffe for the cityscapes dataset How to get started Download the

Simon Guist 27 Jun 06, 2022
PyTorch implementation of EigenGAN

PyTorch Implementation of EigenGAN Train python train.py [image_folder_path] --name [experiment name] Test python test.py [ckpt path] --traverse FFH

62 Nov 12, 2022
Python Actor concurrency library

Thespian Actor Library This library provides the framework of an Actor model for use by applications implementing Actors. Thespian Site with Documenta

Kevin Quick 177 Dec 11, 2022
[ICCV'21] Pri3D: Can 3D Priors Help 2D Representation Learning?

Pri3D: Can 3D Priors Help 2D Representation Learning? [ICCV 2021] Pri3D leverages 3D priors for downstream 2D image understanding tasks: during pre-tr

Ji Hou 124 Jan 06, 2023