Unsupervised captioning - Code for Unsupervised Image Captioning

Overview

Unsupervised Image Captioning

by Yang Feng, Lin Ma, Wei Liu, and Jiebo Luo

Introduction

Most image captioning models are trained using paired image-sentence data, which are expensive to collect. We propose unsupervised image captioning to relax the reliance on paired data. For more details, please refer to our paper.

alt text

Citation

@InProceedings{feng2019unsupervised,
  author = {Feng, Yang and Ma, Lin and Liu, Wei and Luo, Jiebo},
  title = {Unsupervised Image Captioning},
  booktitle = {CVPR},
  year = {2019}
}

Requirements

mkdir ~/workspace
cd ~/workspace
git clone https://github.com/tensorflow/models.git tf_models
git clone https://github.com/tylin/coco-caption.git
touch tf_models/research/im2txt/im2txt/__init__.py
touch tf_models/research/im2txt/im2txt/data/__init__.py
touch tf_models/research/im2txt/im2txt/inference_utils/__init__.py
wget http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz
mkdir ckpt
tar zxvf inception_v4_2016_09_09.tar.gz -C ckpt
git clone https://github.com/fengyang0317/unsupervised_captioning.git
cd unsupervised_captioning
pip install -r requirements.txt
export PYTHONPATH=$PYTHONPATH:`pwd`

Dataset (Optional. The files generated below can be found at Gdrive).

In case you do not have the access to Google, the files are also available at One Drive.

  1. Crawl image descriptions. The descriptions used when conducting the experiments in the paper are available at link. You may download the descriptions from the link and extract the files to data/coco.

    pip3 install absl-py
    python3 preprocessing/crawl_descriptions.py
    
  2. Extract the descriptions. It seems that NLTK is changing constantly. So the number of the descriptions obtained may be different.

    python -c "import nltk; nltk.download('punkt')"
    python preprocessing/extract_descriptions.py
    
  3. Preprocess the descriptions. You may need to change the vocab_size, start_id, and end_id in config.py if you generate a new dictionary.

    python preprocessing/process_descriptions.py --word_counts_output_file \ 
      data/word_counts.txt --new_dict
    
  4. Download the MSCOCO images from link and put all the images into ~/dataset/mscoco/all_images.

  5. Object detection for the training images. You need to first download the detection model from here and then extract the model under tf_models/research/object_detection.

    python preprocessing/detect_objects.py --image_path\
      ~/dataset/mscoco/all_images --num_proc 2 --num_gpus 1
    
  6. Generate tfrecord files for images.

    python preprocessing/process_images.py --image_path\
      ~/dataset/mscoco/all_images
    

Training

  1. Train the model without the intialization pipeline.

    python im_caption_full.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --multi_gpu --batch_size 512 --save_checkpoint_steps 1000\
      --gen_lr 0.001 --dis_lr 0.001
    
  2. Evaluate the model. The last element in the b34.json file is the best checkpoint.

    CUDA_VISIBLE_DEVICES='0,1' python eval_all.py\
      --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --data_dir ~/dataset/mscoco/all_images
    js-beautify saving/b34.json
    
  3. Evaluate the model on test set. Suppose the best validation checkpoint is 20000.

    python test_model.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --data_dir ~/dataset/mscoco/all_images --job_dir saving/model.ckpt-20000
    

Initialization (Optional. The files can be found at here).

  1. Train a object-to-sentence model, which is used to generate the pseudo-captions.

    python initialization/obj2sen.py
    
  2. Find the best obj2sen model.

    python initialization/eval_obj2sen.py --threads 8
    
  3. Generate pseudo-captions. Suppose the best validation checkpoint is 35000.

    python initialization/gen_obj2sen_caption.py --num_proc 8\
      --job_dir obj2sen/model.ckpt-35000
    
  4. Train a captioning using pseudo-pairs.

    python initialization/im_caption.py --o2s_ckpt obj2sen/model.ckpt-35000\
      --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt
    
  5. Evaluate the model.

    CUDA_VISIBLE_DEVICES='0,1' python eval_all.py\
      --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --data_dir ~/dataset/mscoco/all_images --job_dir saving_imcap
    js-beautify saving_imcap/b34.json
    
  6. Train sentence auto-encoder, which is used to initialize sentence GAN.

    python initialization/sentence_ae.py
    
  7. Train sentence GAN.

    python initialization/sentence_gan.py
    
  8. Train the full model with initialization. Suppose the best imcap validation checkpoint is 18000.

    python im_caption_full.py --inc_ckpt ~/workspace/ckpt/inception_v4.ckpt\
      --imcap_ckpt saving_imcap/model.ckpt-18000\
      --sae_ckpt sen_gan/model.ckpt-30000 --multi_gpu --batch_size 512\
      --save_checkpoint_steps 1000 --gen_lr 0.001 --dis_lr 0.001
    

Credits

Part of the code is from coco-caption, im2txt, tfgan, resnet, Tensorflow Object Detection API and maskgan.

Xinpeng told me the idea of self-critic, which is crucial to training.

Owner
Yang Feng
SWE @ Goolgle
Yang Feng
A minimalist tool to display a network graph.

A tool to get a minimalist view of any architecture This tool has only be tested with the models included in this repo. Therefore, I can't guarantee t

Thibault Castells 1 Feb 11, 2022
KaziText is a tool for modelling common human errors.

KaziText KaziText is a tool for modelling common human errors. It estimates probabilities of individual error types (so called aspects) from grammatic

ÚFAL 3 Nov 24, 2022
In the AI for TSP competition we try to solve optimization problems using machine learning.

AI for TSP Competition Goal In the AI for TSP competition we try to solve optimization problems using machine learning. The competition will be hosted

Paulo da Costa 11 Nov 27, 2022
Author's PyTorch implementation of TD3 for OpenAI gym tasks

Addressing Function Approximation Error in Actor-Critic Methods PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3). If y

Scott Fujimoto 1.3k Dec 25, 2022
Additional code for Stable-baselines3 to load and upload models from the Hub.

Hugging Face x Stable-baselines3 A library to load and upload Stable-baselines3 models from the Hub. Installation With pip Examples [Todo: add colab t

Hugging Face 34 Dec 10, 2022
Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning This is the code for implementing the MADDPG algorithm presented in

97 Dec 21, 2022
Codes for paper "KNAS: Green Neural Architecture Search"

KNAS Codes for paper "KNAS: Green Neural Architecture Search" KNAS is a green (energy-efficient) Neural Architecture Search (NAS) approach. It contain

90 Dec 22, 2022
structured-generative-modeling

This repository contains the implementation for the paper Information Theoretic StructuredGenerative Modeling, Specially thanks for the open-source co

0 Oct 11, 2021
Hamiltonian Dynamics with Non-Newtonian Momentum for Rapid Sampling

Hamiltonian Dynamics with Non-Newtonian Momentum for Rapid Sampling Code for the paper: Greg Ver Steeg and Aram Galstyan. "Hamiltonian Dynamics with N

Greg Ver Steeg 25 Mar 14, 2022
Using pytorch to implement unet network for liver image segmentation.

Using pytorch to implement unet network for liver image segmentation.

zxq 1 Dec 17, 2021
Visualizing Yolov5's layers using GradCam

YOLO-V5 GRADCAM I constantly desired to know to which part of an object the object-detection models pay more attention. So I searched for it, but I di

Pooya Mohammadi Kazaj 200 Jan 01, 2023
A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking

PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking PoseRBPF Paper Self-supervision Paper Pose Estimation Video Robot Manipulati

NVIDIA Research Projects 107 Dec 25, 2022
Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style

Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style [NeurIPS 2021] Official code to reproduce the results and data p

Yash Sharma 27 Sep 19, 2022
DeepCAD: A Deep Generative Network for Computer-Aided Design Models

DeepCAD This repository provides source code for our paper: DeepCAD: A Deep Generative Network for Computer-Aided Design Models Rundi Wu, Chang Xiao,

Rundi Wu 85 Dec 31, 2022
Quantum-enhanced transformer neural network

Example of a Quantum-enhanced transformer neural network Get the code: git clone https://github.com/rdisipio/qtransformer.git cd qtransformer Create

Riccardo Di Sipio 61 Nov 08, 2022
MarcoPolo is a clustering-free approach to the exploration of bimodally expressed genes along with group information in single-cell RNA-seq data

MarcoPolo is a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering Overview MarcoPolo

Chanwoo Kim 13 Dec 18, 2022
Tensorflow implementation of our method: "Triangle Graph Interest Network for Click-through Rate Prediction".

TGIN Tensorflow implementation of our method: "Triangle Graph Interest Network for Click-through Rate Prediction". Files in the folder dataset/ electr

Alibaba 21 Dec 21, 2022
Pytorch Implementation of rpautrat/SuperPoint

SuperPoint-Pytorch (A Pure Pytorch Implementation) SuperPoint: Self-Supervised Interest Point Detection and Description Thanks This work is based on:

76 Dec 27, 2022
Akshat Surolia 2 May 11, 2022
RP-GAN: Stable GAN Training with Random Projections

RP-GAN: Stable GAN Training with Random Projections This repository contains a reference implementation of the algorithm described in the paper: Behna

Ayan Chakrabarti 20 Sep 18, 2021