Code for the paper "Adaptively Aligned Image Captioning via Adaptive Attention Time"

Last update: Aug 27, 2022

Overview

Adaptively Aligned Image Captioning via Adaptive Attention Time

This repository contains the implementation of Adaptively Aligned Image Captioning via Adaptive Attention Time (AAT).

Requirements

Python 3.6
Java 1.8.0
PyTorch 1.0
cider
coco-caption
tensorboardX

Training AAT

Prepare data (with python2)

See details in data/README.md.

(Note: set word_count_threshold in scripts/prepro_labels.py to 4 to generate a vocabulary of size 10,369.)
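
As a rough illustration of what such a threshold does, the sketch below builds a vocabulary by dropping rare words. The function name `build_vocab`, the `UNK` token, and the strict "more than the threshold" comparison are assumptions made for this example; scripts/prepro_labels.py is the authority on the actual behaviour.

```python
from collections import Counter


def build_vocab(captions, word_count_threshold=4, unk_token="UNK"):
    """Return the words kept after thresholding (hypothetical helper).

    `captions` is a list of tokenized captions (lists of strings).
    Assumption: a word is kept only if it occurs MORE than
    `word_count_threshold` times across all captions.
    """
    counts = Counter(word for caption in captions for word in caption)
    vocab = sorted(w for w, n in counts.items() if n > word_count_threshold)
    # If any word was dropped, add a single token to stand in for rare words.
    if len(vocab) < len(counts):
        vocab.append(unk_token)
    return vocab


if __name__ == "__main__":
    toy = [["a", "dog"]] * 5 + [["a", "cat"]] * 2
    print(build_vocab(toy, word_count_threshold=4))
```

A higher threshold shrinks the vocabulary and maps more words to the unknown token, which is why the threshold value determines the vocabulary size quoted above.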

You should also preprocess the dataset to build the n-gram cache used to compute the CIDEr score during self-critical sequence training (SCST):

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
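
For intuition, a CIDEr cache of this kind typically stores n-gram document frequencies computed over the training references. The following is a minimal sketch of that idea, not the code in scripts/prepro_ngrams.py; the names `ngrams` and `document_frequency` and the input layout are assumptions for illustration.

```python
from collections import Counter, defaultdict


def ngrams(tokens, max_n=4):
    """Count all n-grams of length 1..max_n in one tokenized caption."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def document_frequency(refs_per_image, max_n=4):
    """For each n-gram, count the images whose references contain it.

    `refs_per_image` is a list with one entry per image; each entry is a
    list of tokenized reference captions (assumed layout).
    """
    df = defaultdict(int)
    for refs in refs_per_image:
        seen = set()
        for ref in refs:
            seen.update(ngrams(ref, max_n))
        # Each image contributes at most 1 to an n-gram's document frequency.
        for gram in seen:
            df[gram] += 1
    return dict(df)


if __name__ == "__main__":
    refs = [[["a", "dog"], ["a", "cat"]], [["a", "bird"]]]
    print(document_frequency(refs, max_n=2))
```

Precomputing these frequencies once avoids recounting n-grams over the whole training split every time a sampled caption is scored.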

Training

$ sh train-aat.sh

See opts.py for the options.

Evaluation

$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aat_rl/model.pth --infos_path log/log_aat_rl/infos_aat.pkl --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test

Reference

If you find this repo helpful, please consider citing:

@inproceedings{huang2019adaptively,
  title = {Adaptively Aligned Image Captioning via Adaptive Attention Time},
  author = {Huang, Lun and Wang, Wenmin and Xia, Yaxian and Chen, Jie},
  booktitle = {Advances in Neural Information Processing Systems 32},
  year={2019}
}

Acknowledgements

This repository is based on Ruotian Luo's self-critical.pytorch.

Owner

Lun Huang
