Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]

Last update: Dec 17, 2022

Overview

Introduction

This repository is for X-Linear Attention Networks for Image Captioning (CVPR 2020). The original paper can be found here.

Please cite with the following BibTeX:

@inproceedings{xlinear2020cvpr,
  title={X-Linear Attention Networks for Image Captioning},
  author={Pan, Yingwei and Yao, Ting and Li, Yehao and Mei, Tao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Requirements

Python 3
CUDA 10
numpy
tqdm
easydict
PyTorch (>1.0)
torchvision
coco-caption

Data preparation

Download the bottom-up features and convert them to npz files:

python2 tools/create_feats.py --infeats bottom_up_tsv --outfolder ./mscoco/feature/up_down_10_100

Download the annotations into the mscoco folder. For more details about data preparation, refer to self-critical.pytorch.
Download coco-caption and set its path in __C.INFERENCE.COCO_PATH in lib/config.py.
The pretrained models and results can be downloaded here.
The pretrained SENet-154 model can be downloaded here.
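
The feature conversion in the first step can be pictured with the minimal sketch below. It is not the code of tools/create_feats.py. It assumes the column layout of the publicly released bottom-up-attention TSV files (base64-encoded float32 features, one row per image) and an npz key named `feat`, which is a guess; the function name `convert_tsv` is hypothetical.

```python
import base64
import csv
import os

import numpy as np

# Assumed column layout of the bottom-up-attention TSV release.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def convert_tsv(tsv_file, outfolder):
    """Write one <image_id>.npz per TSV row; return the number of rows written."""
    # Feature columns are far longer than the csv module's default field limit.
    csv.field_size_limit(2**31 - 1)
    os.makedirs(outfolder, exist_ok=True)
    reader = csv.DictReader(tsv_file, delimiter="\t", fieldnames=FIELDNAMES)
    count = 0
    for row in reader:
        num_boxes = int(row["num_boxes"])
        # Decode the base64 payload into a (num_boxes, feature_dim) float32 array.
        raw = base64.b64decode(row["features"])
        feats = np.frombuffer(raw, dtype=np.float32).reshape(num_boxes, -1)
        # The key name "feat" is an assumption; check the training data loader.
        np.savez_compressed(os.path.join(outfolder, row["image_id"]), feat=feats)
        count += 1
    return count
```

A converted file can then be inspected with `np.load(path)["feat"].shape` to confirm the number of regions and the feature dimension.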

Training

Train X-LAN model

bash experiments/xlan/train.sh

Train X-LAN model using self-critical training

Copy the pretrained model into experiments/xlan_rl/snapshot and run the script

bash experiments/xlan_rl/train.sh
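
The copy step that precedes this script can be sketched as a small helper. This is only an illustration: the function name `stage_pretrained` is hypothetical, and the checkpoint file name is left to the caller because it is not specified here.

```python
import shutil
from pathlib import Path


def stage_pretrained(checkpoint, experiment_dir):
    """Copy a pretrained checkpoint into <experiment_dir>/snapshot and return its new path."""
    snapshot_dir = Path(experiment_dir) / "snapshot"
    # Create the snapshot folder if the experiment has not been run yet.
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    target = snapshot_dir / Path(checkpoint).name
    shutil.copy2(checkpoint, target)
    return target
```

For example, `stage_pretrained(path_to_checkpoint, "experiments/xlan_rl")` places the file where the self-critical run expects it; the same helper works for experiments/xtransformer_rl.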

Train X-LAN transformer model

bash experiments/xtransformer/train.sh

Train X-LAN transformer model using self-critical training

Copy the pretrained model into experiments/xtransformer_rl/snapshot and run the script

bash experiments/xtransformer_rl/train.sh

Evaluation

CUDA_VISIBLE_DEVICES=0 python3 main_test.py --folder experiments/model_folder --resume model_epoch

Acknowledgements

Thanks to the contributors of self-critical.pytorch and the awesome PyTorch team.


Owner

JDAI-CV
