Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]

Last update: Dec 17, 2022

Overview

Introduction

This repository contains the implementation of X-Linear Attention Networks for Image Captioning (CVPR 2020). The original paper can be found here.

Please cite with the following BibTeX:

@inproceedings{xlinear2020cvpr,
  title={X-Linear Attention Networks for Image Captioning},
  author={Pan, Yingwei and Yao, Ting and Li, Yehao and Mei, Tao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Requirements

Python 3
CUDA 10
numpy
tqdm
easydict
PyTorch (>1.0)
torchvision
coco-caption

Data preparation

Download the bottom-up features and convert them to npz files:

python2 tools/create_feats.py --infeats bottom_up_tsv --outfolder ./mscoco/feature/up_down_10_100
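
After the conversion finishes, a quick look at the output folder can catch an incomplete run early. The sketch below relies only on the fact that an .npz file is a zip archive with one .npy member per saved array. The helper name summarize_npz_folder is hypothetical and not part of this repository, and the array names it reports depend on what tools/create_feats.py actually saves.

```python
import os
import zipfile


def summarize_npz_folder(folder):
    """Return (number of .npz files, array names stored in the first one)."""
    files = sorted(f for f in os.listdir(folder) if f.endswith(".npz"))
    if not files:
        return 0, []
    # An .npz file is a zip archive holding one .npy member per saved array,
    # so the member names reveal the array keys without loading any data.
    with zipfile.ZipFile(os.path.join(folder, files[0])) as archive:
        names = [n[:-len(".npy")] for n in archive.namelist() if n.endswith(".npy")]
    return len(files), names


if __name__ == "__main__":
    folder = "./mscoco/feature/up_down_10_100"  # output folder of the command above
    if os.path.isdir(folder):
        print(summarize_npz_folder(folder))
```

The file count should match the number of images in the downloaded feature set; a lower count suggests the conversion was interrupted.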

Download the annotations into the mscoco folder. For more details about data preparation, refer to self-critical.pytorch.
Download coco-caption and set __C.INFERENCE.COCO_PATH in lib/config.py to its path.
The pretrained models and results can be downloaded here.
The pretrained SENet-154 model can be downloaded here.
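
The path assigned to __C.INFERENCE.COCO_PATH can be checked before the first evaluation run. This is a minimal sketch, assuming the coco-caption checkout contains the pycocoevalcap package directory at its top level, as the upstream coco-caption repository does. The helper looks_like_coco_caption and the example path are hypothetical and not part of this repository.

```python
import os


def looks_like_coco_caption(path):
    """Return True if `path` contains a pycocoevalcap directory."""
    # Assumption: a coco-caption checkout ships the pycocoevalcap package
    # at its top level, as the upstream repository does.
    return os.path.isdir(os.path.join(path, "pycocoevalcap"))


if __name__ == "__main__":
    # Hypothetical location; use the value set in lib/config.py instead.
    print(looks_like_coco_caption("./coco-caption"))
```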

Training

Train X-LAN model

bash experiments/xlan/train.sh

Train X-LAN model using self-critical training

Copy the pretrained model into experiments/xlan_rl/snapshot and run the script:

bash experiments/xlan_rl/train.sh
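
The copy step can also be scripted. The sketch below is one possible way to stage a checkpoint, assuming only the snapshot folder layout described above. The helper stage_pretrained is hypothetical, and the checkpoint file name is whatever the first training stage or the pretrained download provides.

```python
import os
import shutil


def stage_pretrained(checkpoint, experiment_dir):
    """Copy a checkpoint into <experiment_dir>/snapshot and return the new path."""
    snapshot_dir = os.path.join(experiment_dir, "snapshot")
    # Create the snapshot folder if the experiment has not been run yet.
    os.makedirs(snapshot_dir, exist_ok=True)
    # shutil.copy keeps the original file name when the target is a directory.
    return shutil.copy(checkpoint, snapshot_dir)
```

Calling stage_pretrained with experiments/xlan_rl as the experiment directory prepares the self-critical run; the same call with experiments/xtransformer_rl covers the transformer variant.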

Train X-LAN transformer model

bash experiments/xtransformer/train.sh

Train X-LAN transformer model using self-critical training

Copy the pretrained model into experiments/xtransformer_rl/snapshot and run the script:

bash experiments/xtransformer_rl/train.sh

Evaluation

CUDA_VISIBLE_DEVICES=0 python3 main_test.py --folder experiments/model_folder --resume model_epoch
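
To evaluate several checkpoints in turn, the command above can be wrapped in a small loop. This is a sketch under two assumptions: that it runs from the repository root, and that the value passed to --resume identifies a saved epoch in the experiment's snapshot folder. The function names build_eval_command and evaluate_epochs are hypothetical; only the --folder and --resume flags come from the command above.

```python
import os
import subprocess


def build_eval_command(folder, epoch):
    """Build the argument list for one evaluation run of main_test.py."""
    return ["python3", "main_test.py", "--folder", folder, "--resume", str(epoch)]


def evaluate_epochs(folder, epochs, gpu="0"):
    """Run the evaluation once per epoch on the given GPU."""
    # Copy the current environment and pin the visible GPU for the child process.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for epoch in epochs:
        # check=True stops the loop at the first failing run.
        subprocess.run(build_eval_command(folder, epoch), env=env, check=True)
```

Passing the experiment folder and a list of epoch identifiers to evaluate_epochs runs one evaluation per entry and raises an error as soon as a run fails.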

Acknowledgements

Thanks to the contributions of self-critical.pytorch and the awesome PyTorch team.

Owner

JDAI-CV
