Semi-Autoregressive Transformer for Image Captioning

Last update: Dec 09, 2022

Overview

Semi-Autoregressive Transformer for Image Captioning

Requirements

Python 3.6
PyTorch 1.6

Prepare data

Clone this repository with git clone --recurse-submodules, then follow the initialization steps in coco-caption/README.md.
Download the preprocessed dataset from this link and extract it to data/.
Follow this instruction to prepare the adaptive bottom-up features and place them under data/mscoco/.
For online test evaluation, follow this instruction to prepare the test features and place them under data/cocotest/.
Download the released checkpoints (a partial set) from here and extract them to save/.
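Before running evaluation or training, it can help to confirm that the directories named above exist. The following is a minimal sketch, not part of the repository: the helper name missing_dirs is hypothetical, and it only checks the four directory names mentioned in the steps above, not the files inside them.

```python
from pathlib import Path

# Directories named in the data preparation steps. Only the directory
# names come from the README; their contents are not checked here.
EXPECTED_DIRS = ("data", "data/mscoco", "data/cocotest", "save")


def missing_dirs(repo_root="."):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

Run it from the repository root; data/cocotest/ is only needed for online test evaluation.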

Offline Evaluation

To reproduce a reported result, for example SATIC (K=2, bw=1) after self-critical training, run

python3 eval.py --model save/nsc-sat-2-from-nsc-seqkd/model-best.pth --infos_path save/nsc-sat-2-from-nsc-seqkd/infos_nsc-sat-2-from-nsc-seqkd-best.pkl --batch_size 1 --beam_size 1 --id nsc-sat-2-from-nsc-seqkd
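To give a feel for what K controls, here is a small illustration of semi-autoregressive decoding as we understand it: the caption is produced group by group, with the K words inside a group emitted in one decoder pass. This is a sketch only, not the repository's decoding code; the function names are hypothetical and end-of-sequence handling is ignored.

```python
import math


def decoding_steps(caption_length, K):
    """Number of decoder passes needed when K words are emitted per pass."""
    return math.ceil(caption_length / K)


def split_into_groups(words, K):
    """Chunk a caption into consecutive groups of at most K words."""
    return [words[i:i + K] for i in range(0, len(words), K)]


caption = "a man riding a wave on a surfboard".split()
print(split_into_groups(caption, 2))
print(decoding_steps(len(caption), 1), decoding_steps(len(caption), 2))
```

With K=1 the scheme reduces to ordinary word-by-word decoding; larger K trades some dependence between neighbouring words for fewer decoder passes.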

Online Evaluation

Please first run

python3 eval_cocotest.py --input_json data/cocotest.json --input_fc_dir data/cocotest/cocotest_bu_fc --input_att_dir data/cocotest/cocotest_bu_att --input_label_h5 data/cocotalk_label.h5 --num_images -1 --language_eval 0 \
--model save/nsc-sat-4-from-nsc-seqkd/model-best.pth --infos_path save/nsc-sat-4-from-nsc-seqkd/infos_nsc-sat-4-from-nsc-seqkd-best.pkl --batch_size 32 --beam_size 3 --id captions_test2014_alg_results

and then follow the instruction to upload results.
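Before uploading, a quick sanity check of the results file can save a failed submission. The sketch below assumes the usual COCO caption results layout, a JSON list of objects with an integer image_id and a string caption. The helper names are hypothetical, and where eval_cocotest.py writes its output is not stated here, so the path is left to the caller.

```python
import json


def check_results(entries):
    """Return a list of problems found in COCO-style caption results."""
    if not isinstance(entries, list):
        return ["top-level JSON value must be a list"]
    problems = []
    seen = set()
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        image_id = entry.get("image_id")
        caption = entry.get("caption")
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(image_id, int) or isinstance(image_id, bool):
            problems.append(f"entry {i}: image_id must be an integer")
        elif image_id in seen:
            problems.append(f"entry {i}: duplicate image_id {image_id}")
        else:
            seen.add(image_id)
        if not isinstance(caption, str) or not caption.strip():
            problems.append(f"entry {i}: caption must be a non-empty string")
    return problems


def check_results_file(path):
    """Load a results JSON file (path supplied by the caller) and check it."""
    with open(path, encoding="utf-8") as f:
        return check_results(json.load(f))
```

An empty list from check_results_file means the file passed these basic checks; it does not guarantee that the evaluation server will accept it.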

Training

For the first training stage, for example a SATIC (K=2) model with sequence-level distillation and weight initialization, run

python3 train.py --noamopt --noamopt_warmup 20000 --label_smoothing 0.0 --seq_per_img 5 --batch_size 10 --beam_size 1 --learning_rate 5e-4 --num_layers 6 --input_encoding_size 512 --rnn_size 2048 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --save_checkpoint_every 3000 --language_eval 1 --val_images_use 5000 --max_epochs 15 --input_label_h5 data/cocotalk_seq-kd-from-nsc-transformer-baseline-b5_label.h5 --checkpoint_path save/sat-2-from-nsc-seqkd --id sat-2-from-nsc-seqkd --K 2
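The --noamopt and --noamopt_warmup flags suggest the warmup schedule from the original Transformer paper. As a rough sketch of that schedule (not the repository's optimizer code), the function below assumes a model size of 512, matching --input_encoding_size, and a scaling factor of 1.0. The factor, and how --learning_rate interacts with --noamopt in this code base, are assumptions.

```python
def noam_lr(step, d_model=512, warmup=20000, factor=1.0):
    """Learning rate at a given optimizer step under the Noam schedule.

    The rate grows linearly for `warmup` steps and then decays
    proportionally to the inverse square root of the step number.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


for step in (1000, 20000, 80000):
    print(step, noam_lr(step))
```

Under these assumptions the rate peaks at step 20000, where it equals (512 * 20000) ** -0.5.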

For the second training stage, first copy the pretrained model from the first stage

cd save
./copy_model.sh sat-2-from-nsc-seqkd nsc-sat-2-from-nsc-seqkd
cd ..

and then run

python3 train.py --seq_per_img 5 --batch_size 10 --beam_size 1 --learning_rate 1e-5 --num_layers 6 --input_encoding_size 512 --rnn_size 2048 --save_checkpoint_every 3000 --language_eval 1 --val_images_use 5000 --self_critical_after 10 --max_epochs 40 --input_label_h5 data/cocotalk_label.h5 --start_from save/nsc-sat-2-from-nsc-seqkd --checkpoint_path save/nsc-sat-2-from-nsc-seqkd --id nsc-sat-2-from-nsc-seqkd --K 2
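The --self_critical_after flag switches training to a self-critical (REINFORCE with baseline) objective. The sketch below shows the general idea for a single sampled caption in plain Python. It is not the repository's loss: the function name is hypothetical, and the choice of reward metric (for example CIDEr), the baseline, and whether the loss is summed or averaged over words are all assumptions.

```python
def self_critical_loss(sample_logprobs, sample_reward, baseline_reward):
    """REINFORCE loss for one sampled caption with a baseline.

    sample_logprobs: per-word log-probabilities of the sampled caption.
    sample_reward, baseline_reward: scalar caption scores.
    """
    # A caption that scores above the baseline gets a positive advantage,
    # so minimizing the loss raises its log-probability.
    advantage = sample_reward - baseline_reward
    return -advantage * sum(sample_logprobs)


print(self_critical_loss([-0.5, -1.0], sample_reward=1.2, baseline_reward=1.0))
```

Captions that score below the baseline get a negative advantage, which pushes their probability down instead.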

Citation

@article{zhou2021semi,
  title={Semi-Autoregressive Transformer for Image Captioning},
  author={Zhou, Yuanen and Zhang, Yong and Hu, Zhenzhen and Wang, Meng},
  journal={arXiv preprint arXiv:2106.09436},
  year={2021}
}

Acknowledgements

This repository is built upon self-critical.pytorch. Thanks to its authors for releasing the code.

Owner

YE Zhou
