LaBERT - A length-controllable and non-autoregressive image captioning model.

Last update: Nov 13, 2022

Overview

Length-Controllable Image Captioning (ECCV2020)

This repo provides the implementation of the paper Length-Controllable Image Captioning.

Install

conda create --name labert python=3.7
conda activate labert

conda install pytorch=1.3.1 torchvision cudatoolkit=10.1 -c pytorch
pip install h5py tqdm transformers==2.1.1
pip install git+https://github.com/salaniz/pycocoevalcap
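
After installing, a quick import check can catch a broken environment before a training launch fails. The snippet below is a minimal sketch and not part of this repo: `check_packages` is a hypothetical helper, and the import names (`torch`, `torchvision`, `h5py`, `tqdm`, `transformers`, `pycocoevalcap`) are assumed from the install commands above.

```python
import importlib
import importlib.util


def check_packages(names):
    """Map each top-level import name to a version string, or None if missing."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not importable in this environment
            continue
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # installed but broken, e.g. a missing shared library
            report[name] = f"import failed: {exc}"
            continue
        # Not every package exposes __version__, so fall back to "unknown".
        report[name] = str(getattr(module, "__version__", "unknown"))
    return report


if __name__ == "__main__":
    # Import names assumed from the install commands above.
    wanted = ["torch", "torchvision", "h5py", "tqdm", "transformers", "pycocoevalcap"]
    for name, version in check_packages(wanted).items():
        print(f"{name}: {version if version is not None else 'MISSING'}")
```

Any package reported as `MISSING` or `import failed` should be reinstalled inside the `labert` environment before running the scripts.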

Data & Pre-trained Models

Prepare the MSCOCO data by following this link.
Download the pretrained BERT and Faster-RCNN weights from Baidu Cloud Disk [code: 0j9f] or Google Drive.
- It is a unified checkpoint file containing a pretrained BERT-base and the fc6 layer of the Faster-RCNN.
Download our pretrained LaBERT model from Baidu Cloud Disk [code: fpke] or Google Drive.

Scripts

Train

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU

Continue training

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU \
  model_path $PATH_TO_MODEL
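
The two launch commands above differ only in the trailing `model_path` pair, so both can be assembled by one helper when launching from Python. This is a sketch under assumptions: `build_train_command` is a hypothetical name, not a function in this repo, and the example values are placeholders. It only mirrors the argument lists shown here and does not validate the option names against `train.py`.

```python
import shlex


def build_train_command(num_gpus, save_dir, samples_per_gpu,
                        model_path=None, master_port=4396):
    """Build the argv list for the distributed training command shown above."""
    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        f"--master_port={master_port}",
        "train.py",
        "save_dir", str(save_dir),
        "samples_per_gpu", str(samples_per_gpu),
    ]
    if model_path is not None:
        # Continue training from an existing checkpoint.
        cmd += ["model_path", str(model_path)]
    return cmd


if __name__ == "__main__":
    # Example values are placeholders, not recommended settings.
    cmd = build_train_command(num_gpus=4, save_dir="output/train", samples_per_gpu=16)
    print(" ".join(shlex.quote(part) for part in cmd))
    # To actually launch: subprocess.run(cmd, check=True)
```

Returning a list instead of a single string lets the result be passed directly to `subprocess.run` without shell quoting issues.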

Inference

python inference.py \
  model_path $PATH_TO_MODEL \
  save_dir $PATH_TO_TEST_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU

Evaluate

python evaluate.py \
  --gt_caption data/id2captions_test.json \
  --pd_caption $PATH_TO_TEST_OUTPUT/caption_results.json \
  --save_dir $PATH_TO_TEST_OUTPUT

Cite

Please consider citing our paper in your publications if the project helps your research.

@article{deng2020length,
  title={Length-Controllable Image Captioning},
  author={Deng, Chaorui and Ding, Ning and Tan, Mingkui and Wu, Qi},
  journal={arXiv preprint arXiv:2007.09580},
  year={2020}
}

Owner

bearcatt
