LaBERT - A length-controllable and non-autoregressive image captioning model.

Last update: Nov 13, 2022

Overview

Length-Controllable Image Captioning (ECCV2020)

This repo provides the implementation of the paper Length-Controllable Image Captioning.

Install

conda create --name labert python=3.7
conda activate labert

conda install pytorch=1.3.1 torchvision cudatoolkit=10.1 -c pytorch
pip install h5py tqdm transformers==2.1.1
pip install git+https://github.com/salaniz/pycocoevalcap
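After installing, it can help to confirm that the packages listed above are visible to the active environment. The following is a minimal sketch, not part of the repo: the helper name `check_packages` is hypothetical, and it only checks that each module can be located, not that the pinned versions match.

```python
from importlib.util import find_spec


def check_packages(names):
    """Return {module_name: bool}, True when the top-level module can be located."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names of the packages installed in the steps above.
    required = ["torch", "torchvision", "h5py", "tqdm", "transformers", "pycocoevalcap"]
    for name, found in check_packages(required).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it inside the `labert` environment; any line reported as `MISSING` points to an install step that did not complete.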

Data & Pre-trained Models

Prepare the MSCOCO data by following link.
Download the pretrained BERT and Faster-RCNN weights from Baidu Cloud Disk [code: 0j9f] or Google Drive.
- This is a unified checkpoint file containing a pretrained BERT-base and the fc6 layer of the Faster-RCNN.
Download our pretrained LaBERT model from Baidu Cloud Disk [code: fpke] or Google Drive.

Scripts

Train

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU

Continue training

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU \
  model_path $PATH_TO_MODEL

Inference

python inference.py \
  model_path $PATH_TO_MODEL \
  save_dir $PATH_TO_TEST_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU

Evaluate

python evaluate.py \
  --gt_caption data/id2captions_test.json \
  --pd_caption $PATH_TO_TEST_OUTPUT/caption_results.json \
  --save_dir $PATH_TO_TEST_OUTPUT
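
Because the model is length-controllable, it can also be useful to look at how long the generated captions actually are. The sketch below is an illustration only and is not part of the repo. It assumes that `caption_results.json` is a flat JSON object mapping image ids to caption strings, which is not documented here, and it counts whitespace-separated words, which may differ from the tokenization the model uses. The function names `load_captions` and `caption_length_histogram` are hypothetical.

```python
import json
from collections import Counter


def load_captions(path):
    """Load a JSON file assumed to map image id -> caption string."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def caption_length_histogram(captions):
    """Count captions by number of whitespace-separated words."""
    return Counter(len(text.split()) for text in captions.values())


if __name__ == "__main__":
    # Toy data standing in for the real results file (assumed layout).
    sample = {"1": "a dog runs", "2": "a cat", "3": "two birds fly"}
    for length, count in sorted(caption_length_histogram(sample).items()):
        print(f"{length} words: {count} caption(s)")
```

If the real file uses a different layout (for example, a list of records), `load_captions` would need to be adapted before the histogram is computed.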

Cite

Please consider citing our paper in your publications if the project helps your research.

@article{deng2020length,
  title={Length-Controllable Image Captioning},
  author={Deng, Chaorui and Ding, Ning and Tan, Mingkui and Wu, Qi},
  journal={arXiv preprint arXiv:2007.09580},
  year={2020}
}
