Source code and pre-trained/fine-tuned checkpoints for the NAACL 2021 paper LightningDOT

Overview

LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval

This repository contains source code and pre-trained/fine-tuned checkpoints for the NAACL 2021 paper "LightningDOT". It currently supports fine-tuning on MSCOCO and Flickr30k. Pre-training code and a demo for full MSCOCO retrieval are also available.

Overview of LightningDot

Some code in this repo is copied or modified from UNITER and DPR.

If you find the code useful for your research, please consider citing:

    @inproceedings{sun2021lightningdot,
    title={LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval},
    author={Sun, Siqi and Chen, Yen-Chun and Li, Linjie and Wang, Shuohang and Fang, Yuwei and Liu, Jingjing},
    booktitle={NAACL-HLT},
    year={2021}
    } 

UNITER Environment

To run UNITER as the re-ranker, please set up a separate environment based on this repo.

Environment

All pre-training and fine-tuning use a conda environment that can be created as follows.

Under the project home folder, first run the following (adjust cudatoolkit to match your CUDA version):

conda env create -f DVL.yml
conda activate DVL
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

Then install apex:

cd ../
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

To use distributed training, install Open MPI as the superuser:

rm -r /usr/local/mpi

wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.4.tar.gz 
tar -xvf openmpi-4.0.4.tar.gz 
cd openmpi-4.0.4
./configure --prefix=/usr/local/mpi --enable-orterun-prefix-by-default --disable-getpwuid --with-verbs
sudo apt-get install libnuma-dev
sudo make -j$(nproc) all && sudo make install
ldconfig

cd -
rm -r openmpi-4.0.4
rm openmpi-4.0.4.tar.gz

export OPENMPI_VERSION=4.0.4

Finally, install horovod:

echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" \
    > /etc/apt/sources.list.d/nvidia-ml.list
apt update
apt install libnccl2=2.4.7-1+cuda10.1 libnccl-dev=2.4.7-1+cuda10.1

export PATH=/usr/local/mpi/bin:$PATH
HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_WITH_PYTORCH=1 pip install --no-cache-dir horovod
ldconfig

If you see the error message "/usr/bin/ld: cannot find -lnuma", try:

sudo apt-get install libnuma-dev
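
After the installation steps above, a quick sanity check can confirm that the packages are visible to the active Python environment. The following is a minimal sketch, not part of this repo: the helper name `missing_packages` is hypothetical, and the package list simply mirrors the packages installed above. It only checks that each package can be located; it does not verify that the CUDA or NCCL extensions were built correctly.

```python
import importlib.util

# Top-level package names installed by the steps above (assumed import names).
REQUIRED = ("torch", "torchvision", "apex", "horovod")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages can be located.")
```

Run it inside the activated DVL environment; an empty result means every listed package was found.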

Download Checkpoints and Meta file

Under the project home folder, run

bash bash/download_data.sh

Currently the raw image files and extracted features are not available to download.

Pre-training

Modify the config file at ./config/pretrain-alldata-base.json as needed, then run

horovodrun -np $NUM_GPU python pretrain.py --config ./config/pretrain-alldata-base.json

Typically you need to change img_checkpoint, output_dir, and the train/val datasets.
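
Rather than editing the base config in place, one option is to write a modified copy and pass that to pretrain.py. The sketch below is an illustration only: the helper `write_config` is hypothetical, and it assumes that img_checkpoint and output_dir are top-level keys of the JSON file. The key names for the train/val datasets are not given here, so check the config file before overriding them.

```python
import json
from pathlib import Path


def write_config(base_path, out_path, **overrides):
    """Copy a JSON config, replacing the given top-level keys.

    Raises KeyError if an override names a key that the base config lacks,
    which helps catch typos in key names.
    """
    config = json.loads(Path(base_path).read_text())
    unknown = [key for key in overrides if key not in config]
    if unknown:
        raise KeyError(f"keys not present in base config: {unknown}")
    config.update(overrides)
    Path(out_path).write_text(json.dumps(config, indent=4))
    return config


# Example call (paths on the right-hand side are placeholders):
# write_config(
#     "./config/pretrain-alldata-base.json",
#     "./config/my-pretrain.json",
#     img_checkpoint="/path/to/img_checkpoint",
#     output_dir="/path/to/output",
# )
```

The resulting file can then be passed via --config in the horovodrun command.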

A pre-trained checkpoint is available at LightningDot.

The checkpoints for UNITER-base and BERT-base can be obtained from UNITER-base and BERT-base.

Fine-tuning on MSCOCO and Flickr30k

We provide a sample bash script at ./bash/train_flickr.sh, which we used to search for the learning rate.

Two checkpoints already fine-tuned on MSCOCO and Flickr30k are also provided at COCO-FT and Flickr-FT.

Evaluation

Run

python eval_itm.py  your_eval_config.json  your_checkpoint.pt 

to run the evaluation. We provide three examples that can be reproduced using only the checkpoints and configurations provided in this repo.

Note that your results may not exactly match the results below because of differences in machine and environment configuration, but they should be close.

  • Zero-shot evaluation on Flickr30k:
python eval_itm.py ./config/flickr30k_eval_config.json ./data/model/LightningDot.pt
image retrieval recall = {1: 0.5332, 5: 0.8058, 10: 0.8804}
txt retrieval recall = {1: 0.682, 5: 0.891, 10: 0.94}
  • Fine-tuned on Flickr30k, evaluated on Flickr30k:
python eval_itm.py ./config/flickr30k_eval_config.json ./data/model/flickr-ft.pt
image retrieval recall = {1: 0.699, 5: 0.911, 10: 0.9518}
txt retrieval recall = {1: 0.839, 5: 0.972, 10: 0.986}
  • Fine-tuned on MSCOCO, evaluated on MSCOCO:
python eval_itm.py ./config/coco_eval_config.json ./data/model/coco-ft.pt
image retrieval recall = {1: 0.4577, 5: 0.7453, 10: 0.8379}
txt retrieval recall = {1: 0.6004, 5: 0.8516, 10: 0.9172}
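
The results above are dictionaries mapping K to recall at K. For readers unfamiliar with the metric, the following minimal sketch computes it under the common definition, where a query counts as a hit if at least one correct candidate appears among its K highest-scoring candidates. This is an illustration and may not match what eval_itm.py does internally; the function name `recall_at_k` and its input layout are assumptions.

```python
def recall_at_k(scores, gold, ks=(1, 5, 10)):
    """Compute recall@K for a batch of retrieval queries.

    scores[q][c] is the similarity between query q and candidate c.
    gold[q] is the set of candidate indices that are correct for query q.
    Returns {k: fraction of queries with a correct candidate in the top k}.
    """
    hits = {k: 0 for k in ks}
    for q, row in enumerate(scores):
        # Candidate indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        for k in ks:
            if any(c in gold[q] for c in ranked[:k]):
                hits[k] += 1
    return {k: hits[k] / len(scores) for k in ks}
```

For image retrieval the queries are captions and the candidates are images; for text retrieval the roles are swapped, and an image typically has several correct captions.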

Meta File

Some scripts require the meta files, which can be obtained from MSCOCO-Meta and Flickr-Meta.

Demo

TODO

Re-Ranking

Note that the re-ranker uses prediction files generated by UNITER or OSCAR, because those models run under a different PyTorch version.

The re-ranking script is currently provided as is and has not been cleaned up yet.
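
As a rough illustration of what re-ranking means here, the sketch below reorders the top candidates of a fast first-stage retriever using scores from a slower model. It is a generic sketch, not the script in this repo: the function `rerank`, the dictionary inputs, and the `top_k` parameter are all hypothetical, and the actual prediction file format produced by UNITER or OSCAR is not described here.

```python
def rerank(first_stage, reranker_scores, top_k=2):
    """Reorder the top_k first-stage candidates by re-ranker score.

    first_stage: {candidate_id: score} from the fast retriever.
    reranker_scores: {candidate_id: score} from the slower model
        (hypothetical format; candidates without a score sort last).
    Returns all candidate ids: the re-ranked head followed by the
    remaining candidates in their first-stage order.
    """
    ranked = sorted(first_stage, key=first_stage.get, reverse=True)
    head, tail = ranked[:top_k], ranked[top_k:]
    head = sorted(
        head,
        key=lambda c: reranker_scores.get(c, float("-inf")),
        reverse=True,
    )
    return head + tail
```

Only the top_k candidates need scores from the slower model, which is what keeps this two-stage setup cheaper than scoring every candidate with it.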

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

License

MIT
