CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training

Related tags

Deep LearningUC2
Overview

UC2

UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu, Jingjing Liu
This is the official repository of UC2, a multili-lingual multi-modal pre-training framefork. In this repository we support end-to-end pretraining and finetuning for image-text retrieval on COCO.

Requirements

We Provide a Docker image to run our code. Please install the following:

To run the docker command without sudo, user need to have docker group membership. Our code only supports Linux with NVIDIA GPUs. We test our code on Ubuntu 18.04 and V100 cards.

Data and Pretrained Checkpoints

Download the pre-processed text features and pretrained checkpoints with the following command:

wget https://mmaisharables.blob.core.windows.net/uc2/UC2_DATA.tar.gz

The image features for mscoco can be obtained from UNITER via this code script. As CC's image features are large and inconvient for direct downloading, please contact UNITER's author to obtain the image features if you are interested in pretraining.

Launch the Docker Container for Experiments

Once the user set up the data and checkpoints properly, please run the following command to launch a docker container and start the pretraining process.

source launch_container_pretrain.sh /PATH_TO_STORAGE/txt_db /PATH_TO_STORAGE/img_db /PATH_TO_STORAGE/finetune /PATH_TO_STORAG/pretrain

Pretraining

(Inside the Docker Container)If the user wants to run pretraining, please use the following command:

horovodrun -np $N_GPU python pretrain.py  --config config/uc2_pretrain.json

Downstream Task Finetuning

Text-to-Image Retrieval To run the finetuning experiment for the text-to-image retrieval task, please use the following command:

horovodrun -np $N_GPU python itm.py --config config/uc2_mscoco_itm.json

Citation

If you find this code useful for your research, please consider citing:

@InProceedings{zhou2021uc,
author = {Zhou, Mingyang and Zhou, Luowei and Wang, Shuohang and Cheng, Yu and Li, Linjie and Yu, Zhou and Liu, Jingjing},
title = {UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2021)},
year = {2021},
month = {June},
abstract = {Vision-and-language pre-training has achieved impressive success in learning multimodal representations between vision and language. To generalize this success to non-English languages, we introduce UC2 , the first machine translation-augmented framework for cross-lingual cross-modal representation learning. To tackle the scarcity problem of multilingual captions for image datasets, we first augment existing English-only datasets with other languages via machine translation (MT). Then we extend the standard Masked Language Modeling and Image-Text Matching training objectives to multilingual setting, where alignment between different languages is captured through shared visual context (i.e., using image as pivot). To facilitate the learning of a joint embedding space of images and all languages of interest, we further propose two novel pre-training tasks, namely Masked Region-to-Token Modeling (MRTM) and Visual Translation Language Modeling (VTLM), leveraging MT-enhanced translated data. Evaluation on multilingual image-text retrieval and multilingual visual question answering benchmarks demonstrates that our proposed framework achieves new state of the art on diverse non-English benchmarks while maintaining comparable performance to monolingual pre-trained models on English tasks.},
url = {https://www.microsoft.com/en-us/research/publication/uc2-universal-cross-lingual-cross-modal-vision-and-language-pre-training/},
}

Acknowledge

Our code is mainly based on Linjie Li and Yen-Chun Chen's project UNITER. We thank the author for opening source their code and providing helful discussion for code implementation. Portions of the code also uses resources from transformers.

Liscense

MIT

Owner
Mingyang Zhou
Ph.D Student at UC Davis with research interest in Multimodality Learning with computer vision and NLP.
Mingyang Zhou
The Unsupervised Reinforcement Learning Benchmark (URLB)

The Unsupervised Reinforcement Learning Benchmark (URLB) URLB provides a set of leading algorithms for unsupervised reinforcement learning where agent

259 Dec 26, 2022
Cross-Document Coreference Resolution

Cross-Document Coreference Resolution This repository contains code and models for end-to-end cross-document coreference resolution, as decribed in ou

Arie Cattan 29 Nov 28, 2022
A PyTorch implementation of a Factorization Machine module in cython.

fmpytorch A library for factorization machines in pytorch. A factorization machine is like a linear model, except multiplicative interaction terms bet

Jack Hessel 167 Jul 06, 2022
PyTorch implementation for paper "Full-Body Visual Self-Modeling of Robot Morphologies".

Full-Body Visual Self-Modeling of Robot Morphologies Boyuan Chen, Robert Kwiatkowskig, Carl Vondrick, Hod Lipson Columbia University Project Website |

Boyuan Chen 32 Jan 02, 2023
A Python package for generating concise, high-quality summaries of a probability distribution

GoodPoints A Python package for generating concise, high-quality summaries of a probability distribution GoodPoints is a collection of tools for compr

Microsoft 28 Oct 10, 2022
codes for Image Inpainting with External-internal Learning and Monochromic Bottleneck

Image Inpainting with External-internal Learning and Monochromic Bottleneck This repository is for the CVPR 2021 paper: 'Image Inpainting with Externa

97 Nov 29, 2022
implementation of the paper "MarginGAN: Adversarial Training in Semi-Supervised Learning"

MarginGAN This repository is the implementation of the paper "MarginGAN: Adversarial Training in Semi-Supervised Learning". 1."preliminary" is the imp

Van 7 Dec 23, 2022
Implementation for "Domain-Specific Bias Filtering for Single Labeled Domain Generalization"

DSBF Introduction This repository contains the implementation code for paper: Domain-Specific Bias Filtering for Single Labeled Domain Generalization

ScottYuan 7 Jan 05, 2023
FFTNet vocoder implementation

Unofficial Implementation of FFTNet vocode paper. implement the model. implement tests. overfit on a single batch (sanity check). linearize weights fo

Eren Gölge 81 Dec 08, 2022
[ACL-IJCNLP 2021] "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets"

EarlyBERT This is the official implementation for the paper in ACL-IJCNLP 2021 "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets" by

VITA 13 May 11, 2022
Generic template to bootstrap your PyTorch project with PyTorch Lightning, Hydra, W&B, and DVC.

NN Template Generic template to bootstrap your PyTorch project. Click on Use this Template and avoid writing boilerplate code for: PyTorch Lightning,

Luca Moschella 520 Dec 30, 2022
FastyAPI is a Stack boilerplate optimised for heavy loads.

FastyAPI A FastAPI based Stack boilerplate for heavy loads. Explore the docs » View Demo · Report Bug · Request Feature Table of Contents About The Pr

Ali Chaayb 47 Dec 27, 2022
QMagFace: Simple and Accurate Quality-Aware Face Recognition

Quality-Aware Face Recognition 26.11.2021 start readme QMagFace: Simple and Accurate Quality-Aware Face Recognition Research Paper Implementation - To

Philipp Terhörst 59 Jan 04, 2023
Label-Free Model Evaluation with Semi-Structured Dataset Representations

Label-Free Model Evaluation with Semi-Structured Dataset Representations Prerequisites This code uses the following libraries Python 3.7 NumPy PyTorch

8 Oct 06, 2022
Deploy pytorch classification model using Flask and Streamlit

Deploy pytorch classification model using Flask and Streamlit

Ben Seo 1 Nov 17, 2021
Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

Deep Text Search - AI Based Text Search & Recommendation System Deep Text Search is an AI-powered multilingual text search and recommendation engine w

19 Sep 29, 2022
This is the official github repository of the Met dataset

The Met dataset This is the official github repository of the Met dataset. The official webpage of the dataset can be found here. What is it? This cod

Nikolaos-Antonios Ypsilantis 35 Dec 17, 2022
Fake-user-agent-traffic-geneator - Python CLI Tool to generate fake traffic against URLs with configurable user-agents

Fake traffic generator for Gartner Demo Generate fake traffic to URLs with custo

New Relic Experimental 3 Oct 31, 2022
[Arxiv preprint] Causality-inspired Single-source Domain Generalization for Medical Image Segmentation (code&data-processing pipeline)

Causality-inspired Single-source Domain Generalization for Medical Image Segmentation Arxiv preprint Repository under construction. Might still be bug

Cheng 31 Dec 27, 2022
The description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts.

FMFCC-A This project is the description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts. The FMFCC-A dataset is shared through BaiduCl

18 Dec 24, 2022