Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part

Overview

VILLA: Vision-and-Language Adversarial Training

This is the official repository of VILLA (NeurIPS 2020 Spotlight). This repository currently supports adversarial finetuning of UNITER on VQA, VCR, NLVR2, and SNLI-VE. Adversarial pre-training with in-domain data will be available soon. Both VILLA-base and VILLA-large pre-trained checkpoints are released.

Overview of VILLA

Most of the code in this repo are copied/modified from UNITER.

Requirements

We provide Docker image for easier reproduction. Please install the following:

Our scripts require the user to have the docker group membership so that docker commands can be run without sudo. We only support Linux with NVIDIA GPUs. We test on Ubuntu 18.04 and V100 cards. We use mixed-precision training hence GPUs with Tensor Cores are recommended.

Quick Start

NOTE: Please run bash scripts/download_pretrained.sh $PATH_TO_STORAGE to get our latest pretrained VILLA checkpoints. This will download both the base and large models.

We use VQA as an end-to-end example for using this code base.

  1. Download processed data and pretrained models with the following command.

    bash scripts/download_vqa.sh $PATH_TO_STORAGE

    After downloading you should see the following folder structure:

    ├── finetune 
    ├── img_db
    │   ├── coco_test2015
    │   ├── coco_test2015.tar
    │   ├── coco_train2014
    │   ├── coco_train2014.tar
    │   ├── coco_val2014
    │   ├── coco_val2014.tar
    │   ├── vg
    │   └── vg.tar
    ├── pretrained
        ├── uniter-base.pt
    │   └── villa-base.pt
    └── txt_db
        ├── vqa_devval.db
        ├── vqa_devval.db.tar
        ├── vqa_test.db
        ├── vqa_test.db.tar
        ├── vqa_train.db
        ├── vqa_train.db.tar
        ├── vqa_trainval.db
        ├── vqa_trainval.db.tar
        ├── vqa_vg.db
        └── vqa_vg.db.tar
    
    

    You can put different pre-trained checkpoints inside the /pretrained folder based on your need.

  2. Launch the Docker container for running the experiments.

    # docker image should be automatically pulled
    source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
        $PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained

    The launch script respects $CUDA_VISIBLE_DEVICES environment variable. Note that the source code is mounted into the container under /src instead of built into the image so that user modification will be reflected without re-building the image. (Data folders are mounted into the container separately for flexibility on folder structures.)

  3. Run finetuning for the VQA task.

    # inside the container
    horovodrun -np $N_GPU python train_vqa_adv.py --config $YOUR_CONFIG_JSON
    
    # specific example
    horovodrun -np 4 python train_vqa_adv.py --config config/train-vqa-base-4gpu-adv.json
  4. Run inference for the VQA task and then evaluate.

    # inference
    python inf_vqa.py --txt_db /txt/vqa_test.db --img_db /img/coco_test2015 \
    --output_dir $VQA_EXP --checkpoint 6000 --pin_mem --fp16

    The result file will be written at $VQA_EXP/results_test/results_6000_all.json, which can be submitted to the evaluation server

  5. Customization

    # training options
    python train_vqa_adv.py --help
    • command-line argument overwrites JSON config files
    • JSON config overwrites argparse default value.
    • use horovodrun to run multi-GPU training
    • --gradient_accumulation_steps emulates multi-gpu training
    • --checkpoint selects UNITER or VILLA pre-trained checkpoints
    • --adv_training decides using adv. training or not
    • --adv_modality takes values from ['text'], ['image'], ['text','image'], and ['text','image','alter'], the last two correspond to adding perturbations on two modalities simultaneously or alternatively

Downstream Tasks Finetuning

VCR

NOTE: train and inference should be ran inside the docker container

  1. download data
    bash scripts/download_vcr.sh $PATH_TO_STORAGE
    
  2. train
    horovodrun -np 4 python train_vcr_adv.py --config config/train-vcr-base-4gpu-adv.json \
        --output_dir $VCR_EXP
    
  3. inference
    horovodrun -np 4 python inf_vcr.py --txt_db /txt/vcr_test.db \
        --img_db "/img/vcr_gt_test/;/img/vcr_test/" \
        --split test --output_dir $VCR_EXP --checkpoint 8000 \
        --pin_mem --fp16
    
    The result file will be written at $VCR_EXP/results_test/results_8000_all.csv, which can be submitted to VCR leaderboard for evaluation.

NLVR2

NOTE: train and inference should be ran inside the docker container

  1. download data
    bash scripts/download_nlvr2.sh $PATH_TO_STORAGE
    
  2. train
    horovodrun -np 4 python train_nlvr2_adv.py --config config/train-nlvr2-base-1gpu-adv.json \
        --output_dir $NLVR2_EXP
    
  3. inference
    python inf_nlvr2.py --txt_db /txt/nlvr2_test1.db/ --img_db /img/nlvr2_test/ \
    --train_dir /storage/nlvr-base/ --ckpt 6500 --output_dir . --fp16
    

Visual Entailment (SNLI-VE)

NOTE: train should be ran inside the docker container

  1. download data
    bash scripts/download_ve.sh $PATH_TO_STORAGE
    
  2. train
    horovodrun -np 2 python train_ve_adv.py --config config/train-ve-base-2gpu-adv.json \
        --output_dir $VE_EXP
    

Adversarial Training of LXMERT

To keep things simple, we provide another separate repo that can be used to reproduce our results on adversarial finetuning of LXMERT on VQA, GQA, and NLVR2.

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{gan2020large,
  title={Large-Scale Adversarial Training for Vision-and-Language Representation Learning},
  author={Gan, Zhe and Chen, Yen-Chun and Li, Linjie and Zhu, Chen and Cheng, Yu and Liu, Jingjing},
  booktitle={NeurIPS},
  year={2020}
}

@inproceedings{chen2020uniter,
  title={Uniter: Universal image-text representation learning},
  author={Chen, Yen-Chun and Li, Linjie and Yu, Licheng and Kholy, Ahmed El and Ahmed, Faisal and Gan, Zhe and Cheng, Yu and Liu, Jingjing},
  booktitle={ECCV},
  year={2020}
}

License

MIT

A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any other format

RITA DSL This is a language, loosely based on language Apache UIMA RUTA, focused on writing manual language rules, which compiles into either spaCy co

Šarūnas Navickas 60 Sep 26, 2022
:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

(Framework for Adapting Representation Models) What is it? FARM makes Transfer Learning with BERT & Co simple, fast and enterprise-ready. It's built u

deepset 1.6k Dec 27, 2022
Shared code for training sentence embeddings with Flax / JAX

flax-sentence-embeddings This repository will be used to share code for the Flax / JAX community event to train sentence embeddings on 1B+ training pa

Nils Reimers 23 Dec 30, 2022
Spam filtering made easy for you

spammy Author: Tasdik Rahman Latest version: 1.0.3 Contents 1 Overview 2 Features 3 Example 3.1 Accuracy of the classifier 4 Installation 4.1 Upgradin

Tasdik Rahman 137 Dec 18, 2022
⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡

Translations 🇩🇪 DE 🇫🇷 FR 🇭🇺 HU 🇮🇩 ID 🇮🇹 IT 🇳🇱 NL 🇧🇷 PT-BR 🇷🇺 RU 🇨🇳 ZH ➡️ Documentation | Discord | Installation Guide ⬅️ Fully autom

11.2k Jan 05, 2023
A calibre plugin that generates Word Wise and X-Ray files then sends them to Kindle. Supports KFX, AZW3 and MOBI eBooks. X-Ray supports 18 languages.

WordDumb A calibre plugin that generates Word Wise and X-Ray files then sends them to Kindle. Supports KFX, AZW3 and MOBI eBooks. Languages X-Ray supp

172 Dec 29, 2022
EMNLP 2021 paper "Pre-train or Annotate? Domain Adaptation with a Constrained Budget".

Pre-train or Annotate? Domain Adaptation with a Constrained Budget This repo contains code and data associated with EMNLP 2021 paper "Pre-train or Ann

Fan Bai 8 Dec 17, 2021
Generate text line images for training deep learning OCR model (e.g. CRNN)

Generate text line images for training deep learning OCR model (e.g. CRNN)

532 Jan 06, 2023
JaQuAD: Japanese Question Answering Dataset

JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension (2022, Skelter Labs)

SkelterLabs 84 Dec 27, 2022
HAN2HAN : Hangul Font Generation

HAN2HAN : Hangul Font Generation

Changwoo Lee 36 Dec 28, 2022
Open solution to the Toxic Comment Classification Challenge

Starter code: Kaggle Toxic Comment Classification Challenge More competitions 🎇 Check collection of public projects 🎁 , where you can find multiple

minerva.ml 153 Jun 22, 2022
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.

Reduce T5 model size by 3X and increase the inference speed up to 5X. Install Usage Details Functionalities Benchmarks Onnx model Quantized onnx model

Kiran R 399 Jan 05, 2023
Pipeline for fast building text classification TF-IDF + LogReg baselines.

Text Classification Baseline Pipeline for fast building text classification TF-IDF + LogReg baselines. Usage Instead of writing custom code for specif

Dani El-Ayyass 57 Dec 07, 2022
A Telegram bot to add notes to Flomo.

flomo bot 使用 Telegram 机器人发送笔记到你的 Flomo. 你需要有一台可访问 Telegram 的服务器。 Steps @BotFather 新建机器人,获取 token Flomo 官网获取 API,链接 https://flomoapp.com/mine?source=in

Zhen 44 Dec 30, 2022
Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Welcome to Healthsea ✨ Create better access to health with spaCy. Healthsea is a pipeline for analyzing user reviews to supplement products by extract

Explosion 75 Dec 19, 2022
Sploitus - Command line search tool for sploitus.com. Think searchsploit, but with more POCs

Sploitus Command line search tool for sploitus.com. Think searchsploit, but with

watchdog2000 5 Mar 07, 2022
Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downstream tasks like translation and summarisation.

PART 2: CHAIN LINKING AUDIO-TO-TEXT NLP TASKS 2A: TRANSCRIBE-TRANSLATE-SENTIMENT-ANALYSIS In notebook3.0, I demo a simple workflow to: transcribe a lo

Chua Chin Hon 30 Jul 13, 2022
Implemented shortest-circuit disambiguation, maximum probability disambiguation, HMM-based lexical annotation and BiLSTM+CRF-based named entity recognition

Implemented shortest-circuit disambiguation, maximum probability disambiguation, HMM-based lexical annotation and BiLSTM+CRF-based named entity recognition

0 Feb 13, 2022
LSTM based Sentiment Classification using Tensorflow - Amazon Reviews Rating

LSTM based Sentiment Classification using Tensorflow - Amazon Reviews Rating (Dataset) The dataset is from Amazon Review Data (2018)

Immanuvel Prathap S 1 Jan 16, 2022
Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications

Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications

BADER ALABDAN 2 Oct 22, 2022