Visual dialog agents with pre-trained vision-and-language encoders.

Overview

Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation

Or READ-UP: Referring Expression Agent Dialog with Unified Pretraining.

This repo includes the training/testing code for our paper Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation that has been accepted by CVPR 2021.

Please cite the following paper if you use the code in this repository:

@inproceedings{tu2021learning,
  title={Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation},
  author={Tu, Tao and Ping, Qing and Thattai, Govindarajan and Tur, Gokhan and Natarajan, Prem},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5622--5631},
  year={2021}
}

Repository Setup

Environment

The following environment is recommended:

Instance storage: > 800 GB
pytorch 1.4.0
cuda 10.0

Set up virtual environment and install pytorch:

$ conda create -n read_up python=3.6
$ conda activate read_up
$ git clone https://github.com/amazon-research/read-up.git

# [IMPORTANT] pytorch 1.4.0 have no issue for parallel training
$ conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.0 -c pytorch

Install dependencies:

# Install general dependencies
$ sudo apt-get install build-essential libcap-dev
$ pip install -r requirement.txt

# Install vqa-maskrcnn-benchmark (for feature extraction only)
$ git clone https://gitlab.com/vedanuj/vqa-maskrcnn-benchmark.git
$ cd vqa-maskrcnn-benchmark
$ python setup.py build develop

Install Apex for distributed training

# Apex is used for both `faster-rcnn feature extraction` & `distributed training`
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Dataset

Meta-data

Download the GuessWhat?! dataset:

$ wget https://florian-strub.com/guesswhat.train.jsonl.gz -P data/
$ wget https://florian-strub.com//guesswhat.valid.jsonl.gz -P data/
$ wget https://florian-strub.com//guesswhat.test.jsonl.gz -P data/

Prepare dict.json:

  1. Set up repo as instructed in https://github.com/GuessWhatGame/guesswhat
  2. Generate the dict.json file:
$ python src/guesswhat/preprocess_data/create_dictionary.py -data_dir data -dict_file dict.json -min_occ 3
  1. Copy dict.json file to read-up repo:
$ cd read-up
$ mkdir tf-pretrained-model
$ cp guesswhat/data/dict.json read-up/tf-pretrained-model/

Dataset for Oracle models

1. Dataset for baseline Oracle + Faster-RCNN visual features.

Under vqa-maskrcnn-benchmark/data/, download RCNN model and COCO images:

# download RCNN model
$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/detectron_model.pth
$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/detectron_config.yaml

# download COCO data  
$ wget http://images.cocodataset.org/zips/train2014.zip
$ wget http://images.cocodataset.org/zips/val2014.zip
$ wget http://images.cocodataset.org/zips/test2014.zip
$ unzip -j train2014.zip 
$ unzip -j valid2014.zip 
$ unzip -j test2014.zip 

Copy the guesswhat.train/valid/test.jsonl to vqa-maskrcnn-benchmark/data/. Unzip the COCO images into a folder image_dir/COCO_2014/images/, and prepare a npy file for feature extraction later.


$ python bin/prepare_extract_gt_features_gw.py \
    --src vqa-maskrcnn-benchmark/data/guesswhat.train.jsonl \
    --img-dir vqa-maskrcnn-benchmark/image_dir/COCO_2014/images/ \
    --out vqa-maskrcnn-benchmark/image_dir/COCO_2014/npy_files/guesswhat.train.npy

Repeat the same process for val and test. The generated file looks like the following:

{
    {
        'file_name': 'name_of_image_file',
        'file_path': '<path_to_image_file_on_your_disk>',
        'bbox': array([
                        [ x1, y1, width1, height1],
                        [ x2, y2, width2, height2],
                        ...
                    ]),
        'num_box': 2
    },
    ....
}

Extract features from the ground-truth bounding boxes generated before:

$ python bin/extract_features_from_gt.py \
    --model_file vqa-maskrcnn-benchmark/data/detectron_model.pth \
    --config_file vqa-maskrcnn-benchmark/data/detectron_config.yaml \
    --imdb_gt_file vqa-maskrcnn-benchmark/image_dir/COCO_2014/npy_files/guesswhat.train.npy \
    --output_folder data/rcnn/from_gt_gw_xyxy_scale/train

Repeat this process for val and test data.

2. Dataset for our Oracle model.

Download the pretrained VilBERT model (both vanilla and 12-in-1 have similar performance in our experiments).

# download vanilla pretrained model
$ cd vilbert-pretrained-model
$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/pretrained_model.bin

# download 12-in-1 pretrained model
$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/multi_task_model.bin

Download the features for COCO:

$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/datasets/coco/features_100/COCO_trainval_resnext152_faster_rcnn_genome.lmdb/data.mdb && mv data.mdb COCO_trainval_resnext152_faster_rcnn_genome.lmdb/
$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/datasets/coco/features_100/COCO_test_resnext152_faster_rcnn_genome.lmdb/data.mdb && mv data.mdb COCO_test_resnext152_faster_rcnn_genome.lmdb/

Dataset for Q-Gen models

1. Dataset for baseline Q-Gen model [1]

$ wget www.florian-strub.com/github/ft_vgg_img.zip
$ unzip ft_vgg_img.zip -d img/

2. Dataset for VDST Q-Gen model [2]

$ python bin/extract_features.py \
    --model_file vqa-maskrcnn-benchmark/data/detectron_model.pth \
    --config_file vqa-maskrcnn-benchmark/data/detectron_config.yaml \
    --image_dir vqa-maskrcnn-benchmark/image_dir/COCO_2014/images/ \
    --output_folder data/rcnn/from_rcnn/ \
    --batch_size 8

Dataset for Guesser models

1. Dataset for baseline Guesser model[1]

$ cd data/vilbert-multi-task
$ wget https://dl.fbaipublicfiles.com/vilbert-multi-task/datasets.tar.gz
$ tar -I pigz -xvf datasets.tar.gz datasets/guesswhat/

Model Training & Evaluation

Oracle

To train our Oracle model:

$ python -m torch.distributed.launch \
    --nproc_per_node=4 \
    --nnodes=1 \
    --node_rank=0 \
    main.py \
    --command train-oracle-vilbert \
    --config config_files/oracle_vilbert.yaml \
    --n-jobs 8

To evaluate our Oracle model:

$ python main.py \
    --command test-oracle-vilbert \
    --config config_files/oracle_vilbert.yaml \
    --load ckpt/oracle_vilbert-sd0/epoch-3.pth

This repo also implements other Oracle models:

  • Baseline Oracle model [1]
  • Baseline Oracle model + Faster-RCNN visual features (our ablation model)

To train and evaluate this model, run the main.py with corresponding config file and command.

Guesser

To train our Guesser model:

$ python main.py \
    --command train-guesser-vilbert \
    --config config_files/guesser_vilbert.yaml \
    --n-jobs 8

To evaluate our Guesser model:

$ python main.py \
    --command test-guesser-vilbert \
    --config config_files/guesser_vilbert.yaml \
    --n-jobs 8 \
    --load ckpt/guesser_vilbert-sd0/best.pth

This repo also implements other Guesser models:

  • Baseline Guesser model [1]

To train and evaluate this model, run the main.py with corresponding config file and command.

Q-Gen

To train our Q-Gen model:

# Distributed training
$ python -m torch.distributed.launch \
    --nproc_per_node=4 \
    --nnodes=1 \
    --node_rank=0 \
    main.py \
    --command train-qgen-vilbert \
    --config config_files/qgen_vilbert.yaml \
    --n-jobs 8 

# Non-distributed training
$ python main.py \
    --command train-qgen-vilbert \
    --config config_files/qgen_vilbert.yaml \
    --n-jobs 8 

To evalaute our Q-Gen model:

$ python main.py \
    --command test-self-play-all-vilbert \
    --config config_files/self_play_all_vilbert.yaml \
    --n-jobs 8

This repo also implements other Q-Gen models:

  • Baseline Q-Gen model [1]
  • VDST Q-Gen model [2]

To train and evaluate these models, run the main.py with corresponding config file and command.

References

[1] Strub, F., De Vries, H., Mary, J., Piot, B., Courvile, A., & Pietquin, O. (2017, August). End-to-end optimization of goal-driven and visually grounded dialogue systems. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 2765-2771).

[2] Pang, W., & Wang, X. (2020, April). Visual dialogue state tracking for question generation. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 07, pp. 11831-11838).

Fortuitous Forgetting in Connectionist Networks

Fortuitous Forgetting in Connectionist Networks Introduction This repository includes reference code for the paper Fortuitous Forgetting in Connection

Hattie Zhou 14 Nov 26, 2022
DC3: A Learning Method for Optimization with Hard Constraints

DC3: A learning method for optimization with hard constraints This repository is by Priya L. Donti, David Rolnick, and J. Zico Kolter and contains the

CMU Locus Lab 57 Dec 26, 2022
基于DouZero定制AI实战欢乐斗地主

DouZero_For_Happy_DouDiZhu: 将DouZero用于欢乐斗地主实战 本项目基于DouZero 环境配置请移步项目DouZero 模型默认为WP,更换模型请修改start.py中的模型路径 运行main.py即可 SL (baselines/sl/): 基于人类数据进行深度学习

1.5k Jan 08, 2023
InsTrim: Lightweight Instrumentation for Coverage-guided Fuzzing

InsTrim The paper: InsTrim: Lightweight Instrumentation for Coverage-guided Fuzzing Build Prerequisite llvm-8.0-dev clang-8.0 cmake = 3.2 Make git cl

75 Dec 23, 2022
This repository contains code released by Google Research.

This repository contains code released by Google Research.

Google Research 26.6k Dec 31, 2022
Simple Linear 2nd ODE Solver GUI - A 2nd constant coefficient linear ODE solver with simple GUI using euler's method

Simple_Linear_2nd_ODE_Solver_GUI Description It is a 2nd constant coefficient li

:) 4 Feb 05, 2022
DCGAN-tensorflow - A tensorflow implementation of Deep Convolutional Generative Adversarial Networks

DCGAN in Tensorflow Tensorflow implementation of Deep Convolutional Generative Adversarial Networks which is a stabilize Generative Adversarial Networ

Taehoon Kim 7.1k Dec 29, 2022
TalkingHead-1KH is a talking-head dataset consisting of YouTube videos

TalkingHead-1KH Dataset TalkingHead-1KH is a talking-head dataset consisting of YouTube videos, originally created as a benchmark for face-vid2vid: On

173 Dec 29, 2022
🗺 General purpose U-Network implemented in Keras for image segmentation

TF-Unet General purpose U-Network implemented in Keras for image segmentation Getting started • Training • Evaluation Getting started Looking for Jupy

Or Fleisher 2 Aug 31, 2022
Signals-backend - A suite of card games written in Python

Card game A suite of card games written in the Python language. Features coming

1 Feb 15, 2022
Code basis for the paper "Camera Condition Monitoring and Readjustment by means of Noise and Blur" (2021)

Camera Condition Monitoring and Readjustment by means of Noise and Blur This repository contains the source code of the paper: Wischow, M., Gallego, G

7 Dec 22, 2022
MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images

MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images This repository contains the implementation of our paper MetaAvatar: Learni

sfwang 96 Dec 13, 2022
No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency

This repository contains the implementation for the paper: No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consiste

Alireza Golestaneh 75 Dec 30, 2022
Official repository for: Continuous Control With Ensemble DeepDeterministic Policy Gradients

Continuous Control With Ensemble Deep Deterministic Policy Gradients This repository is the official implementation of Continuous Control With Ensembl

4 Dec 06, 2021
Continual World is a benchmark for continual reinforcement learning

Continual World Continual World is a benchmark for continual reinforcement learning. It contains realistic robotic tasks which come from MetaWorld. Th

41 Dec 24, 2022
Implementation of QuickDraw - an online game developed by Google, combined with AirGesture - a simple gesture recognition application

QuickDraw - AirGesture Introduction Here is my python source code for QuickDraw - an online game developed by google, combined with AirGesture - a sim

Viet Nguyen 89 Dec 18, 2022
MATLAB codes of the book "Digital Image Processing Fourth Edition" converted to Python

Digital Image Processing Python MATLAB codes of the book "Digital Image Processing Fourth Edition" converted to Python TO-DO: Refactor scripts, curren

Merve Noyan 24 Oct 16, 2022
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals, CVPR2021

End-to-End Object Detection with Learnable Proposal, CVPR2021

Peize Sun 1.2k Dec 27, 2022
This repository contains implementations and illustrative code to accompany DeepMind publications

DeepMind Research This repository contains implementations and illustrative code to accompany DeepMind publications. Along with publishing papers to a

DeepMind 11.3k Dec 31, 2022
Distributed Evolutionary Algorithms in Python

DEAP DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data stru

Distributed Evolutionary Algorithms in Python 4.9k Jan 05, 2023