D-VQA

We provide the PyTorch implementation of "Debiased Visual Question Answering from Feature and Sample Perspectives" (NeurIPS 2021).


Dependencies

  • Python 3.6
  • PyTorch 1.1.0
  • dependencies in requirements.txt
  • All models are trained and evaluated on a single TITAN Xp GPU (a quick environment check is sketched below)
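
Before going further, you can sanity-check that the pinned versions above are what your environment actually provides. This snippet is purely illustrative and not part of the repository:

import sys
import torch

print("Python:", sys.version.split()[0])             # expected: 3.6.x
print("PyTorch:", torch.__version__)                 # expected: 1.1.0
print("CUDA available:", torch.cuda.is_available())  # a single TITAN Xp is assumed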

Getting Started

Installation

  1. Clone this repository:

     git clone https://github.com/Zhiquan-Wen/D-VQA.git
     cd D-VQA
    
  2. Install PyTorch and other dependencies:

     pip install -r requirements.txt
    

Download and preprocess the data

cd data 
bash download.sh
python preprocess_features.py --input_tsv_folder xxx.tsv --output_h5 xxx.h5
python feature_preprocess.py --input_h5 xxx.h5 --output_path trainval 
python create_dictionary.py --dataroot vqacp2/
python preprocess_text.py --dataroot vqacp2/ --version v2
cd ..
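
The preprocessing above writes the image features into an HDF5 file (the xxx.h5 placeholder). To confirm what was written, a hedged h5py sketch like the following can list the file's contents; the dataset names are discovered rather than assumed, since the exact layout is defined by preprocess_features.py:

import h5py

with h5py.File("xxx.h5", "r") as f:  # same placeholder name as in the commands above
    # Print every group/dataset path, plus the shape of each dataset.
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))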

Training

  • Train our model
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --output saved_models_cp2/ --self_loss_weight 3 --self_loss_q 0.7
  • Train the model with 80% of the original training set
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --output saved_models_cp2/ --self_loss_weight 3 --self_loss_q 0.7 --ratio 0.8 
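
Both commands expose two loss hyperparameters, --self_loss_weight and --self_loss_q. As a hedged reading only: the flag names are consistent with a generalized cross-entropy (GCE) style term, L_q(p) = (1 - p_y^q) / q, scaled by a weight in the total loss. The sketch below illustrates that interpretation; it is an assumption based on the flag names, not the repository's verified loss:

import torch.nn.functional as F

def gce_loss(logits, target, q=0.7):
    # Probability assigned to the ground-truth class for each example.
    p_y = F.softmax(logits, dim=-1).gather(1, target.unsqueeze(1)).squeeze(1)
    # q -> 0 recovers cross entropy; larger q is more robust to noisy labels.
    return ((1.0 - p_y.clamp(min=1e-8) ** q) / q).mean()

# Hypothetical combination, mirroring --self_loss_weight 3 --self_loss_q 0.7:
# total_loss = base_loss + 3.0 * gce_loss(logits, target, q=0.7)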

Evaluation

  • A JSON file of test-set results can be produced with:
CUDA_VISIBLE_DEVICES=0 python test.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --checkpoint_path saved_models_cp2/best_model.pth --output saved_models_cp2/result/
  • Compute the detailed accuracy for each answer type (a sketch of the underlying metric follows):
python comput_score.py --input saved_models_cp2/result/XX.json --dataroot data/vqacp2/
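
The per-type scores are presumably built on the standard VQA accuracy rule, under which a prediction earns min(#agreeing annotators / 3, 1) credit against the 10 human answers per question. A minimal sketch, with an assumed result-file layout (comput_score.py's actual format may differ):

import json

def vqa_accuracy(pred, human_answers):
    # Full credit once at least 3 of the 10 annotators gave this answer.
    return min(human_answers.count(pred) / 3.0, 1.0)

with open("saved_models_cp2/result/XX.json") as f:  # file written by test.py
    results = json.load(f)  # assumed layout: [{"question_id": ..., "answer": ...}, ...]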

Pretrained model

A well-trained model can be found here, together with the raw training log. The test results file it produces can be found here; its performance is as follows:

Overall score: 61.91
Yes/No: 88.93, Num: 52.32, Other: 50.39

Quick Reproduce

  1. Prepare the environment: we provide a Docker image (built from the Dockerfile) that includes the dependencies above; you can pull it from Docker Hub or the Aliyun registry:
docker pull zhiquanwen/debias_vqa:v1
docker pull registry.cn-shenzhen.aliyuncs.com/wenzhiquan/debias_vqa:v1
docker tag registry.cn-shenzhen.aliyuncs.com/wenzhiquan/debias_vqa:v1 zhiquanwen/debias_vqa:v1
  2. Start the Docker container, mapping the dataset into it:
docker run --gpus all -it --ipc=host --network=host --shm-size 32g -v /host/path/to/data:/xxx:ro zhiquanwen/debias_vqa:v1
  3. Run: follow the Download and preprocess the data, Training, and Evaluation steps in Getting Started.

Results: using the above Docker image and training steps, we obtain 61.73% on VQA-CP v2, close to the 61.91% reported in Table 1 of the paper. We also provide the raw training log.

Reference

If you find this code useful, please cite the following paper:

@inproceedings{D-VQA,
  title     = {Debiased Visual Question Answering from Feature and Sample Perspectives},
  author    = {Zhiquan Wen and Guanghui Xu and Mingkui Tan and Qingyao Wu and Qi Wu},
  booktitle = {NeurIPS},
  year      = {2021}
}

Acknowledgements

This repository contains code modified from SSL-VQA; many thanks to its authors!

We also thank Yaofo Chen for providing the MIO library to accelerate data loading.
