PyTorch implementation of "Debiased Visual Question Answering from Feature and Sample Perspectives" (NeurIPS 2021)

Related tags

Deep LearningD-VQA
Overview

D-VQA

We provide the PyTorch implementation for Debiased Visual Question Answering from Feature and Sample Perspectives (NeurIPS 2021).

D-VQA

Dependencies

  • Python 3.6
  • PyTorch 1.1.0
  • dependencies in requirements.txt
  • We train and evaluate all of the models based on one TITAN Xp GPU

Getting Started

Installation

  1. Clone this repository:

     git clone https://github.com/Zhiquan-Wen/D-VQA.git
     cd D-VQA
    
  2. Install PyTorch and other dependencies:

     pip install -r requirements.txt
    

Download and preprocess the data

cd data 
bash download.sh
python preprocess_features.py --input_tsv_folder xxx.tsv --output_h5 xxx.h5
python feature_preprocess.py --input_h5 xxx.h5 --output_path trainval 
python create_dictionary.py --dataroot vqacp2/
python preprocess_text.py --dataroot vqacp2/ --version v2
cd ..

Training

  • Train our model
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --output saved_models_cp2/ --self_loss_weight 3 --self_loss_q 0.7
  • Train the model with 80% of the original training set
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --output saved_models_cp2/ --self_loss_weight 3 --self_loss_q 0.7 --ratio 0.8 

Evaluation

  • A json file of results from the test set can be produced with:
CUDA_VISIBLE_DEVICES=0 python test.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --checkpoint_path saved_models_cp2/best_model.pth --output saved_models_cp2/result/
  • Compute detailed accuracy for each answer type:
python comput_score.py --input saved_models_cp2/result/XX.json --dataroot data/vqacp2/

Pretrained model

A well-trained model can be found here. The test results file produced by it can be found here and its performance is as follows:

Overall score: 61.91
Yes/No: 88.93 Num: 52.32 other: 50.39

Reference

If you found this code is useful, please cite the following paper:

@inproceedings{D-VQA,
  title     = {Debiased Visual Question Answering from Feature and Sample Perspectives},
  author    = {Zhiquan Wen, 
               Guanghui Xu, 
               Mingkui Tan, 
               Qingyao Wu, 
               Qi Wu},
  booktitle = {NeurIPS},
  year = {2021}
}

Acknowledgements

This repository contains code modified from SSL-VQA, thank you very much!

Besides, we thank Yaofo Chen for providing MIO library to accelerate the data loading.

Comments
  • Questions about the code

    Questions about the code

    Thank you very much for providing the code, but I still have two questions that I did not understand well.

    1. A module, BDM, is used to capture negative bias, but this module only includes a multi-layer perceptron. Then how to ensure the features captured by this multi-layer perceptron are negative bias?
    2. On the left of Figure 2 of the paper, there are no backward gradient of the question-to-answer and the vision-to-answer branches. Where did it reflect in the code?
    opened by darwann 4
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • LXMERT numbers

    LXMERT numbers

    Hi, I wish to reproduce the LXMERT(LXMERT without D-VQA) numbers reported in the paper. It would be helpful if you could provide me with a way to do this using your code. I tried using the original LXMERT code, but I am not able to get the numbers reported in your paper on the VQA-CP2 dataset.

    opened by Vaidehi99 0
  • Download trainval_36.zip error

    Download trainval_36.zip error

    Hi, thank you for your work on this.

    I keep getting a download error when downloading the trainval_36.zip file. Is there another link I can use to download this?

    Thanks in advance!

    opened by chojw 0
  • 关于box和image的对齐问题

    关于box和image的对齐问题

    您好,我将box的注释解开后,重新生成特征,然后将其绘制出来,但是明显感觉有偏差,不知道您是否可以提供一份绘图的代码。 image 下面是我的代码 def plot_rect(image, boxes): img = Image.fromarray(np.uint8(image)) draw = ImageDraw.Draw(img) for k in range(2): box = boxes[k,:] print(box) drawrect(draw, box, outline='green', width=3) img = np.asarray(img) return img def drawrect(drawcontext, xy, outline=None, width=0): x1, y1, x2, y2 = xy points = (x1, y1), (x2, y1), (x2, y2), (x1, y2), (x1, y1) drawcontext.line(points, fill=outline, width=width)

    opened by LemonQC 0
Owner
Zhiquan Wen
Zhiquan Wen
Source codes of CenterTrack++ in 2021 ICME Workshop on Big Surveillance Data Processing and Analysis

MOT Tracked object bounding box association (CenterTrack++) New association method based on CenterTrack. Two new branches (Tracked Size and IOU) are a

36 Oct 04, 2022
A big endian Gentoo port developed on a Pine64.org RockPro64

Gentoo-aarch64_be A big endian Gentoo port developed on a Pine64.org RockPro64 The endian wars are over... little endian won. As a result, it is incre

Rory Bolt 6 Dec 07, 2022
Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech The family of UniSpeech: UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR UniSpeech-

Microsoft 282 Jan 09, 2023
Flax is a neural network ecosystem for JAX that is designed for flexibility.

Flax: A neural network library and ecosystem for JAX designed for flexibility Overview | Quick install | What does Flax look like? | Documentation See

Google 3.9k Jan 02, 2023
SegNet including indices pooling for Semantic Segmentation with tensorflow and keras

SegNet SegNet is a model of semantic segmentation based on Fully Comvolutional Network. This repository contains the implementation of learning and te

Yuta Kamikawa 172 Dec 23, 2022
OCR Post Correction for Endangered Language Texts

📌 Coming soon: an update to the software including features from our paper on semi-supervised OCR post-correction, to be published in the Transaction

Shruti Rijhwani 96 Dec 31, 2022
This repository contains an implementation of ConvMixer for the ICLR 2022 submission "Patches Are All You Need?".

Patches Are All You Need? 🤷 This repository contains an implementation of ConvMixer for the ICLR 2022 submission "Patches Are All You Need?". Code ov

ICLR 2022 Author 934 Dec 30, 2022
PyTorch code accompanying our paper on Maximum Entropy Generators for Energy-Based Models

Maximum Entropy Generators for Energy-Based Models All experiments have tensorboard visualizations for samples / density / train curves etc. To run th

Rithesh Kumar 135 Oct 27, 2022
[CVPR 2022] Unsupervised Image-to-Image Translation with Generative Prior

GP-UNIT - Official PyTorch Implementation This repository provides the official PyTorch implementation for the following paper: Unsupervised Image-to-

Shuai Yang 125 Jan 03, 2023
[CVPRW 21] "BNN - BN = ? Training Binary Neural Networks without Batch Normalization", Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang

BNN - BN = ? Training Binary Neural Networks without Batch Normalization Codes for this paper BNN - BN = ? Training Binary Neural Networks without Bat

VITA 40 Dec 30, 2022
[ACL 20] Probing Linguistic Features of Sentence-level Representations in Neural Relation Extraction

REval Table of Contents Introduction Overview Requirements Installation Probing Usage Citation License 🎓 Introduction REval is a simple framework for

13 Jan 06, 2023
Reproduces ResNet-V3 with pytorch

ResNeXt.pytorch Reproduces ResNet-V3 (Aggregated Residual Transformations for Deep Neural Networks) with pytorch. Tried on pytorch 1.6 Trains on Cifar

Pau Rodriguez 481 Dec 23, 2022
An offline deep reinforcement learning library

d3rlpy: An offline deep reinforcement learning library d3rlpy is an offline deep reinforcement learning library for practitioners and researchers. imp

Takuma Seno 817 Jan 02, 2023
PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

Wenwen Yu 498 Dec 24, 2022
Source code for 2021 ICCV paper "In-the-Wild Single Camera 3D Reconstruction Through Moving Water Surfaces"

In-the-Wild Single Camera 3D Reconstruction Through Moving Water Surfaces This is the PyTorch implementation for 2021 ICCV paper "In-the-Wild Single C

27 Dec 06, 2022
Python package for Bayesian Machine Learning with scikit-learn API

Python package for Bayesian Machine Learning with scikit-learn API Installing & Upgrading package pip install https://github.com/AmazaspShumik/sklearn

Amazasp Shaumyan 482 Jan 04, 2023
The codes reproduce the figures and statistics in the paper, "Controlling for multiple covariates," by Mark Tygert.

The accompanying codes reproduce all figures and statistics presented in "Controlling for multiple covariates" by Mark Tygert. This repository also pr

Meta Research 1 Dec 02, 2021
WRENCH: Weak supeRvision bENCHmark

🔧 What is it? Wrench is a benchmark platform containing diverse weak supervision tasks. It also provides a common and easy framework for development

Jieyu Zhang 176 Dec 28, 2022
Simple tutorials using Google's TensorFlow Framework

TensorFlow-Tutorials Introduction to deep learning based on Google's TensorFlow framework. These tutorials are direct ports of Newmu's Theano Tutorial

Nathan Lintz 6k Jan 06, 2023
A NSFW content filter.

Project_Nfilter A NSFW content filter. With a motive of minimizing the spreads and leakage of NSFW contents on internet and access to others devices ,

1 Jan 20, 2022