PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Overview

Transparency-by-Design networks (TbD-nets)

This repository contains code for replicating the experiments and visualizations from the paper

Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning

David Mascharka, Philip Tran, Ryan Soklaski, Arjun Majumdar

The paper describes Transparency-by-Design networks (TbD-nets), which are built around a visual attention mechanism. This repository contains the model architecture put forward in the paper and code that will allow you to recreate our visualizations, perform the full VQA task, and train and test models of your own.

A visualization of the output produced by our TbD-net model can be seen below.

If you find this code useful in your research, please cite

@InProceedings{Mascharka_2018_CVPR,
author = {Mascharka, David and Tran, Philip and Soklaski, Ryan and Majumdar, Arjun},
title = {Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
} 

Full VQA

To ask the model a natural-language question about an image and get back an answer and a reasoning chain, see the full VQA example notebook. It defines all the machinery you need to perform the full VQA task and will let you download the necessary models. Try it with Binder!
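For a feel for the moving pieces outside the notebook, loading the pretrained network and the program generator looks roughly like this (a minimal sketch using the same loaders that appear in the test-eval code further down; the paths assume you have already downloaded the models):

from pathlib import Path

from tbd.module_net import load_tbd_net
from utils.clevr import load_vocab
from utils.generate_programs import load_program_generator

# Load the pretrained TbD-net together with its vocabulary
vocab_path = Path('data/vocab.json')
model_path = Path('models/clevr-reg-hres.pt')
tbd_net = load_tbd_net(model_path, load_vocab(vocab_path))

# The program generator maps a natural-language question to the sequence
# of modules the TbD-net should execute
program_generator = load_program_generator(Path('models/program_generator.pt'))

The notebook ties these pieces together with the remaining steps, such as extracting features from the input image and decoding the predicted answer.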

Recreating Our Visualizations

You can use Binder to try our model without any setup!

To reproduce our work on your local machine, you'll need to clone this repository and set up PyTorch. We also recommend using CUDA and cuDNN if you have a GPU available.

You can then open up the visualize-output notebook. That notebook walks you through running our model and generates all the figures we use in our paper. It will also download one of our pretrained models. From there, you can play around with the images we provide without having to download any outside data or models. If you would like to experiment with our other models, see the downloading models section.

Training a Model

To train a model from scratch, there are a few requirements to take care of. We assume you have already set up PyTorch and CUDA/cuDNN if you plan on using a GPU (which is highly recommended).

1. Getting data

The CLEVR dataset is available at its project page. The first step for training is to download that data.

You will also need to extract features and process the question files to produce programs before training a model. The instructions here provide a method for this. We recommend cloning that repository and following those instructions.

NOTE: to extract 28x28 features, you will need to add the --model_stage 2 option to the extract_features.py command. Following the conventions on that page, the command you want is:

python scripts/extract_features.py \
    --input_image_dir data/CLEVR_v1.0/images/train \
    --output_h5_file data/train_features.h5 \
    --model_stage 2

If you want to train on the 14x14 feature maps, you can follow Justin's instructions exactly.

After you have finished the above, you will have several HDF5 files containing the image features and questions, and a vocabulary file. While we do provide a DataLoader that will work with the HDF5 files, we personally find NumPy npy files more robust and generally more pleasant to work with, so we default to using those.
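For reference, the load-time difference between the two formats looks roughly like this (a sketch of the pattern we use in the test-eval code; the paths are placeholders):

import h5py
import numpy as np

# npy: memory-mapped, so individual feature maps are read lazily from disk
features = np.load('data/training/train_features.npy', mmap_mode='r')

# HDF5: also read lazily, but fancy indexing is more restricted -- h5py
# requires index lists to be increasing, which complicates shuffled access
features = h5py.File('data/training/train_features.h5', 'r')['features']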

a. Converting HDF5 to npy

Note that this step is completely optional. The h5_to_np script will produce npy files from your HDF5 files.

The largest NumPy data file (train_features.npy) is 53 gigabytes for the 14x14 feature maps or 105 gigabytes for the 28x28 feature maps, so you will need a substantial amount of RAM available on your machine to create these files. If you do not have enough memory available, use the HDF5 data loader instead of trying to convert these files.

To convert your HDF5 files to npy files, invoke one of the following, depending on whether you want to convert images to NumPy format as well:

python h5_to_np -q /path/to/questions.h5 -f /path/to/features.h5 -i /path/to/images.h5 -d /path/to/dest/
python h5_to_np -q /path/to/questions.h5 -f /path/to/features.h5 -d /path/to/destination/
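If you are curious what the conversion amounts to (or want to adapt it), it is essentially a load-and-save per dataset. A rough sketch (not the script itself), shown here for the features file:

import h5py
import numpy as np

# Materializes the entire array in memory before saving -- this is why the
# conversion requires so much RAM for the large feature files
with h5py.File('data/train_features.h5', 'r') as f:
    np.save('data/train_features.npy', np.asarray(f['features']))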

2. Training the model

The train-model notebook will then walk through the training process. Our recommended directory structure (which is shown in the notebook) is to create a symlink to your data files inside the data folder. This can be done via:

ln -s /path/to/the/data/train_questions.npy data/training/
ln -s /path/to/the/data/train_image_idxs.npy data/training/
# etc

for data in npy format, or via:

ln -s /path/to/the/data/train_features.h5 data/training/
ln -s /path/to/the/data/train_questions.h5 data/training/
# likewise for validation

for data in HDF5 format.

If you prefer a different directory structure, update the data loader paths in the notebook. The notebook will walk through training a model from this point.

Testing a Model

Note that the testing data does not provide ground truth programs, so we will need to generate programs from the questions for testing. We do not focus on this component of the network in our work, so we reuse the program generator from Johnson et al. We have repackaged the sequence-to-sequence model they use for this, removing unnecessary functionality and updating the code to run on PyTorch versions later than 0.1. We provide a model checkpoint that we trained ourselves, so you don't need to download and use their model. The test-eval notebook will walk through the process to produce a file containing the predicted test answers.
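In code, the program-generation step amounts to the following (a short sketch using the helpers from utils/generate_programs.py; the paths are placeholders):

from pathlib import Path

from utils.generate_programs import load_program_generator, generate_programs

# Load our pretrained sequence-to-sequence program generator and translate
# the test questions into programs the TbD-net can execute
program_generator = load_program_generator(Path('models/program_generator.pt'))
generate_programs(Path('data/test_questions.h5'), program_generator,
                  dest_dir=Path('data/test/'), batch_size=128)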

Notes

Downloading Models

To download models, you can use the download_pretrained_models.py script, or download them programmatically as we do in the visualize-output and full VQA notebooks.

There are several pretrained models available. If you would like to play with a specific model from the table of results in the paper, you certainly can. However, we only provide extracted features for the model trained on 28x28 feature maps, so if you want to use the 14x14 feature maps you'll need to extract those features yourself. See the getting data section for details on that. The download options for the script are:

python utils/download_pretrained_models.py -m original
python utils/download_pretrained_models.py -m reg
python utils/download_pretrained_models.py -m hres
python utils/download_pretrained_models.py -m all

The default is hres, which downloads only the models trained with the higher-resolution 28x28 feature maps and the regularization factor (see the paper text for details). These models produce cleaner-looking attention masks and state-of-the-art performance, and are the ones we recommend. If you want to replicate the other results in the table, original will give only the models trained without regularization on 14x14 feature maps, reg will download the models trained with regularization on 14x14 feature maps, and all will download everything.

Python

We only recommend running the code with Python 3, having done all our development using Python 3.6. While the code may be coerced into running in Python 2, we will not support Python 2, so please do not open issues that are related to Python 2 support.

PyTorch

Our development was done using PyTorch v0.1.12, v0.2.0, and v0.3.0, and the code has been tested with v0.4. As such, it should run even on PyTorch versions earlier than 0.2 without modification, though we do recommend running on PyTorch 0.2.0 or later. For setting up PyTorch, see the official installation instructions. The specific hash that the original model from our paper was developed from is here.

To use PyTorch <0.4, clone the repository and check out tags/torch0.3. For PyTorch 0.4 and above, master will run.
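For example (assuming the usual GitHub location of this repository):

git clone https://github.com/davidmascharka/tbd-nets.git
cd tbd-nets
git checkout tags/torch0.3  # only needed for PyTorch < 0.4; master runs on 0.4+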

CUDA/cuDNN

Our code is tested under CUDA 8 and CUDA 9 with cuDNN 5 and cuDNN 7, respectively. For setting up CUDA, see the NVIDIA documentation. We recommend using cuDNN, which is also available from NVIDIA.

Operating Systems

Our development was done on CentOS 6 and Ubuntu 16.04. The code has also been tested under Arch Linux.

Setting up a conda environment

If you use conda, you can set up a development environment from the environment.yml configuration. This is the environment that Binder uses to give a live notebook for the visualizations. To create an environment from it, run

conda env create -f environment.yml

The environment can then be activated with source activate tbd-env.

Copyright

DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited.

This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering.

© 2017 Massachusetts Institute of Technology.

MIT Proprietary, Subject to FAR52.227-11 Patent Rights - Ownership by the contractor (May 2014)

The software/firmware is provided to you on an As-Is basis

Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

Comments
  • tensor matches error

    My eval.py file is copied from test-eval.ipynb:

    import torch
    
    from pathlib import Path
    import numpy as np
    import h5py
    
    from tbd.module_net import load_tbd_net
    from utils.clevr import load_vocab
    from utils.generate_programs import load_program_generator, generate_programs
    
    
    vocab_path = Path('data/vocab.json')
    model_path = Path('models/clevr-reg-hres.pt')
    tbd_net = load_tbd_net(model_path, load_vocab(vocab_path))
    
    
    program_generator = load_program_generator(Path('models/program_generator.pt'))
    generate_programs(Path('data/val_questions.h5'), program_generator, 
                      dest_dir=Path('data/val/'), batch_size=128)
    
    
    use_np_features = False
    if use_np_features:
        features = np.load(str(Path('data/val/val_features.npy')), mmap_mode='r')
    else:
        features = h5py.File(Path('data/val_features.h5'))['features']
    
    question_np = np.load(Path('data/val/questions.npy'))
    image_idx_np = np.load(Path('data/val/image_idxs.npy'))
    programs_np = np.load(Path('data/val/programs.npy'))
    
    
    answers = ['blue', 'brown', 'cyan', 'gray', 'green', 'purple', 'red', 'yellow',
               'cube', 'cylinder', 'sphere',
               'large', 'small',
               'metal', 'rubber',
               'no', 'yes',
               '0', '1', '10', '2', '3', '4', '5', '6', '7', '8', '9']
    
    pred_idx_to_token = dict(zip(range(len(answers)), answers))
    
    
    f = open('predicted_answers.txt', 'w')
    def write_preds(preds):
        for pred in preds:
            f.write(pred)
            f.write('\n')
    
    
    
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    
    
    
    batch_size = 128
    for batch in range(0, len(programs_np), batch_size):
        image_idx = image_idx_np[batch:batch+batch_size]
        programs = torch.LongTensor(programs_np[batch:batch+batch_size]).to(device)
        
        if use_np_features:
            feats = torch.FloatTensor(np.asarray(features[image_idx])).to(device)
        else:
            # Using HDF5 files requires some overhead due to constraints on how those may
            # be accessed. We cannot index into the file using a numpy array. We also cannot 
            # access the same element multiple times (e.g. we cannot index into an h5py.File 
            # with [1,1,1]) because we are constrained to increasing sequences
            feats = []
            for idx in image_idx:
                feats.append(np.asarray(features[idx]))
            feats = torch.FloatTensor(np.asarray(feats)).to(device)
    
        outputs = tbd_net(feats, programs)
        _, preds = outputs.max(1)
        preds = [pred_idx_to_token[pred] for pred in preds.detach().to('cpu').numpy()]
        write_preds(preds)
    f.close()
    

    and the error is:

    Traceback (most recent call last):
      File "eval.py", line 72, in <module>
        outputs = tbd_net(feats, programs)
      File "/home/dengwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/dengwei/tbd-nets/tbd/module_net.py", line 195, in forward
        output = module(feat_input, output)
      File "/home/dengwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/dengwei/tbd-nets/tbd/modules.py", line 92, in forward
        attended_feats = torch.mul(feats, attn.repeat(1, self.dim, 1, 1))
    RuntimeError: The size of tensor a (128) must match the size of tensor b (16384) at non-singleton dimension 1
    

    Maybe I should use the NumPy files rather than the HDF5 file? I extracted the features from this master.

    bug 
    opened by bidongqinxian 10
  • evaluate error on val

    Hello, when I evaluate on the val dataset, the following error appears. What's wrong?

    Traceback (most recent call last):
      File "eval.py", line 17, in <module>
        dest_dir=Path('/data'), batch_size=128)
      File "/home/dengwei/tbd-nets/utils/generate_programs.py", line 256, in generate_programs
        programs_pred = program_generator.reinforce_sample(questions_var)
      File "/home/dengwei/tbd-nets/utils/generate_programs.py", line 121, in reinforce_sample
        encoded = self.encoder(x)
      File "/home/dengwei/tbd-nets/utils/generate_programs.py", line 91, in encoder
        embed = self.encoder_embed(x)
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/sparse.py", line 103, in forward
        self.scale_grad_by_freq, self.sparse
    RuntimeError: save_for_backward can only save input or output tensors, but argument 0 doesn't satisfy this condition
    

    and this is the part I changed in test-eval:

    vocab_path = Path('data/vocab.json')
    model_path = Path('models/clevr-reg-hres.pt')
    tbd_net = load_tbd_net(model_path, load_vocab(vocab_path))
    
    program_generator = load_program_generator(Path('models/program_generator.pt'))
    generate_programs(Path('data/val_questions.h5'), program_generator, 
                      dest_dir=Path('/data'), batch_size=128)
                      
    use_np_features = False
    if use_np_features:
        features = np.load(str(Path('data/test/test_features.npy')), mmap_mode='r')
    else:
        features = h5py.File(Path('data/val_features.h5'))['features']
    
    question_np = np.load(Path('data/val_questions.npy'))
    image_idx_np = np.load(Path('data/val_image_idxs.npy'))
    programs_np = np.load(Path('data/val_programs.npy'))
    
    opened by bidongqinxian 8
  • The environment setting

    I find that I am stuck on the environment settings. My system is Ubuntu 16.04 with NVIDIA driver 384.111, CUDA 9.1, and a GTX 1080 Ti. The error at step 2 is "Cuda runtime error (25): CUDA driver version is insufficient for CUDA runtime version". With the NVIDIA driver upgraded to 387.26 or 390.42, Ubuntu cannot identify the NVIDIA driver. And with CUDA downgraded to version 8, I instead get "ImportError: libcudart.so.9.1: cannot open shared object file". So may I ask what environment settings are appropriate for recreating the results?

    opened by darkmir 6
  • No longer runs on mybinder.org

    Hey,

    I was wondering if you have tried running this lately and had any ideas as to why it doesn't run successfully anymore on mybinder.org. I don't really know much about the code in the repo nor PyTorch. From looking at the environment.yml and the errors I get, my guess would be that there is now a newer version of PyTorch that changed conventions or some such?

    I've used this repository before as an example in talks about Binder and wanted to do so again, but during my run-through I noticed that it doesn't work anymore. If you don't have time to fix this, that is totally fine; I'll find a different repo for demo purposes.

    opened by betatim 4
  • How to evaluate test results?

    Hi, after getting predicted answers for the test data, how can I evaluate the results? Since your paper reports results in different categories (e.g., Count, Compare, Exist, and so on), do you have a code snippet to conveniently compute these? Thanks!

    opened by lwye 4
  • RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

    Trying to reproduce the experiments in train-model.ipynb using the proposed environment with PyTorch 0.4.1, the code produced the following error:

    RuntimeError                              Traceback (most recent call last)
    <ipython-input-14-82ec354902a5> in <module>()
          6     epoch += 1
          7     print('starting epoch', epoch)
    ----> 8     train_epoch()
          9 
         10 save_checkpoint(epoch, 'example-{:02d}.pt'.format(epoch))
    
    <ipython-input-13-2216c33e0bef> in train_epoch()
         33 
         34         loss_file.write('Loss: {}\n'.format(loss.item()))
    ---> 35         loss.backward()
         36         optimizer.step()
         37         break
    
    ~/anaconda2/envs/tbd-env/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
         91                 products. Defaults to ``False``.
         92         """
    ---> 93         torch.autograd.backward(self, gradient, retain_graph, create_graph)
         94 
         95     def register_hook(self, hook):
    
    ~/anaconda2/envs/tbd-env/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
         88     Variable._execution_engine.run_backward(
         89         tensors, grad_tensors, retain_graph, create_graph,
    ---> 90         allow_unreachable=True)  # allow_unreachable flag
         91 
         92 
    
    RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
    

    PyTorch is trying to backpropagate through a tensor with no grad_fn, but I haven't been able to find the problem yet.

    bug awaiting response 
    opened by mauricioarmani 3
  • can not find file  scripts/extract_features.py

    Excuse me, and thanks for your great work. When I run the code, I have a little question.

    "python scripts/extract_features.py
    --input_image_dir </path/to/CLEVR/images/train>
    --output_h5_file </path/to/train_features.h5>
    --model_stage 2"

    I cannot find the file scripts/extract_features.py. Could you help me?

    opened by JackWhite-rwx 2
  • Efficiency question about the model

    Hey, I didn't run the code yet, but I noticed that module_net.py processes the questions in a batch one by one; the batch only shares the same stem and classifier modules. Although this design is quite reasonable, since different questions need different modules, I still worry about the efficiency of the training phase. What was your setup for training (number of GPUs, batch size, training time, etc.)? Do you have any advice on accelerating this? Thanks!

    question 
    opened by zhangyuygss 2
  • Properties not specified in modules?

    Hey, I've read the code for the different modules. It seems that the modules do not contain any design for encoding properties (e.g., red or blue for the color property). Take the attention module, for example: if we're not sure which color we are attending to, how can the module attend to the right locations? Please correct me if I missed something, thanks!

    question 
    opened by zhangyuygss 1
  • Use PIL for image resizing

    Hello - this PR is related to #14

    Notes:

    1. I decided to go with PIL for this change, since it looks like the interpolate() function from PyTorch doesn't support Lanczos interpolation yet.
    2. By the same token, the Image.resize() function from Pillow doesn't support cubic interpolation. For now, I just left cubic out as an option, but I am wondering when someone might actually want to use it in 2D image processing. What do you think?
    3. Next, I expanded the docstrings in the display_tree() and display_helper() functions (found in full-vqa-example.ipynb and visualize-output.ipynb, respectively) to also allow users to pass in 'box' or 'hamming' for the interp parameter.
    4. Finally, I added a .gitignore file to the repo, mainly to avoid pushing my local copy of the clevr-reg-hres.pt binary file.

    Looking forward to hearing what others think of these changes!

    opened by UPstartDeveloper 0