Code for "Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance" at NeurIPS 2021

Overview

Justin Lim, Christina X Ji, Michael Oberst, Saul Blecker, Leora Horwitz, and David Sontag. 2021. Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance. In Thirty-fifth Conference on Neural Information Processing Systems.

Individuals often make different decisions when faced with the same context, due to personal preferences and background. For instance, judges may vary in their leniency towards certain drug-related offenses, and doctors may vary in their preference for how to start treatment for certain types of patients. With these examples in mind, we present an algorithm for identifying types of contexts (e.g., types of cases or patients) with high inter-decision-maker disagreement. We formalize this as a causal inference problem, seeking a region where the assignment of decision-maker has a large causal effect on the decision. We give an iterative algorithm to find a region maximizing this objective and give a generalization bound for its performance. In a semi-synthetic experiment, we show that our algorithm recovers the correct region of disagreement accurately compared to baselines. Finally, we apply our algorithm to real-world healthcare datasets, recovering variation that aligns with existing clinical knowledge.
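
The quantity named in the title can be illustrated with a small example. The following is a minimal sketch, not the code in IterativeRegionEstimator.py: it assumes a binary decision `y`, a hypothetical binary agent-group indicator `g`, and known conditional means `y_hat` and `g_hat` (in practice these would be fitted models). It uses the standard identity E[Cov(Y, G | X)] = E[(Y - E[Y | X])(G - E[G | X])], restricted to a candidate region. All names and the toy data are assumptions made for illustration.

```python
import numpy as np


def expected_conditional_covariance(y, g, y_hat, g_hat, in_region=None):
    """Estimate E[Cov(Y, G | X)] as the mean product of residuals.

    y, g         : observed decisions and agent-group indicators
    y_hat, g_hat : estimates of E[Y | X] and E[G | X] for each sample
    in_region    : optional boolean mask selecting a candidate region
    """
    y, g, y_hat, g_hat = (np.asarray(a, dtype=float) for a in (y, g, y_hat, g_hat))
    products = (y - y_hat) * (g - g_hat)
    if in_region is not None:
        products = products[np.asarray(in_region, dtype=bool)]
    return float(products.mean())


# Toy data (hypothetical): the two agent groups disagree only when x > 0.5.
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(size=n)
g = rng.binomial(1, 0.5, size=n)            # group is independent of x
p = 0.3 + 0.4 * g * (x > 0.5)               # group 1 decides y=1 more often
y = rng.binomial(1, p)

y_hat = 0.3 + 0.2 * (x > 0.5)               # true E[Y | X] for this toy model
g_hat = np.full(n, 0.5)                     # true E[G | X]

inside = expected_conditional_covariance(y, g, y_hat, g_hat, x > 0.5)
outside = expected_conditional_covariance(y, g, y_hat, g_hat, x <= 0.5)
print(f"inside region: {inside:.3f}, outside region: {outside:.3f}")
```

In this toy setup the true conditional covariance is 0.1 where the groups disagree and 0 elsewhere, so a region search that maximizes the estimate should favor `x > 0.5`.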

To run our algorithm, see run_semisynth_exp_recover_beta.ipynb for an example of how to call IterativeRegionEstimator.py. The baselines and our model are also implemented in baselines.py. Helper functions (e.g., for evaluation) are in helpers.py.

Please refer to the following steps to reproduce the experiments and figures in this paper:

  1. To set up the required packages, run create_env.sh, passing in a conda environment name. Then run source activate with the same environment name to enter the environment.

  2. To run the semi-synthetic experiment:

    1. Download the criminal justice dataset from https://github.com/stanford-policylab/recidivism-predictions
    2. Process the data using data_processing/semisynth_process_data.ipynb.
    3. To run the iterative algorithm and baselines, run python3 run_baselines_on_semisynth.py once for every combination of the following arguments:
      1. type of model: Iterative, Direct, TarNet, ULearner, CausalForest
      2. number of agents: 2, 5, 10, 20, 40, 87 in our experiments
      3. subset: drug_possession, misdemeanor_under35
    4. Figures 1, 3, and 4 compare metrics for the methods. They can be produced by running plot_semisynth.ipynb.
    5. Figure 2 examines tuning the region size. It can be reproduced with the stand-alone notebook run_semisynth_exp_recoverbeta.ipynb.
    6. Figures 5 and 6 examine convergence of the iterative algorithm. They can be produced by running plot_convergence.ipynb.
    7. Figures 7 and 8 examine how robust the iterative algorithm and direct baselines are to violations of the assumption that there are two agent groups. First, run python3 run_robustness_semisynth_experiment.py once for every combination of the following arguments:
      1. type of model: Iterative, Direct
      2. number of groups: 2, 3, 5, 10
      3. subset: drug_possession, misdemeanor_under35
       Note that the number of agents is fixed at 40. The figures can then be produced by running plot_robustness.ipynb.
    8. Note: Helper code that is called to generate semi-synthetic data is located in semisynth_subsets.py, semisynth_dataloader.py, and semisynth_dataloader_robust.py.
  3. The real-world diabetes experiment uses proprietary data extracted using generate_t2dm_cohort.sql and first_line.sql.

    1. Select an outcome model from logistic regressions, decision trees, and random forests based on AUC, calibration, and partial dependence plots. Figure 9 and the statistics in Table 2 that guided our selection of a random forest outcome model are produced in select_outcome_model_for_diabetes_experiment.ipynb.
    2. The experiment is run with python3 run_baseline_models.py diabetes Iterative DecisionTree RandomForest. This outputs Figure 10b, the information needed to create Figure 10a, the statistics in Tables 1 and 3, and the fold consistency evaluation.
    3. Note: Data loading helper functions, including how data is split, are located in real_data_loader.py. Most of the functions called to generate the output are located in realdata_analysis.py.
  4. The real-world Parkinson's experiment was run using open-access data.

    1. Download the data from https://www.ppmi-info.org/.
    2. Run python3 ppmi_feature_extraction.py, passing in the directory containing the downloaded raw data and the directory where the processed data will be written.
    3. Manually process the treatment data to correct typos in the drug names and treatment dates.
    4. Run process_parkinsons_data.ipynb to gather the data for the experiment.
    5. The experiment is run with python3 run_baseline_models.py ppmi Iterative DecisionTree. This outputs the information for creating Figure 11 and Table 4.
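
The argument combinations for the semi-synthetic runs (steps 2.3 and 2.7) can be enumerated with a short script. This is a minimal sketch under one assumption that the steps above do not confirm: that each script takes its arguments positionally, in the order they are listed. The helper `command_grid` is hypothetical and is not part of this repository; check each script's argument parsing before launching the commands.

```python
from itertools import product

SUBSETS = ["drug_possession", "misdemeanor_under35"]


def command_grid(script, *argument_values):
    """Return one command string per combination of the given argument lists.

    Assumes positional arguments in the listed order (not confirmed by the README).
    """
    return [
        " ".join(["python3", script, *map(str, combo)])
        for combo in product(*argument_values)
    ]


# Step 2.3: model type x number of agents x subset
baseline_cmds = command_grid(
    "run_baselines_on_semisynth.py",
    ["Iterative", "Direct", "TarNet", "ULearner", "CausalForest"],
    [2, 5, 10, 20, 40, 87],
    SUBSETS,
)

# Step 2.7: model type x number of groups x subset (number of agents fixed at 40)
robustness_cmds = command_grid(
    "run_robustness_semisynth_experiment.py",
    ["Iterative", "Direct"],
    [2, 3, 5, 10],
    SUBSETS,
)

if __name__ == "__main__":
    for cmd in baseline_cmds + robustness_cmds:
        print(cmd)
```

The printed lines can be pasted into a shell or a job scheduler; the grid has 60 baseline runs and 16 robustness runs.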