Code for "Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance" at NeurIPS 2021

Overview

Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance

Justin Lim, Christina X Ji, Michael Oberst, Saul Blecker, Leora Horwitz, and David Sontag. 2021. Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance. In Thirty-fifth Conference on Neural Information Processing Systems.

Individuals often make different decisions when faced with the same context, due to personal preferences and background. For instance, judges may vary in their leniency towards certain drug-related offenses, and doctors may vary in their preference for how to start treatment for certain types of patients. With these examples in mind, we present an algorithm for identifying types of contexts (e.g., types of cases or patients) with high inter-decision-maker disagreement. We formalize this as a causal inference problem, seeking a region where the assignment of decision-maker has a large causal effect on the decision. We give an iterative algorithm to find a region maximizing this objective and give a generalization bound for its performance. In a semi-synthetic experiment, we show that our algorithm recovers the correct region of disagreement accurately compared to baselines. Finally, we apply our algorithm to real-world healthcare datasets, recovering variation that aligns with existing clinical knowledge.

To run our algorithm, see run_semisynth_exp_recover_beta.ipynb for how to call IterativeRegionEstimator.py. The baselines and our model are also implemented in baselines.py. Helper functions (e.g. for evaluation) are in helpers.py.

Please refer to the following steps to reproduce the experiments and figures in this paper:

  1. To set-up the required packages, run create_env.sh, passing in a conda environment name. Then run source activate with the environment name to enter it.

  2. To run the semi-synthetic experiment,

    1. Download the criminal justice dataset from https://github.com/stanford-policylab/recidivism-predictions
    2. Process the data using data_processing/semisynth_process_data.ipynb.
    3. To run the iterative algorithm and baselines, run python3 run_baselines_on_semisynth.py with the product of the following arguments:
      1. type of model: Iterative, Direct, TarNet, ULearner, CausalForest
      2. number of agents: 2, 5, 10, 20, 40, 87 in our experiments
      3. subset: drug_possession, misdemeanor_under35
    4. Figures 1, 3, and 4 compare metrics for the methods. They can be produced by running plot_semisynth.ipynb.
    5. Figure 2 examines tuning the region size. run_semisynth_exp_recoverbeta.ipynb is a stand-alone notebook for reproducing it.
    6. Figures 5 and 6 examine convergence of the iterative algorithm. They can be produced by running plot_convergence.ipynb.
    7. Figures 7 and 8 examine how robust the iterative algorithm and direct baselines are to violations of the assumption that there are two agent groups. First, run python3 run_robustness_semisynth_experiment.py with the product of the following arguments:
      1. type of model: Iterative, Direct
      2. number of groups: 2, 3, 5, 10
      3. subset: drug_possession, misdemeanor_under35 Note that the number of agents is fixed at 40. The figures can then be produced by running plot_robustness.ipynb.
    8. Note: Helper code that is called to generate semi-synthetic data is located in semisynth_subsets.py, semisynth_dataloader.py, and semisynth_dataloader_robust.py.
  3. The real-world diabetes experiment uses proprietary data extracted using generate_t2dm_cohort.sql and first_line.sql.

    1. Select an outcome model from logistic regressions, decision trees, and random forests based on AUC, calibration, and partial dependence plots. Figure 9 and the statistics in Table 2 that guided our selection of a random forest outcome model are produced in select_outcome_model_for_diabetes_experiment.ipynb.
    2. The experiment is run with python3 run_baseline_models.py diabetes Iterative DecisionTree RandomForest. Figure 10b, the information needed to create Figures 10a, the statistics in Tables 1 and 3, and the fold consistency evaluation will be outputted.
    3. Note: Data loading helper functions, including how data is split, are located in real_data_loader.py. Most of the functions called to generate the output are located in realdata_analysis.py.
  4. The real-world Parkinson's experiment was run using open-access data.

    1. Download the data from https://www.ppmi-info.org/.
    2. Run python3 ppmi_feature_extraction.py passing in the directory containing the downloaded raw data and directory where processed data will be outputted.
    3. Manually process the treatment data to correct for typos in the drug name and treatment date
    4. Run process_parkinsons_data.ipynb to gather the data for the experiment.
    5. The experiment is run with python3 run_baseline_models.py ppmi Iterative DecisionTree. The information for creating Figure 11 and Table 4 are outputted.
Owner
Sontag Lab
Machine learning algorithms and applications to health care.
Sontag Lab
Official PyTorch implementation of "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks"

AASIST This repository provides the overall framework for training and evaluating audio anti-spoofing systems proposed in 'AASIST: Audio Anti-Spoofing

Clova AI Research 56 Jan 02, 2023
Implementation of CVPR 2020 Dual Super-Resolution Learning for Semantic Segmentation

Dual super-resolution learning for semantic segmentation 2021-01-02 Subpixel Update Happy new year! The 2020-12-29 update of SISR with subpixel conv p

Sam 79 Nov 24, 2022
Implementation of SwinTransformerV2 in TensorFlow.

SwinTransformerV2-TensorFlow A TensorFlow implementation of SwinTransformerV2 by Microsoft Research Asia, based on their official implementation of Sw

Phan Nguyen 2 May 30, 2022
Drone-based Joint Density Map Estimation, Localization and Tracking with Space-Time Multi-Scale Attention Network

DroneCrowd Paper Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark. Introduction This paper proposes a space-time multi-scale atte

VisDrone 98 Nov 16, 2022
offical implement of our Lifelong Person Re-Identification via Adaptive Knowledge Accumulation in CVPR2021

LifelongReID Offical implementation of our Lifelong Person Re-Identification via Adaptive Knowledge Accumulation in CVPR2021 by Nan Pu, Wei Chen, Yu L

PeterPu 76 Dec 08, 2022
Power Core Simulator!

Power Core Simulator Power Core Simulator is a simulator based off the Roblox game "Pinewood Builders Computer Core". In this simulator, you can choos

BananaJeans 1 Nov 13, 2021
Implementation of 'lightweight' GAN, proposed in ICLR 2021, in Pytorch. High resolution image generations that can be trained within a day or two

512x512 flowers after 12 hours of training, 1 gpu 256x256 flowers after 12 hours of training, 1 gpu Pizza 'Lightweight' GAN Implementation of 'lightwe

Phil Wang 1.5k Jan 02, 2023
A Runtime method overload decorator which should behave like a compiled language

strongtyping-pyoverload A Runtime method overload decorator which should behave like a compiled language there is a override decorator from typing whi

20 Oct 31, 2022
NeurIPS 2021, self-supervised 6D pose on category level

SE(3)-eSCOPE video | paper | website Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation Xiaolong Li, Yijia Weng,

Xiaolong 63 Nov 22, 2022
BirdCLEF 2021 - Birdcall Identification 4th place solution

BirdCLEF 2021 - Birdcall Identification 4th place solution My solution detail kaggle discussion Inference Notebook (best submission) Environment Use K

tattaka 42 Jan 02, 2023
PyTorch implementation of MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

MoCo: Momentum Contrast for Unsupervised Visual Representation Learning This is a PyTorch implementation of the MoCo paper: @Article{he2019moco, aut

Meta Research 3.7k Jan 02, 2023
EssentialMC2 Video Understanding

EssentialMC2 Introduction EssentialMC2 is a complete system to solve video understanding tasks including MHRL(representation learning), MECR2( relatio

Alibaba 106 Dec 11, 2022
HugsVision is a easy to use huggingface wrapper for state-of-the-art computer vision

HugsVision is an open-source and easy to use all-in-one huggingface wrapper for computer vision. The goal is to create a fast, flexible and user-frien

Labrak Yanis 166 Nov 27, 2022
Implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

PRP Introduction This is the implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

yuanyao366 39 Dec 29, 2022
Python SDK for building, training, and deploying ML models

Overview of Kubeflow Fairing Kubeflow Fairing is a Python package that streamlines the process of building, training, and deploying machine learning (

Kubeflow 325 Dec 13, 2022
Its a Plant Leaf Disease Detection System based on Machine Learning.

My_Project_Code Its a Plant Leaf Disease Detection System based on Machine Learning. I have used Tomato Leaves Dataset from kaggle. This system detect

Sanskriti Sidola 3 Jun 15, 2022
Subgraph Based Learning of Contextual Embedding

SLiCE Self-Supervised Learning of Contextual Embeddings for Link Prediction in Heterogeneous Networks Dataset details: We use four public benchmark da

Pacific Northwest National Laboratory 27 Dec 01, 2022
Training Cifar-10 Classifier Using VGG16

opevcvdl-hw3 This project uses pytorch and Qt to achieve the requirements. Version Python 3.6 opencv-contrib-python 3.4.2.17 Matplotlib 3.1.1 pyqt5 5.

Kenny Cheng 3 Aug 17, 2022
My course projects for the 2021 Spring Machine Learning course at the National Taiwan University (NTU)

ML2021Spring There are my projects for the 2021 Spring Machine Learning course at the National Taiwan University (NTU) Course Web : https://speech.ee.

Ding-Li Chen 15 Aug 29, 2022
Official PyTorch implemention of our paper "Learning to Rectify for Robust Learning with Noisy Labels".

WarPI The official PyTorch implemention of our paper "Learning to Rectify for Robust Learning with Noisy Labels". Run python main.py --corruption_type

Haoliang Sun 3 Sep 03, 2022