SimplEx - Explaining Latent Representations with a Corpus of Examples

Last update: Dec 15, 2022

Overview

SimplEx - Explaining Latent Representations with a Corpus of Examples

Code Author: Jonathan Crabbé ([email protected])

This repository contains the implementation of SimplEx, a method to explain the latent representations of black-box models with the help of a corpus of examples. For more details, please read our NeurIPS 2021 paper: 'Explaining Latent Representations with a Corpus of Examples'.

Installation

Clone the repository
Create a new virtual environment with Python 3.8

Run the following command from the repository folder:

pip install -r requirements.txt #install requirements

When the packages are installed, SimplEx can directly be used.

Toy example

Bellow, you can find a toy demonstration where we make a corpus decomposition of test examples representations. All the relevant code can be found in the file simplex.

from explainers.simplex import Simplex
from models.base import BlackBox

# Get the model and the examples
model = BlackBox() # Model should have the BlackBox interface
corpus_inputs = get_corpus() # A tensor of corpus inputs
test_inputs = get_test() # A set of inputs to explain

# Compute the corpus and test latent representations
corpus_latents = model.latent_representation(corpus_inputs) 
test_latents = model.latent_representation(test_inputs)

# Initialize SimplEX, fit it on test examples
simplex = Simplex(corpus_examples=corpus_inputs, 
                  corpus_latent_reps=corpus_latents)
simplex.fit(test_examples=test_inputs, 
            test_latent_reps=test_latents,
            reg_factor=0)

# Get the weights of each corpus decomposition
weights = simplex.weights

We get a tensor weights that can be interpreted as follows: weights[i,c] = weight of corpus example c in the decomposition of example i.

We can get the importance of each corpus feature for the decomposition of a given example i in the following way:

# Compute the Integrated Jacobian for a particular example
i = 42
input_baseline = get_baseline() # Baseline tensor of the same shape as corpus_inputs
simplex.jacobian_projections(test_id=i, model=model,
                             input_baseline=input_baseline)

result = simplex.decompose(i)

We get a list result where each element of the list corresponds to a corpus example. This list is sorted by decreasing order of importance in the corpus decomposition. Each element of the list is a tuple structured as follows:

w_c, x_c, proj_jacobian_c = result[c]

Where w_c corresponds to the weight weights[i,c], x_c corresponds to corpus_inputs[c] and proj_jacobian is a tensor such that proj_jacobian_c[k] is the Projected Jacobian of feature k from corpus example c.

Reproducing the paper results

Reproducing MNIST Approximation Quality Experiment

Run the following script for different values of CV (the results from the paper were obtained by taking all integer CV between 0 and 9)

python -m experiments.mnist -experiment "approximation_quality" -cv CV

Run the following script by adding all the values of CV from the previous step

python -m experiments.results.mnist.quality.plot_results -cv_list CV1 CV2 CV3 ...

The resulting plots and data are saved here.

Reproducing Prostate Cancer Approximation Quality Experiment

This experiment requires the access to the private datasets CUTRACT and SEER decribed in the paper.

Copy the files cutract_internal_all.csv and seer_external_imputed_new.csv are in the folder data/Prostate Cancer
Run the following script for different values of CV (the results from the paper were obtained by taking all integer CV between 0 and 9)

python -m experiments.prostate_cancer -experiment "approximation_quality" -cv CV

Run the following script by adding all the values of CV from the previous step

python -m experiments.results.prostate.quality.plot_results -cv_list CV1 CV2 CV3 ...

The resulting plots are saved here.

Reproducing Prostate Cancer Outlier Experiment

This experiment requires the access to the private datasets CUTRACT and SEER decribed in the paper.

Make sure that the files cutract_internal_all.csv and seer_external_imputed_new.csv are in the folder data/Prostate Cancer
Run the following script for different values of CV (the results from the paper were obtained by taking all integer CV between 0 and 9)

python -m experiments.prostate_cancer -experiment "outlier_detection" -cv CV

Run the following script by adding all the values of CV from the previous step

python -m experiments.results.prostate.outlier.plot_results -cv_list CV1 CV2 CV3 ...

The resulting plots are saved here.

Reproducing MNIST Jacobian Projection Significance Experiment

Run the following script

python -m experiments.mnist -experiment "jacobian_corruption"

2.The resulting plots and data are saved here.

Reproducing MNIST Outlier Detection Experiment

Run the following script for different values of CV (the results from the paper were obtained by taking all integer CV between 0 and 9)

python -m experiments.mnist -experiment "outlier_detection" -cv CV

Run the following script by adding all the values of CV from the previous step

python -m experiments.results.mnist.outlier.plot_results -cv_list CV1 CV2 CV3 ...

The resulting plots and data are saved here.

Reproducing MNIST Influence Function Experiment

Run the following script for different values of CV (the results from the paper were obtained by taking all integer CV between 0 and 4)

python -m experiments.mnist -experiment "influence" -cv CV

Run the following script by adding all the values of CV from the previous step

python -m experiments.results.mnist.influence.plot_results -cv_list CV1 CV2 CV3 ...

The resulting plots and data are saved here.

Note: some problems can appear with the package Pytorch Influence Functions. If this is the case, please change calc_influence_function.py in the following way:

343: influences.append(tmp_influence) ==> influences.append(tmp_influence.cpu())
438: influences_meta['test_sample_index_list'] = sample_list ==> #influences_meta['test_sample_index_list'] = sample_list

Reproducing AR Approximation Quality Experiment

Run the following script for different values of CV (the results from the paper were obtained by taking all integer CV between 0 and 4)

python -m experiments.time_series -experiment "approximation_quality" -cv CV

Run the following script by adding all the values of CV from the previous step

python -m experiments.results.ar.quality.plot_results -cv_list CV1 CV2 CV3 ...

The resulting plots and data are saved here.

Reproducing AR Outlier Detection Experiment

Run the following script for different values of CV (the results from the paper were obtained by taking all integer CV between 0 and 4)

python -m experiments.time_series -experiment "outlier_detection" -cv CV

Run the following script by adding all the values of CV from the previous step

python -m experiments.results.ar.outlier.plot_results -cv_list CV1 CV2 CV3 ...

The resulting plots and data are saved here.

Citing

If you use this code, please cite the associated paper:

Put citation here when ready

SimplEx - Explaining Latent Representations with a Corpus of Examples

Related tags

Overview

SimplEx - Explaining Latent Representations with a Corpus of Examples

Installation

Toy example

Reproducing the paper results

Reproducing MNIST Approximation Quality Experiment

Reproducing Prostate Cancer Approximation Quality Experiment

Reproducing Prostate Cancer Outlier Experiment

Reproducing MNIST Jacobian Projection Significance Experiment

Reproducing MNIST Outlier Detection Experiment

Reproducing MNIST Influence Function Experiment

Reproducing AR Approximation Quality Experiment

Reproducing AR Outlier Detection Experiment

Citing

Owner

Jonathan Crabbé

Official implementation of Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

An implementation of EWC with PyTorch

This git repo contains the implementation of my ML project on Heart Disease Prediction

[ICCV 2021 Oral] NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo

Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation"

Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park, Rares Ambrus, Vitor Guizilini, Jie Li, and Adrien Gaidon.

[AAAI 2021] MVFNet: Multi-View Fusion Network for Efficient Video Recognition

A Python framework for developing parallelized Computational Fluid Dynamics software to solve the hyperbolic 2D Euler equations on distributed, multi-block structured grids.

Diagnostic tests for linguistic capacities in language models

ruptures: change point detection in Python

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Official codebase for running the small, filtered-data GLIDE model from GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.

Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in Pytorch

WormMovementSimulation - 3D Simulation of Worm Body Movement with Neurons attached to its body

Code repository for our paper regarding the L3D dataset.

Scribble-Supervised LiDAR Semantic Segmentation, CVPR 2022 (ORAL)

Details about the wide minima density hypothesis and metrics to compute width of a minima

PyTorch code to run synthetic experiments.

Single-Shot Motion Completion with Transformer

Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode

SimplEx - Explaining Latent Representations with a Corpus of Examples

Related tags

Overview

SimplEx - Explaining Latent Representations with a Corpus of Examples

Installation

Toy example

Reproducing the paper results

Reproducing MNIST Approximation Quality Experiment

Reproducing Prostate Cancer Approximation Quality Experiment

Reproducing Prostate Cancer Outlier Experiment

Reproducing MNIST Jacobian Projection Significance Experiment

Reproducing MNIST Outlier Detection Experiment

Reproducing MNIST Influence Function Experiment

Reproducing AR Approximation Quality Experiment

Reproducing AR Outlier Detection Experiment

Citing

Owner

Jonathan Crabbé

Official implementation of Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

An implementation of EWC with PyTorch

This git repo contains the implementation of my ML project on Heart Disease Prediction

[ICCV 2021 Oral] NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo

Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation"

Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

[AAAI 2021] MVFNet: Multi-View Fusion Network for Efficient Video Recognition

A Python framework for developing parallelized Computational Fluid Dynamics software to solve the hyperbolic 2D Euler equations on distributed, multi-block structured grids.

Diagnostic tests for linguistic capacities in language models

ruptures: change point detection in Python

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Official codebase for running the small, filtered-data GLIDE model from GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.

Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in Pytorch

WormMovementSimulation - 3D Simulation of Worm Body Movement with Neurons attached to its body

Code repository for our paper regarding the L3D dataset.

Scribble-Supervised LiDAR Semantic Segmentation, CVPR 2022 (ORAL)

Details about the wide minima density hypothesis and metrics to compute width of a minima

PyTorch code to run synthetic experiments.

Single-Shot Motion Completion with Transformer

Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode

Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park, Rares Ambrus, Vitor Guizilini, Jie Li, and Adrien Gaidon.