A library that implements fairness-aware machine learning algorithms

Last update: Dec 30, 2022

Overview

Themis ML

themis-ml is a Python library built on top of pandas and sklearn that implements fairness-aware machine learning algorithms.

Fairness-aware Machine Learning

themis-ml defines discrimination as the preference (bias) for or against a set of social groups that results in the unfair treatment of its members with respect to some outcome.

It defines fairness as the inverse of discrimination, and in the context of a machine learning algorithm, this is measured by the degree to which the algorithm's predictions favor one social group over another in relation to an outcome that holds socioeconomic, political, or legal importance, e.g. the denial/approval of a loan application.

What counts as a "fair" algorithm depends on how we define fairness. For example, if we define fairness as statistical parity, a fair algorithm is one in which the proportion of approved loans among minorities is equal to the proportion of approved loans among white people.
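As a minimal sketch of the statistical parity idea, the snippet below compares approval rates between two groups on made-up data. The function names and the encoding (y = 1 for an approved loan, s = 1 for the disadvantaged group, s = 0 for the advantaged group) are assumptions for illustration, not the themis-ml API.

```python
def positive_rate(y, s, group):
    """Proportion of positive outcomes (y == 1) among members of `group`."""
    outcomes = [yi for yi, si in zip(y, s) if si == group]
    return sum(outcomes) / len(outcomes)


def mean_difference(y, s):
    """p(y+ | s0) - p(y+ | s1); this is zero under statistical parity."""
    return positive_rate(y, s, 0) - positive_rate(y, s, 1)


# Toy data (made up): y = 1 is an approved loan,
# s = 1 marks the disadvantaged group, s = 0 the advantaged group.
y = [1, 1, 1, 0, 1, 0, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]

md = mean_difference(y, s)  # 0.75 - 0.25
print(md)
```

A value of 0 would indicate statistical parity on this data; a positive value means the advantaged group is approved more often.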

Features

Here are a few of the discrimination discovery and fairness-aware techniques that this library implements.

Measuring Discrimination

Mean difference
Normalized mean difference
Consistency
Situation Test Score
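To illustrate how the first two measures relate, here is a hedged sketch of a normalized mean difference. It divides the mean difference by the largest value it could take given the overall positive rate and the group sizes, d_max = min(p(y+)/p(s0), p(y-)/p(s1)), with s0 the advantaged group. The function name and toy data are assumptions, and the library's own implementation may differ in details.

```python
def normalized_mean_difference(y, s):
    """Mean difference divided by its maximum attainable value d_max."""
    n = len(y)
    n_s0 = sum(1 for si in s if si == 0)
    n_s1 = n - n_s0
    p_pos_s0 = sum(yi for yi, si in zip(y, s) if si == 0) / n_s0
    p_pos_s1 = sum(yi for yi, si in zip(y, s) if si == 1) / n_s1
    md = p_pos_s0 - p_pos_s1

    # d_max is reached when as many positives as possible go to group s0.
    p_pos = sum(y) / n
    d_max = min(p_pos / (n_s0 / n), (1 - p_pos) / (n_s1 / n))
    return md / d_max


# Toy data (made up): every positive outcome went to the advantaged group.
y = [1, 1, 0, 0, 0, 0, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]

nmd = normalized_mean_difference(y, s)  # md = 0.5, d_max = 0.5
print(nmd)
```

This sketch does not guard against d_max being zero, which happens when all outcomes are positive or all are negative.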

Mitigating Discrimination

Preprocessing

Relabelling (Massaging)
Reweighting
Sampling

Model Estimation

Additive Counterfactually Fair Estimator
Prejudice Remover Regularized Estimator

Postprocessing

Reject Option Classification
Discrimination-aware Ensemble Classification

Datasets

themis-ml also provides utility functions for loading freely available datasets from a variety of sources.

German Credit (source)
Census Income (source)
Taiwan Credit Default (source)
Australian Credit Approval (source)
Adult Census (source)
Communities and Crime (source)
Disabled Residents Expenditure (source)

Installation

The source code is currently hosted on GitHub: https://github.com/cosmicBboy/themis-ml. You can install the latest released version with conda or pip.

# conda
conda install -c cosmicbboy themis-ml

If you install with pip, you'll need to install scikit-learn, numpy, and pandas with either pip or conda. Version requirements:

numpy (>= 1.9.0)
scikit-learn (>= 0.19.1)
pandas (>= 0.22.0)

# pip
pip install themis-ml

Documentation

Official documentation for this package can be found here

References

You can find a complete set of references for the discrimination discovery and fairness-aware methods implemented in themis-ml in this paper.

Comments

Implement "Additive Counterfactually Fair" estimator
The main idea is to:

train linear models using some linear estimator M to predict each feature x_j using the protected class attribute s as input.

then compute the residuals epsilon_ij between the predicted feature values and true feature values for each observation i for each feature j.

The final model is then trained using epsilon_ij as input features to predict the target y.
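The residualization steps above can be sketched as follows. This is only an illustration under simplifying assumptions: the linear estimator M is ordinary least squares with a single input s, and the helper names fit_simple_ols and residualize are hypothetical, not part of themis-ml.

```python
def fit_simple_ols(s, x):
    """Least-squares intercept and slope for x ~ intercept + slope * s."""
    n = len(s)
    mean_s = sum(s) / n
    mean_x = sum(x) / n
    var_s = sum((si - mean_s) ** 2 for si in s)
    cov_sx = sum((si - mean_s) * (xi - mean_x) for si, xi in zip(s, x))
    slope = cov_sx / var_s
    return mean_x - slope * mean_s, slope


def residualize(X, s):
    """Replace each feature column with its residual after regressing on s."""
    residual_columns = []
    for x_j in zip(*X):  # iterate over feature columns
        intercept, slope = fit_simple_ols(s, x_j)
        residual_columns.append(
            [xi - (intercept + slope * si) for si, xi in zip(s, x_j)]
        )
    return [list(row) for row in zip(*residual_columns)]


# Toy data (made up): two features per observation, s is the protected attribute.
X = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [6.0, 20.0], [7.0, 22.0], [8.0, 21.0]]
s = [0, 0, 0, 1, 1, 1]

E = residualize(X, s)  # epsilon_ij, the inputs for the final model predicting y
print(E)
```

Any estimator can then be trained on E instead of X. With a binary s, the residuals are simply each feature's deviation from its group mean, so they carry no linear information about s.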
opened by cosmicBboy 1
Create utility function to load German Credit dataset

Create a function german_dataset that outputs a pandas dataframe with the german credit data found here:

https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)

opened by cosmicBboy 1
add census income data.

fixes #31. This diff adds a function to read in census income data from 1994-1995, taken from https://archive.ics.uci.edu/ml/datasets/Census-Income+(KDD)

opened by cosmicBboy 0
bound mean difference CI metrics to -1, 1 range

this revision adds constraints to the mean difference scores such that the confidence interval bounds of both mean difference and normalized mean difference are within -1 and 1
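A minimal sketch of the bounding idea, assuming a normal-approximation confidence interval for a difference of two proportions. How themis-ml actually computes its interval is not stated here, so the standard error formula, the z value, and the function name are all assumptions.

```python
import math


def mean_difference_ci(y, s, z=1.96):
    """Mean difference p(y+|s0) - p(y+|s1) with bounds clipped to [-1, 1]."""
    y0 = [yi for yi, si in zip(y, s) if si == 0]
    y1 = [yi for yi, si in zip(y, s) if si == 1]
    p0, p1 = sum(y0) / len(y0), sum(y1) / len(y1)
    md = p0 - p1

    # Assumed: normal-approximation standard error for two proportions.
    se = math.sqrt(p0 * (1 - p0) / len(y0) + p1 * (1 - p1) / len(y1))

    # A difference of proportions cannot leave [-1, 1], so clip the bounds.
    lower = max(-1.0, md - z * se)
    upper = min(1.0, md + z * se)
    return md, lower, upper


# Toy data (made up); the unclipped upper bound would exceed 1 here.
y = [1, 1, 1, 0, 0, 0, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]

md, lower, upper = mean_difference_ci(y, s)
print(md, lower, upper)
```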

opened by cosmicBboy 0
conda build recipe, pip install deps, py3.6 for ci
this revision does the following:

add conda build recipe in the conda.recipe directory

adds pypi install dependencies

makefile convenience targets for building conda on 2.7 and 3.6

adds support for python 3.6

add travis ci for 3.6
opened by cosmicBboy 0
Implement "Reject Option Classification" post-processing technique
Single Classifier Setting

training an initial classifier on dataset D

generating predicted probabilities on the test set

computing the proximity of each prediction to the decision boundary learned by the classifier

within the critical region threshold theta around the decision boundary, where 0.5 < theta < 1, X_s1 (disadvantaged observations) are assigned as y+ and X_s0 (advantaged observations) are assigned as y-.
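The relabelling step of the single classifier setting can be sketched as below. The sketch starts from predicted probabilities that some already-trained classifier is assumed to have produced, and it treats a prediction as inside the critical region when max(p, 1 - p) <= theta. The function name and that exact region test are assumptions for illustration.

```python
def reject_option_classify(proba_pos, s, theta=0.6):
    """Relabel predictions that fall inside the critical region.

    proba_pos: predicted probability of the positive class per observation
    s: protected attribute (1 = disadvantaged, 0 = advantaged)
    theta: critical region threshold, with 0.5 < theta < 1
    """
    labels = []
    for p, si in zip(proba_pos, s):
        if max(p, 1 - p) <= theta:
            # Close to the decision boundary: assign by group membership.
            labels.append(1 if si == 1 else 0)
        else:
            # Confident prediction: keep the usual 0.5 decision rule.
            labels.append(1 if p > 0.5 else 0)
    return labels


# Toy probabilities (made up) from a hypothetical classifier.
proba_pos = [0.9, 0.55, 0.45, 0.58, 0.42, 0.1]
s = [0, 0, 1, 1, 0, 1]

labels = reject_option_classify(proba_pos, s, theta=0.6)
print(labels)
```

A larger theta widens the critical region, so more predictions are decided by group membership rather than by the classifier.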

Multi-classifier Setting

ROC in the multiple classifier setting is similar to the single classifier setting, except that predicted probabilities are defined as the weighted average of probabilities generated by each classifier C_k (k is the number of different classifiers trained), where the weights can be defined as:

the accuracy of the classifier on the data.

uniform (take the mean of the predictions)
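The weighted averaging described for the multi-classifier setting might look like the sketch below. The function name, the toy probabilities, and the accuracy values used as weights are made up for illustration.

```python
def combine_probabilities(proba_by_classifier, weights=None):
    """Weighted average of per-classifier positive-class probabilities.

    proba_by_classifier: one list of probabilities per classifier C_k
    weights: one weight per classifier; None means uniform weights
    """
    k = len(proba_by_classifier)
    if weights is None:
        weights = [1.0] * k
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, column)) / total
        for column in zip(*proba_by_classifier)  # one column per observation
    ]


# Toy probabilities (made up) from two hypothetical classifiers.
proba_by_classifier = [
    [0.9, 0.6, 0.2],
    [0.7, 0.4, 0.4],
]

uniform = combine_probabilities(proba_by_classifier)
# Weights set to each classifier's (made-up) accuracy on the data.
weighted = combine_probabilities(proba_by_classifier, weights=[0.9, 0.6])
print(uniform, weighted)
```

The combined probabilities can then be passed through the same critical-region rule as in the single classifier setting.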
opened by cosmicBboy 0
Implement "Massaging"/"Relabelling" data transformation technique
This technique essentially relabels the target variables using a function that can compute a decision boundary in input data space.

the top n -ve labelled observations in the disadvantaged group s1 that are closest to the decision boundary are "promoted" to the +ve label.

the top n +ve labelled observations in the advantaged group s0 closest to the decision boundary are "demoted" to the -ve label.

n is the number of promotions/demotions needed to make p(+|s0) = p(+|s1)
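The promotion/demotion rule above can be sketched as follows. The scores are assumed to come from some ranker (a higher score means more likely positive), and n is obtained by solving (pos_s0 - n)/n_s0 = (pos_s1 + n)/n_s1 and rounding. The function name, the rounding choice, and the toy scores are assumptions for illustration.

```python
def massage(y, s, scores):
    """Relabel the n observations nearest the boundary in each group."""
    y = list(y)
    idx_s0 = [i for i, si in enumerate(s) if si == 0]  # advantaged
    idx_s1 = [i for i, si in enumerate(s) if si == 1]  # disadvantaged
    pos_s0 = sum(y[i] for i in idx_s0)
    pos_s1 = sum(y[i] for i in idx_s1)

    # Number of swaps that equalizes p(+|s0) and p(+|s1), up to rounding.
    n = round((len(idx_s1) * pos_s0 - len(idx_s0) * pos_s1) / len(y))
    n = max(n, 0)

    # Negatives in s1 with the highest scores are closest to the boundary.
    promote = sorted(
        (i for i in idx_s1 if y[i] == 0), key=lambda i: scores[i], reverse=True
    )[:n]
    # Positives in s0 with the lowest scores are closest to the boundary.
    demote = sorted((i for i in idx_s0 if y[i] == 1), key=lambda i: scores[i])[:n]

    for i in promote:
        y[i] = 1
    for i in demote:
        y[i] = 0
    return y


# Toy data (made up); scores come from a hypothetical ranker.
y = [1, 1, 1, 0, 1, 0, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.55, 0.3, 0.2]

y_massaged = massage(y, s, scores)
print(y_massaged)
```

On this data n is 1, so one observation is promoted and one is demoted, leaving both groups with a positive rate of 0.5.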
opened by cosmicBboy 0
Create stratified metrics for mean difference and normalized mean difference

The purpose of this issue is to add support for stratified mean difference and normalized mean difference so that we can control for other explanatory (or confounding) factors that may be driving the mean difference in outcome y between the advantaged and disadvantaged groups s_+ and s_-
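One way to read the stratified metric is to compute the mean difference separately within each level of a confounding variable. The sketch below does that on made-up data in which the overall mean difference is 0.5 but disappears within each stratum. The function name and the choice to return one value per stratum are assumptions, not the library's API.

```python
from collections import defaultdict


def stratified_mean_difference(y, s, strata):
    """Mean difference p(y+|s0) - p(y+|s1) computed within each stratum."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for yi, si, gi in zip(y, s, strata):
        groups[gi][si].append(yi)

    result = {}
    for stratum, by_s in groups.items():
        # Skip strata that lack observations from one of the two groups.
        if by_s[0] and by_s[1]:
            rate_s0 = sum(by_s[0]) / len(by_s[0])
            rate_s1 = sum(by_s[1]) / len(by_s[1])
            result[stratum] = rate_s0 - rate_s1
    return result


# Toy data (made up): outcome depends only on income level, which is
# unevenly distributed across the two groups.
y = [1, 1, 1, 1, 0, 0, 0, 0]
s = [0, 0, 0, 1, 0, 1, 1, 1]
income = ["high", "high", "high", "high", "low", "low", "low", "low"]

per_stratum = stratified_mean_difference(y, s, income)
print(per_stratum)
```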

opened by cosmicBboy 0
Implement "Discrimination-aware Ensemble Classification" postprocessing

DAEC is like #10, with a similar relabelling rule as ROC but re-assigns any prediction where classifiers disagree on the predicted label.

For example, if an observation was positively labelled and the ensemble classifiers disagree on the predicted label, then the prediction would be negative.
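A hedged sketch of the disagreement rule: when all classifiers agree, their label is kept; when they disagree, the observation is relabelled. The source does not spell out how the group enters the rule, so this sketch assumes the same group-based assignment as reject option classification (disadvantaged s = 1 gets the positive label, advantaged s = 0 gets the negative label). The function name is hypothetical.

```python
def daec_predict(predictions_by_classifier, s):
    """Keep unanimous labels; relabel by group where classifiers disagree."""
    labels = []
    for votes, si in zip(zip(*predictions_by_classifier), s):
        if len(set(votes)) == 1:
            labels.append(votes[0])  # all classifiers agree
        else:
            # Assumed rule: disadvantaged -> positive, advantaged -> negative.
            labels.append(1 if si == 1 else 0)
    return labels


# Toy predictions (made up): one row per classifier, one column per observation.
predictions_by_classifier = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]
s = [0, 0, 1, 1]

labels = daec_predict(predictions_by_classifier, s)
print(labels)
```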

opened by cosmicBboy 0
Implement "Prejudice Remover Regularized" Estimator
PRR is an optimization technique that extends the standard L1/L2-norm regularization method by adding a prejudice index term to the objective function. This term is equivalent to normalized mutual information, which measures the degree to which predictions ŷ and s are dependent on each other.

With values ranging from 0 to 1, 0 means that ŷ and s are independent and a value of 1 means that they are dependent. The goal of the objective function is to find model parameters that minimize the difference between ŷ and y in addition to the degree to which ŷ depends on s. See reference below for exact implementation details.

Reference: Kamishima, T., Akaho, S., Asoh, H., & Sakuma, J. (2012). Fairness-aware classifier with prejudice remover regularizer. Machine Learning and Knowledge Discovery in Databases, 35-50. http://www.kamishima.net/archive/2011-ws-icdm_padm.pdf
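To make the 0-to-1 scale concrete, the sketch below computes a normalized mutual information between discrete predictions ŷ and s. It only illustrates the quantity and is not the regularizer itself, which is defined on model probabilities so that it can be optimized. Normalizing by the geometric mean of the two entropies is an assumption of this sketch, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (natural log) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def normalized_mutual_information(y_hat, s):
    """Mutual information of y_hat and s divided by sqrt(H(y_hat) * H(s))."""
    n = len(y_hat)
    count_y = Counter(y_hat)
    count_s = Counter(s)
    count_ys = Counter(zip(y_hat, s))
    mi = sum(
        (c / n) * math.log((c / n) / ((count_y[yi] / n) * (count_s[si] / n)))
        for (yi, si), c in count_ys.items()
    )
    denom = math.sqrt(entropy(y_hat) * entropy(s))
    return mi / denom if denom > 0 else 0.0


# Toy predictions (made up).
independent = normalized_mutual_information([1, 0, 1, 0], [0, 0, 1, 1])
dependent = normalized_mutual_information([1, 1, 0, 0], [0, 0, 1, 1])
print(independent, dependent)
```

In the first call the predictions are spread evenly across both groups, giving 0; in the second the prediction is fully determined by s, giving a value of 1 up to floating-point error.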
opened by cosmicBboy 0

Releases(0.0.4)

0.0.4(Feb 20, 2018)

0.0.1(Sep 22, 2017)
This is the initial release of themis-ml. It exposes the following functionality with respect to fairness-aware machine learning methods:

Measuring Discrimination

Group-level measures

Mean difference

Normalized mean difference

Mitigating Discrimination

Preprocessing

Relabelling (Massaging)

Reweighting

Model Estimation

Additive Counterfactually Fair Estimator

Postprocessing

Reject Option Classification

Datasets

German Credit Dataset


Owner

Niels Bantilan

Data Scientist, Machine Learning Engineer

