XAI - An eXplainability toolbox for machine learning

Overview

XAI is a Machine Learning library designed with AI explainability at its core. XAI contains various tools that enable the analysis and evaluation of data and models. The XAI library is maintained by The Institute for Ethical AI & ML, and it was developed based on the 8 principles for Responsible Machine Learning.

You can find the documentation at https://ethicalml.github.io/xai/index.html. You can also check out our talk at TensorFlow London, where the idea was first conceived; the talk also provides insight into the definitions and principles behind this library.

YouTube video showing how to use XAI to mitigate undesired biases

This video of the talk presented at the PyData London 2019 Conference provides an overview of the motivations for machine learning explainability, as well as techniques to introduce explainability and mitigate undesired biases using the XAI library.
Do you want to learn about more awesome machine learning explainability tools? Check out our community-built "Awesome Machine Learning Production & Operations" list which contains an extensive list of tools for explainability, privacy, orchestration and beyond.

0.1.0

If you want to see a fully functional demo in action clone this repo and run the Example Jupyter Notebook in the Examples folder.

What do we mean by eXplainable AI?

We see the challenge of explainability as more than just an algorithmic challenge; it requires a combination of data science best practices and domain-specific knowledge. The XAI library is designed to empower machine learning engineers and relevant domain experts to analyse the end-to-end solution and identify discrepancies that may result in sub-optimal performance relative to the objectives required. More broadly, the XAI library is designed around the three steps of explainable machine learning: 1) data analysis, 2) model evaluation, and 3) production monitoring.

We provide a visual overview of these three steps in the following diagram:

XAI Quickstart

Installation

The XAI package is on PyPI. To install you can run:

pip install xai

Alternatively you can install from source by cloning the repo and running:

python setup.py install 

Usage

You can find example usage in the examples folder.

1) Data Analysis

With XAI you can identify imbalances in the data. For this, we will load the census dataset from the XAI library.

import xai.data
df = xai.data.load_census()
df.head()

View class imbalances for all categories of one column

ims = xai.imbalance_plot(df, "gender")

View imbalances for all categories across multiple columns

im = xai.imbalance_plot(df, "gender", "loan")

Balance classes using upsampling and/or downsampling

bal_df = xai.balance(df, "gender", "loan", upsample=0.8)

Perform custom operations on groups

groups = xai.group_by_columns(df, ["gender", "loan"])
for group, group_df in groups:    
    print(group) 
    print(group_df["loan"].head(), "\n")

Visualise correlations as a matrix

_ = xai.correlations(df, include_categorical=True, plot_type="matrix")

Visualise correlations as a hierarchical dendrogram

_ = xai.correlations(df, include_categorical=True)

Create a balanced validation and training split dataset
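
The snippet below uses x, y and categorical_cols, which are prepared in the example notebook. A minimal sketch of how they could be derived from df (the exact preparation is an assumption; "loan" is the target column):

categorical_cols = df.select_dtypes(include="object").columns.tolist()  # non-numeric feature columns
x = df.drop("loan", axis=1)   # features, including "gender" for the balanced split
y = df["loan"]                # target (may need encoding to 0/1 before model training)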

# Balanced train-test split with minimum 300 examples of 
#     the cross of the target y and the column gender
x_train, y_train, x_test, y_test, train_idx, test_idx = \
    xai.balanced_train_test_split(
            x, y, "gender", 
            min_per_group=300,
            max_per_group=300,
            categorical_cols=categorical_cols)

x_train_display = bal_df[train_idx]
x_test_display = bal_df[test_idx]

print("Total number of examples: ", x_test.shape[0])

df_test = x_test_display.copy()
df_test["loan"] = y_test

_= xai.imbalance_plot(df_test, "gender", "loan", categorical_cols=categorical_cols)

2) Model Evaluation

We can also analyse the interaction between inference results and input features. For this, we will train a single-layer deep learning model.

# proc_df, build_model and f_in are defined in the example notebook
model = build_model(proc_df.drop("loan", axis=1))

model.fit(f_in(x_train), y_train, epochs=50, batch_size=512)

probabilities = model.predict(f_in(x_test))
predictions = list((probabilities >= 0.5).astype(int).T[0])
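
build_model and f_in above are helper functions from the example notebook rather than part of the XAI library. A minimal sketch of what they could look like, assuming a Keras binary classifier and an already numerically encoded feature frame (the architecture and preprocessing here are illustrative assumptions, not the notebook's exact code):

import numpy as np
import tensorflow as tf

def build_model(features_df):
    # One-layer binary classifier sized to the number of feature columns
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(1, activation="sigmoid", input_shape=(features_df.shape[1],))
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

def f_in(x):
    # Convert the (already encoded) feature frame to a float array for the model
    return np.asarray(x).astype("float32")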

Visualise permutation feature importance

def get_avg(x, y):
    # model.evaluate returns [loss, accuracy]; index 1 keeps the accuracy metric
    return model.evaluate(f_in(x), y, verbose=0)[1]

imp = xai.feature_importance(x_test, y_test, get_avg)

imp.head()

Identify metric imbalances against all test data

_= xai.metrics_plot(
        y_test, 
        probabilities)

Identify metric imbalances across a specific column

_ = xai.metrics_plot(
    y_test, 
    probabilities, 
    df=x_test_display, 
    cross_cols=["gender"],
    categorical_cols=categorical_cols)

Identify metric imbalances across multiple columns

_ = xai.metrics_plot(
    y_test, 
    probabilities, 
    df=x_test_display, 
    cross_cols=["gender", "ethnicity"],
    categorical_cols=categorical_cols)

Draw confusion matrix

xai.confusion_matrix_plot(y_test, predictions)

Visualise the ROC curve against all test data

_ = xai.roc_plot(y_test, probabilities)

Visualise the ROC curves grouped by a protected column

protected = ["gender", "ethnicity", "age"]
_ = [xai.roc_plot(
    y_test, 
    probabilities, 
    df=x_test_display, 
    cross_cols=[p],
    categorical_cols=categorical_cols) for p in protected]

Visualise accuracy grouped by probability buckets

d = xai.smile_imbalance(
    y_test, 
    probabilities)

Visualise statistical metrics grouped by probability buckets

d = xai.smile_imbalance(
    y_test, 
    probabilities,
    display_breakdown=True)

Visualise benefits of adding manual review on probability thresholds

d = xai.smile_imbalance(
    y_test, 
    probabilities,
    bins=9,
    threshold=0.75,
    manual_review=0.375,
    display_breakdown=False)

Comments
  • matplotlib error while installing package

    Collecting matplotlib==3.0.2

    Using cached matplotlib-3.0.2.tar.gz (36.5 MB) ERROR: Command errored out with exit status 1: command: /opt/anaconda3/envs/ethicalml/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/wv/m62_p54d5bx1dnq_m07ck3l40000gn/T/pip-install-303disb7/matplotlib/setup.py'"'"'; file='"'"'/private/var/folders/wv/m62_p54d5bx1dnq_m07ck3l40000gn/T/pip-install-303disb7/matplotlib/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/wv/m62_p54d5bx1dnq_m07ck3l40000gn/T/pip-install-303disb7/matplotlib/pip-egg-info

    opened by ArpitSisodia 3
  • Requirements

    Your requirements are very restrictive. Can you please change them to >= instead of ==? For example:

    numpy>=1.3
    pandas>=0.23.0
    matplotlib>2.02,<=3.0.3
    scikit-learn>=0.19.0
    
    opened by idanmoradarthas 3
  • Converts the probs into an np array if it's not one already

    The smile_imbalance() function argument "probs" does not specify that it is required to be a numpy array, but it is. I have added that data type to the argument so the user knows when referring to the docs, and I have also added a call to np.array(), which is an idempotent operation (if the value passed is already a numpy array it changes nothing; otherwise it converts the list into a numpy array).
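
    (Roughly, the conversion described above amounts to something like the following one-liner, sketched here for illustration:)

    probs = np.array(probs)  # yields an ndarray whether probs is a list or already an ndarray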

    Suggestion

    • If possible, could you consider adding a "save_plot_path" option to each function, so that when this package is used in production (which I am doing, and which anyone considering continuous model delivery would do) all of these plots could be saved to a particular directory for data scientists to look at later? In production the code runs in scripts on EC2 or other cloud servers rather than in Jupyter notebooks. (A possible workaround is sketched after this list.)
    • My use case is that I retrain the model every week, and XAI allows me to generate an evaluation report that lets me remotely decide whether to push this week's model into production.
    • I considered adding it myself, but I was not sure if this is the direction you wanted to take.
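
    A possible interim workaround, not part of the XAI API: since the plots are rendered with matplotlib, the current figure can usually be captured and written to disk after each call. The save_report_plot helper below is an illustrative assumption, not an existing XAI function.

    import os
    import matplotlib
    matplotlib.use("Agg")  # headless backend for servers without a display
    import matplotlib.pyplot as plt
    import xai

    def save_report_plot(path):
        # Save whatever figure the last xai plotting call drew
        directory = os.path.dirname(path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        plt.gcf().savefig(path, bbox_inches="tight")
        plt.close("all")

    _ = xai.metrics_plot(y_test, probabilities)
    save_report_plot("reports/metrics_plot.png")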

    Thank you

    opened by sai-krishna-msk 2
  • Unable to install package

    Hello!

    I've been trying to install this package and am unable to do so. I've tried both methods on my Ubuntu machine.

    1. pip install xai
    2. python setup.py install

    What can I do to install this? Also, is this project active anymore at all?

    opened by varunbanda 2
  • Can we explain BERT models using this package?

    I'm working with text data and looking for ways to explain BERT models. Is there any workaround using XAI, or any other package/resource anyone can recommend?

    opened by techwithshadab 1
  • Add a conda install option for `xai`

    A conda installation option could be very helpful. I have already started working on this, to add xai to conda-forge.

    Conda-forge PR:

    • https://github.com/conda-forge/staged-recipes/pull/17601

    Once the conda-forge PR is merged, you will be able to install the library with conda as follows:

    conda install -c conda-forge xai
    

    :bulb: I will push a PR to update the docs once the package is available on conda-forge.

    opened by sugatoray 0
  • Wrong series returned from _curve

    There is a bug in https://github.com/EthicalML/xai/blob/master/xai/__init__.py#L962

    it was written as

    r1s = r2s = []
    

    but should be instead

    r1s, r2s = [], []
    

    The impact is that if the user would like to use the returned r1s and r2s to construct the curve (e.g. to store the data for later analysis), they would find that r1s and r2s refer to the same instance, which holds all of the curve data that should have been stored separately in r1s and r2s.
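
    For illustration, a short snippet showing why the chained assignment aliases the two names to a single list (this is generic Python behaviour, not code from the library):

    r1s = r2s = []        # both names bind to the same list object
    r1s.append(0.25)
    print(r2s)            # [0.25] -- the "two" series share their data
    print(r1s is r2s)     # True

    r1s, r2s = [], []     # two distinct lists, as intended
    print(r1s is r2s)     # False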

    opened by chen0040 1