This is the code repository for Interpretable Machine Learning with Python, published by Packt.

Overview

Interpretable Machine Learning with Python

Interpretable Machine Learning with Pythone

This is the code repository for Interpretable Machine Learning with Python, published by Packt.

Learn to build interpretable high-performance models with hands-on real-world examples

What is this book about?

Do you want to understand your models and mitigate the risks associated with poor predictions using practical machine learning (ML) interpretation? Interpretable Machine Learning with Python can help you overcome these challenges, using interpretation methods to build fairer and safer ML models.

This book covers the following exciting features:

  • Recognize the importance of interpretability in business
  • Study models that are intrinsically interpretable such as linear models, decision trees, and Naïve Bayes
  • Become well-versed in interpreting models with model-agnostic methods
  • Visualize how an image classifier works and what it learns
  • Understand how to mitigate the influence of bias in datasets

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders. For example, Chapter02.

The code will look like the following:

base_classifier = KerasClassifier(model=base_model,\
                                  clip_values=(min_, max_))
y_test_mdsample_prob = np.max(y_test_prob[sampl_md_idxs],\
                                                       axis=1)
y_test_smsample_prob = np.max(y_test_prob[sampl_sm_idxs],\
                                                       axis=1)

Following is what you need for this book: This book is for data scientists, machine learning developers, and data stewards who have an increasingly critical responsibility to explain how the AI systems they develop work, their impact on decision making, and how they identify and manage bias. Working knowledge of machine learning and the Python programming language is expected.

With the following software and hardware list you can run all code files present in the book (Chapter 1-14).

Software and Hardware List

You can install the software required in any operating system by first installing Jupyter Notebook or Jupyter Lab with the most recent version of Python, or install Anaconda which can install everything at once. While hardware requirements for Jupyter are relatively modest, we recommend a machine with at least 4 cores of 2Ghz and 8Gb of RAM.

Alternatively, to installing the software locally, you can run the code in the cloud using Google Colab or another cloud notebook service.

Either way, the following packages are required to run the code in all the chapters (Google Colab has all the packages denoted with a ^):

Chapter Software required OS required
1 - 13 ^ Python 3.6+ Windows, Mac OS X, and Linux (Any)
1 - 13 ^ matplotlib 3.2.2+ Windows, Mac OS X, and Linux (Any)
1 - 13 ^ scikit-learn 0.22.2+ Windows, Mac OS X, and Linux (Any)
1 - 12 ^ pandas 1.1.5+ Windows, Mac OS X, and Linux (Any)
2 - 13 machine-learning-datasets 0.01.16+ Windows, Mac OS X, and Linux (Any)
2 - 13 ^ numpy 1.19.5+ Windows, Mac OS X, and Linux (Any)
3 - 13 ^ seaborn 0.11.1+ Windows, Mac OS X, and Linux (Any)
3 - 13 ^ tensorflow 2.4.1+ Windows, Mac OS X, and Linux (Any)
5 - 12 shap 0.38.1+ Windows, Mac OS X, and Linux (Any)
1, 5, 10, 12 ^ scipy 1.4.1+ Windows, Mac OS X, and Linux (Any)
5, 10-12 ^ xgboost 0.90+ Windows, Mac OS X, and Linux (Any)
6, 11, 12 ^ lightgbm 2.2.3+ Windows, Mac OS X, and Linux (Any)
7 - 9 alibi 0.5.5+ Windows, Mac OS X, and Linux (Any)
10 - 13 ^ tqdm 4.41.1+ Windows, Mac OS X, and Linux (Any)
2, 9 ^ statsmodels 0.10.2+ Windows, Mac OS X, and Linux (Any)
3, 5 rulefit 0.3.1+ Windows, Mac OS X, and Linux (Any)
6, 8 lime 0.2.0.1+ Windows, Mac OS X, and Linux (Any)
7, 12 catboost 0.24.4+ Windows, Mac OS X, and Linux (Any)
8, 9 ^ Keras 2.4.3+ Windows, Mac OS X, and Linux (Any)
11, 12 ^ pydot 1.3.0+ Windows, Mac OS X, and Linux (Any)
11, 12 xai 0.0.4+ Windows, Mac OS X, and Linux (Any)
1 ^ beautifulsoup4 4.6.3+ Windows, Mac OS X, and Linux (Any)
1 ^ requests 2.23.0+ Windows, Mac OS X, and Linux (Any)
3 cvae 0.0.3+ Windows, Mac OS X, and Linux (Any)
3 interpret 0.2.2+ Windows, Mac OS X, and Linux (Any)
3 ^ six 1.15.0+ Windows, Mac OS X, and Linux (Any)
3 skope-rules 1.0.1+ Windows, Mac OS X, and Linux (Any)
4 PDPbox 0.2.0+ Windows, Mac OS X, and Linux (Any)
4 pycebox 0.0.1+ Windows, Mac OS X, and Linux (Any)
5 alepython 0.1+ Windows, Mac OS X, and Linux (Any)
5 tensorflow-docs 0.0.02+ Windows, Mac OS X, and Linux (Any)
6 ^ nltk 3.2.5+ Windows, Mac OS X, and Linux (Any)
7 witwidget 1.7.0+ Windows, Mac OS X, and Linux (Any)
8 ^ opencv-python 4.1.2.30+ Windows, Mac OS X, and Linux (Any)
8 ^ scikit-image 0.16.2+ Windows, Mac OS X, and Linux (Any)
8 tf-explain 0.2.1+ Windows, Mac OS X, and Linux (Any)
8 tf-keras-vis 0.5.5+ Windows, Mac OS X, and Linux (Any)
9 SALib 1.3.12+ Windows, Mac OS X, and Linux (Any)
9 distython 0.0.3+ Windows, Mac OS X, and Linux (Any)
10 ^ mlxtend 0.14.0+ Windows, Mac OS X, and Linux (Any)
10 sklearn-genetic 0.3.0+ Windows, Mac OS X, and Linux (Any)
11 aif360==0.3.0 Windows, Mac OS X, and Linux (Any)
11 BlackBoxAuditing==0.1.54 Windows, Mac OS X, and Linux (Any)
11 dowhy 0.5.1+ Windows, Mac OS X, and Linux (Any)
11 econml 0.9.0+ Windows, Mac OS X, and Linux (Any)
11 ^ networkx 2.5+ Windows, Mac OS X, and Linux (Any)
12 bayesian-optimization 1.2.0+ Windows, Mac OS X, and Linux (Any)
12 ^ graphviz 0.10.1+ Windows, Mac OS X, and Linux (Any)
12 tensorflow-lattice 2.0.7+ Windows, Mac OS X, and Linux (Any)
13 adversarial-robustness-toolbox 1.5.0+ Windows, Mac OS X, and Linux (Any)

NOTE: the library machine-learning-datasets is the official name of what in the book is referred to as mldatasets. Due to naming conflicts, it had to be changed.

The exact versions of each library, as tested, can be found in the requirements.txt file and installed like this should you have a dedicated environment for them:

> pip install -r requirements.txt

You might get some conflicts specifically with libraries cvae, alepython, pdpbox and xai. If this is the case, try:

> pip install --no-deps -r requirements.txt

Alternatively, you can install libraries one chapter at a time inside of a local Jupyter environment using cells with !pip install or run all the code in Google Colab with the following links:

Remember to make sure you click on the menu item "File > Save a copy in Drive" as soon you open each link to ensure that your notebook is saved as you run it. Also, notebooks denoted with plus sign (+) are relatively compute-intensive, and will take an extremely long time to run on Google Colab but if you must go to "Runtime > Change runtime type" and select "High-RAM" for runtime shape. Otherwise, a better cloud enviornment or local environment is preferable.

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Summary

The book does much more than explain technical topics, but here's a summary of the chapters:

Chapters topics

Related products

Get to Know the Authors

Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. Currently, he's a Climate and Agronomic Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a startup, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events. Whether it pertains to leisure activities, plant diseases, or customer lifetime value, Serg is passionate about providing the often-missing link between data and decision-making — and machine learning interpretation helps bridge this gap more robustly.

Owner
Packt
Providing books, eBooks, video tutorials, and articles for IT developers, administrators, and users.
Packt
A flexible CTF contest platform for coming PKU GeekGame events

Project Guiding Star: the Backend A flexible CTF contest platform for coming PKU GeekGame events Still in early development Highlights Not configurabl

PKU GeekGame 14 Dec 15, 2022
Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable.

SDK: Overview of the Kubeflow pipelines service Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on

Kubeflow 3.1k Jan 06, 2023
Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc)

Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc). Structured a custom ensemble model and a neural network. Found a outperformed

Chris Yuan 1 Feb 06, 2022
Lightning ⚡️ fast forecasting with statistical and econometric models.

Nixtla Statistical ⚡️ Forecast Lightning fast forecasting with statistical and econometric models StatsForecast offers a collection of widely used uni

Nixtla 2.1k Dec 29, 2022
A Python implementation of the Robotics Toolbox for MATLAB

Robotics Toolbox for Python A Python implementation of the Robotics Toolbox for MATLAB® GitHub repository Documentation Wiki (examples and details) Sy

Peter Corke 1.2k Jan 07, 2023
dirty_cat is a Python module for machine-learning on dirty categorical variables.

dirty_cat dirty_cat is a Python module for machine-learning on dirty categorical variables.

637 Dec 29, 2022
The easy way to combine mlflow, hydra and optuna into one machine learning pipeline.

mlflow_hydra_optuna_the_easy_way The easy way to combine mlflow, hydra and optuna into one machine learning pipeline. Objective TODO Usage 1. build do

shibuiwilliam 9 Sep 09, 2022
Repositório para o #alurachallengedatascience1

1° Challenge de Dados - Alura A Alura Voz é uma empresa de telecomunicação que nos contratou para atuar como cientistas de dados na equipe de vendas.

Sthe Monica 16 Nov 10, 2022
easyNeuron is a simple way to create powerful machine learning models, analyze data and research cutting-edge AI.

easyNeuron is a simple way to create powerful machine learning models, analyze data and research cutting-edge AI.

Neuron AI 5 Jun 18, 2022
A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc

Sebastian Raschka 4.2k Dec 29, 2022
JMP is a Mixed Precision library for JAX.

Mixed precision training [0] is a technique that mixes the use of full and half precision floating point numbers during training to reduce the memory bandwidth requirements and improve the computatio

DeepMind 108 Dec 31, 2022
30 Days Of Machine Learning Using Pytorch

Objective of the repository is to learn and build machine learning models using Pytorch. 30DaysofML Using Pytorch

Mayur 119 Nov 24, 2022
Contains an implementation (sklearn API) of the algorithm proposed in "GENDIS: GEnetic DIscovery of Shapelets" and code to reproduce all experiments.

GENDIS GENetic DIscovery of Shapelets In the time series classification domain, shapelets are small subseries that are discriminative for a certain cl

IDLab Services 90 Oct 28, 2022
A single Python file with some tools for visualizing machine learning in the terminal.

Machine Learning Visualization Tools A single Python file with some tools for visualizing machine learning in the terminal. This demo is composed of t

Bram Wasti 35 Dec 29, 2022
Iris-Heroku - Putting a Machine Learning Model into Production with Flask and Heroku

Puesta en Producción de un modelo de aprendizaje automático con Flask y Heroku L

Jesùs Guillen 1 Jun 03, 2022
A simple machine learning package to cluster keywords in higher-level groups.

Simple Keyword Clusterer A simple machine learning package to cluster keywords in higher-level groups. Example: "Senior Frontend Engineer" -- "Fronte

Andrea D'Agostino 10 Dec 18, 2022
This project used bitcoin, S&P500, and gold to construct an investment portfolio that aimed to minimize risk by minimizing variance.

minvar_invest_portfolio This project used bitcoin, S&P500, and gold to construct an investment portfolio that aimed to minimize risk by minimizing var

1 Jan 06, 2022
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Horovod Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make dis

Horovod 12.9k Jan 07, 2023
Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.

Time series analysis today is an important cornerstone of quantitative science in many disciplines, including natural and life sciences as well as eco

Christoph Mark 129 Dec 24, 2022
A linear equation solver using gaussian elimination. Implemented for fun and learning/teaching.

A linear equation solver using gaussian elimination. Implemented for fun and learning/teaching. The solver will solve equations of the type: A can be

Sanjeet N. Dasharath 3 Feb 15, 2022