Interpretability and explainability of data and machine learning models

Overview

AI Explainability 360 (v0.2.1)


The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datasets and machine learning models. The AI Explainability 360 Python package includes a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics.

The AI Explainability 360 interactive experience provides a gentle introduction to the concepts and capabilities by walking through an example use case for different consumer personas. The tutorials and example notebooks offer a deeper, data scientist-oriented introduction. The complete API is also available.

There is no single approach to explainability that works best. There are many ways to explain: data vs. model, directly interpretable vs. post hoc explanation, local vs. global, etc. It may therefore be confusing to figure out which algorithms are most appropriate for a given use case. To help, we have created some guidance material and a chart that can be consulted.

We have developed the package with extensibility in mind. This library is still in development, and we encourage you to contribute your own explainability algorithms and metrics. To get started as a contributor, please join the AI Explainability 360 Community on Slack by requesting an invitation here. Please review the instructions for contributing code here.

Supported explainability algorithms

Data explanation

  • ProtoDash (Gurumoorthy et al., 2019)
  • Disentangled Inferred Prior VAE (Kumar et al., 2018)

Local post-hoc explanation

  • ProtoDash (Gurumoorthy et al., 2019)
  • Contrastive Explanations Method (Dhurandhar et al., 2018)
  • Contrastive Explanations Method with Monotonic Attribute Functions (Luss et al., 2019)
  • LIME (Ribeiro et al., 2016)
  • SHAP (Lundberg and Lee, 2017)

Local direct explanation

  • Teaching AI to Explain its Decisions (Hind et al., 2019)

Global direct explanation

  • Boolean Decision Rules via Column Generation (Dash et al., 2018)
  • Generalized Linear Rule Models (Wei et al., 2019)

Global post-hoc explanation

  • ProfWeight (Dhurandhar et al., 2018)

Supported explainability metrics

  • Faithfulness (Alvarez-Melis and Jaakkola, 2018)
  • Monotonicity (Luss et al., 2019)
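As a quick illustration of the metrics, the sketch below (ours, not from the original README) scores a feature-importance explanation against a trained classifier. It assumes that aix360.metrics exposes faithfulness_metric and monotonicity_metric with a (model, x, coefs, base) signature and a model that provides predict_proba; treat the exact call as an assumption and check the API docs.

    # Hedged sketch: score a feature-importance explanation with the proxy metrics.
    # Assumes faithfulness_metric/monotonicity_metric take (model, x, coefs, base):
    # a classifier with predict_proba, a single instance x, per-feature importances
    # coefs, and baseline feature values base used to "remove" features.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from aix360.metrics import faithfulness_metric, monotonicity_metric

    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier(random_state=0).fit(X, y)

    x = X[0]                             # instance whose explanation we score
    coefs = model.feature_importances_   # stand-in feature-importance explanation
    base = X.mean(axis=0)                # baseline values used to "remove" features

    print('Faithfulness:', faithfulness_metric(model, x, coefs, base))
    print('Monotonicity:', monotonicity_metric(model, x, coefs, base))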

Setup

Supported Configurations:

OS       | Python version
-------- | --------------
macOS    | 3.6
Ubuntu   | 3.6
Windows  | 3.6

(Optional) Create a virtual environment

AI Explainability 360 requires specific versions of many Python packages which may conflict with other projects on your system. A virtual environment manager is strongly recommended so that dependencies can be installed safely without breaking other projects. If you have trouble installing the toolkit, try this first.

Conda

Conda is recommended for all configurations, though Virtualenv is generally interchangeable for our purposes. Miniconda is sufficient (see the difference between Anaconda and Miniconda if you are curious) and can be installed from here if you do not already have it.

Then, to create a new Python 3.6 environment, run:

conda create --name aix360 python=3.6
conda activate aix360

The shell should now look like (aix360) $. To deactivate the environment, run:

(aix360)$ conda deactivate

The prompt will return to $ or (base)$.

Note: Older versions of conda may use source activate aix360 and source deactivate (activate aix360 and deactivate on Windows).

Installation

Clone the latest version of this repository:

(aix360)$ git clone https://github.com/Trusted-AI/AIX360

If you'd like to run the examples and tutorial notebooks, download the datasets now and place them in their respective folders as described in aix360/data/README.md.

Then, navigate to the root directory of the project, which contains the setup.py file, and run:

(aix360)$ pip install -e .
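
As an optional sanity check (our suggestion, not part of the original instructions), verify that the package imports from the environment; the ProtodashExplainer import path below is the one used throughout the examples:

    (aix360)$ python -c "from aix360.algorithms.protodash import ProtodashExplainer; print('aix360 imported OK')"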

Using AI Explainability 360

The examples directory contains a diverse collection of Jupyter notebooks that use AI Explainability 360 in various ways. Both example and tutorial notebooks illustrate working code using the toolkit. Tutorials provide additional discussion that walks the user through the various steps of the notebook. See the details about tutorials and examples here.
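
For a flavor of the API, here is a minimal sketch of ours (not one of the bundled tutorials) that summarizes a dataset with prototypes via ProtodashExplainer, whose usage also appears in the issues quoted below:

    # Minimal ProtoDash sketch: select m prototypes that summarize a dataset.
    # explain(X, Y, m) returns prototype weights W and selected row indices S;
    # passing the same array twice summarizes the dataset with its own prototypes.
    import numpy as np
    from aix360.algorithms.protodash import ProtodashExplainer

    rng = np.random.RandomState(0)
    X = rng.rand(100, 5)
    explainer = ProtodashExplainer()
    (W, S, _) = explainer.explain(X, X, m=5)
    print('Prototype indices:', S)
    print('Prototype weights:', W)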

Citing AI Explainability 360

A technical description of AI Explainability 360 is available in this paper. Below is the BibTeX entry for this paper.

@misc{aix360-sept-2019,
  title = {One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques},
  author = {Vijay Arya and Rachel K. E. Bellamy and Pin-Yu Chen and Amit Dhurandhar and Michael Hind
            and Samuel C. Hoffman and Stephanie Houde and Q. Vera Liao and Ronny Luss and Aleksandra Mojsilovi\'c
            and Sami Mourad and Pablo Pedemonte and Ramya Raghavendra and John Richards and Prasanna Sattigeri
            and Karthikeyan Shanmugam and Moninder Singh and Kush R. Varshney and Dennis Wei and Yunfeng Zhang},
  month = sep,
  year = {2019},
  url = {https://arxiv.org/abs/1909.03012}
}

AIX360 Videos

  • Introductory video to AI Explainability 360 by Vijay Arya and Amit Dhurandhar, September 5, 2019 (35 mins)

Acknowledgements

AIX360 is built with the help of several open-source packages, all of which are listed in setup.py.

License Information

Please view both the LICENSE file and the supplementary license folder in the root directory for license information.

Comments
  • ProtoDash: local variable 'newinnerProduct' referenced before assignment


    I am using the HELOC Dataset and trying to explain a single test instance using prototypes from my training subset with the code below:

        explainer = ProtodashExplainer()
        (W, S, _) = explainer.explain(dfTrain.to_numpy(), dfTest.iloc[0:1,:].to_numpy(), m=2)

    However, I am getting the error named in the title (screenshot omitted).

    Is this intentional? Please help.

    Thank you

    opened by laramdemajo 8
  • Add rule induction algorithms


    Includes the Ripper algorithm and TRXF ruleset exchange format.

    The rule_induction directory will eventually contain a set of closely related algorithms (drop-in replacements) used to induce and export rule sets in the common TRXF format for consumption by AIMEE, ADS, RedHat Decision Manager, etc. This originates from the internal aix360i:ripper branch, and we intend to migrate parts of that code as the quality bar is met.

    In particular, as previously discussed with @vijay-arya, the reason for this migration to the public repo is to provide the more technical ADS (CP4BA) clients with the means to programmatically generate their own rule sets without relying on the AIMEE GUI.

    Required dependencies:

    • numpy
    • pandas
    • sklearn
    • nyoka
    • xmltodict
    • numba
    opened by kmyusk 7
  • ExternalRiskEstimate seems to be hard coded into HELOC data processing, but I cannot find it.


    (screenshot of the relevant HELOC processing code omitted)

    If I change the name:


    ValueError                                Traceback (most recent call last)
    <ipython-input> in <module>
          2 from aix360.algorithms.rbm import FeatureBinarizer
          3 fb = FeatureBinarizer(negations=True, returnOrd=True)
    ----> 4 dfTrain, dfTrainStd = fb.fit_transform(dfTrain)
          5 dfTest, dfTestStd = fb.transform(dfTest)
          6 dfTrain['MostRecentBillAmountRaw'].head()

    ~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
        697         if y is None:
        698             # fit method of arity 1 (unsupervised transformation)
    --> 699             return self.fit(X, **fit_params).transform(X)
        700         else:
        701             # fit method of arity 2 (supervised transformation)

    ~/PycharmProjects/AIX360/aix360/algorithms/rbm/features.py in fit(self, X)
        111         self.ordinal = ordinal
        112         # Fit StandardScaler to ordinal features
    --> 113         self.scaler = StandardScaler().fit(data[ordinal])
        114         return self
        115

    ~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/preprocessing/_data.py in fit(self, X, y, sample_weight)
        728         # Reset internal state before fitting
        729         self._reset()
    --> 730         return self.partial_fit(X, y, sample_weight)
        731
        732     def partial_fit(self, X, y=None, sample_weight=None):

    ~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/preprocessing/_data.py in partial_fit(self, X, y, sample_weight)
        766         X = self._validate_data(X, accept_sparse=('csr', 'csc'),
        767                                 estimator=self, dtype=FLOAT_DTYPES,
    --> 768                                 force_all_finite='allow-nan', reset=first_call)
        769         n_features = X.shape[1]
        770

    ~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
        419             out = X
        420         elif isinstance(y, str) and y == 'no_validation':
    --> 421             X = check_array(X, **check_params)
        422             out = X
        423         else:

    ~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
         61             extra_args = len(args) - len(all_args)
         62             if extra_args <= 0:
    ---> 63                 return f(*args, **kwargs)
         64
         65             # extra_args > 0

    ~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
        538
        539     if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
    --> 540         dtype_orig = np.result_type(*dtypes_orig)
        541
        542     if dtype_numeric:

    <__array_function__ internals> in result_type(*args, **kwargs)

    ValueError: at least one array or dtype is required

    opened by BrianBlackman 5
  • beam_search_K1 with pandas > 1.1.0

    beam_search_K1 with pandas > 1.1.0

    With pandas version > 1.1.0, line 148 (likewise lines 145 and 150) returns an error: ValueError: cannot reindex from a duplicate axis. Locally, I just added '.values'. Example for line 148:

        colKeep[i[0]] = ((Xp[i[0]].columns.get_level_values(0) == '<=') & (thresh > i[2])).values
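
    For context, a before/after sketch of the patched assignment (the '>' branch here is quoted verbatim from the BRCG traceback in a later issue; the same .values cast applies to the other affected lines):

        # before: with pandas > 1.1.0 this raises "ValueError: cannot reindex from a duplicate axis"
        colKeep[i[0]] = (Xp[i[0]].columns.get_level_values(0) == '>') & (thresh < i[2])
        # after: cast the boolean pandas Series to a plain NumPy array before assigning
        colKeep[i[0]] = ((Xp[i[0]].columns.get_level_values(0) == '>') & (thresh < i[2])).values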

    opened by Hugomiralles 5
  • Fixing the error that occurs when using negated binary columns with FeatureBinarizer


    Issue: #112

    I've just noticed that FeatureBinarizer, when including the negated columns as well, does not work on a dataset that has a binary categorical feature. ~~That's probably another Pandas version error, where Pandas 1.0.0 or newer works significantly differently than previous versions did.~~ (Got the error using Pandas 0.25.3)

    When calling fb.fit_transform(<dataset_with_binary_category>, negations=True), the error message was: TypeError: unsupported operand type(s) for -: 'int' and 'Categorical'
    at line 142 in function transform(): A[(str(c), 'not', '')] = 1 - A[(str(c), '', '')]
    where A[(str(c), '', '')] = data[c].map(maps[c]) and c is a specific column.

    At that line the subtraction does not work, because the Series A[(str(c), '', '')] is categorical.

    Solution:
    For a solution, just convert the type of A[(str(c), '', '')] to integer, as in A[(str(c), '', '')] = data[c].map(maps[c]).astype(int). Although it could be solved in many ways, I've seen the astype(int) pattern elsewhere in the codebase, so I hope this solution is satisfactory.
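
    In context, the patched lines in FeatureBinarizer.transform() would look like this (a sketch assembled from the snippets quoted above):

        # cast the mapped binary categorical column to int so that 1 - x is defined
        A[(str(c), '', '')] = data[c].map(maps[c]).astype(int)
        A[(str(c), 'not', '')] = 1 - A[(str(c), '', '')]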

    opened by gaborpelesz 4
  • BRCG train fails in copied "Credit Approval Tutorial" code

    Hi there,

    I've copied the code (with no modifications at all) from the BRCG part of the "Credit Approval Tutorial" and ran into errors. I'm quite sure that the dataset was loaded appropriately, as I have also trained a scikit-learn DecisionTreeClassifier on it with no problem in the same notebook.

    Can someone help me with this issue? Am I missing something or is it an internal problem?

    Thanks in advance!

    Here is the code and the output. It was run on google colab, with pandas 1.1.2 and the latest aix360 release, which is 0.2.0.

    Copied code

    import warnings
    warnings.filterwarnings('ignore')
    
    # Load FICO HELOC data with special values converted to np.nan
    from aix360.datasets.heloc_dataset import HELOCDataset, nan_preprocessing
    data = HELOCDataset(custom_preprocessing=nan_preprocessing).data()
    # Separate target variable
    y = data.pop('RiskPerformance')
    
    # Split data into training and test sets using fixed random seed
    from sklearn.model_selection import train_test_split
    dfTrain, dfTest, yTrain, yTest = train_test_split(data, y, random_state=0, stratify=y)
    dfTrain.head().transpose()
    
    # Binarize data and also return standardized ordinal features
    from aix360.algorithms.rbm import FeatureBinarizer
    fb = FeatureBinarizer(negations=True, returnOrd=True)
    dfTrain, dfTrainStd = fb.fit_transform(dfTrain)
    dfTest, dfTestStd = fb.transform(dfTest)
    dfTrain['ExternalRiskEstimate'].head()
    
    # Instantiate BRCG with small complexity penalty and large beam search width
    from aix360.algorithms.rbm import BooleanRuleCG
    br = BooleanRuleCG(lambda0=1e-3, lambda1=1e-3, CNF=True)
    
    # Train, print, and evaluate model
    br.fit(dfTrain, yTrain)
    from sklearn.metrics import accuracy_score
    print('Training accuracy:', accuracy_score(yTrain, br.predict(dfTrain)))
    print('Test accuracy:', accuracy_score(yTest, br.predict(dfTest)))
    print('Predict Y=0 if ANY of the following rules are satisfied, otherwise Y=1:')
    print(br.explain()['rules'])
    

    Output

    Learning CNF rule with complexity parameters lambda0=0.001, lambda1=0.001
    Initial LP solved
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __setitem__(self, key, value)
       1001         try:
    -> 1002             self._set_with_engine(key, value)
       1003         except (KeyError, ValueError):
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/series.py in _set_with_engine(self, key, value)
       1032         # fails with AttributeError for IntervalIndex
    -> 1033         loc = self.index._engine.get_loc(key)
       1034         validate_numeric_casting(self.dtype, value)
    
    pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc()
    
    KeyError: 'ExternalRiskEstimate'
    
    During handling of the above exception, another exception occurred:
    
    ValueError                                Traceback (most recent call last)
    <ipython-input-98-8d81fbd6c0e1> in <module>()
         26 
         27 # Train, print, and evaluate model
    ---> 28 br.fit(dfTrain, yTrain)
         29 from sklearn.metrics import accuracy_score
         30 print('Training accuracy:', accuracy_score(yTrain, br.predict(dfTrain)))
    
    /usr/local/lib/python3.6/dist-packages/aix360/algorithms/rbm/boolean_rule_cg.py in fit(self, X, y)
        118         UB = min(UB.min(), 0)
        119         v, zNew, Anew = beam_search(r, X, self.lambda0, self.lambda1,
    --> 120                                     K=self.K, UB=UB, D=self.D, B=self.B, eps=self.eps)
        121 
        122         while (v < -self.eps).any() and (self.it < self.iterMax):
    
    /usr/local/lib/python3.6/dist-packages/aix360/algorithms/rbm/beam_search.py in beam_search(r, X, lambda0, lambda1, K, UB, D, B, wLB, eps, stopEarly)
        285             if i[1] == '<=':
        286                 thresh = Xp[i[0]].columns.get_level_values(1).to_series().replace('NaN', np.nan)
    --> 287                 colKeep[i[0]] = (Xp[i[0]].columns.get_level_values(0) == '>') & (thresh < i[2])
        288             elif i[1] == '>':
        289                 thresh = Xp[i[0]].columns.get_level_values(1).to_series().replace('NaN', np.nan)
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __setitem__(self, key, value)
       1008             else:
       1009                 # GH#12862 adding an new key to the Series
    -> 1010                 self.loc[key] = value
       1011 
       1012         except TypeError as e:
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in __setitem__(self, key, value)
        668 
        669         iloc = self if self.name == "iloc" else self.obj.iloc
    --> 670         iloc._setitem_with_indexer(indexer, value)
        671 
        672     def _validate_key(self, key, axis: int):
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
       1790                 # setting for extensionarrays that store dicts. Need to decide
       1791                 # if it's worth supporting that.
    -> 1792                 value = self._align_series(indexer, Series(value))
       1793 
       1794             elif isinstance(value, ABCDataFrame):
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _align_series(self, indexer, ser, multiindex_indexer)
       1909             # series, so need to broadcast (see GH5206)
       1910             if sum_aligners == self.ndim and all(is_sequence(_) for _ in indexer):
    -> 1911                 ser = ser.reindex(obj.axes[0][indexer[0]], copy=True)._values
       1912 
       1913                 # single indexer
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/series.py in reindex(self, index, **kwargs)
       4397     )
       4398     def reindex(self, index=None, **kwargs):
    -> 4399         return super().reindex(index=index, **kwargs)
       4400 
       4401     def drop(
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
       4457         # perform the reindex on the axes
       4458         return self._reindex_axes(
    -> 4459             axes, level, limit, tolerance, method, fill_value, copy
       4460         ).__finalize__(self, method="reindex")
       4461 
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
       4480                 fill_value=fill_value,
       4481                 copy=copy,
    -> 4482                 allow_dups=False,
       4483             )
       4484 
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
       4525                 fill_value=fill_value,
       4526                 allow_dups=allow_dups,
    -> 4527                 copy=copy,
       4528             )
       4529             # If we've made a copy once, no need to make another one
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate)
       1274         # some axes don't allow reindexing with dups
       1275         if not allow_dups:
    -> 1276             self.axes[axis]._can_reindex(indexer)
       1277 
       1278         if axis >= self.ndim:
    
    /usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
       3283         # trying to reindex on an axis with duplicates
       3284         if not self.is_unique and len(indexer):
    -> 3285             raise ValueError("cannot reindex from a duplicate axis")
       3286 
       3287     def reindex(self, target, method=None, level=None, limit=None, tolerance=None):
    
    ValueError: cannot reindex from a duplicate axis
    
    opened by gaborpelesz 4
  • ValueError with using ProtoDash to get the Prototypes of a Dataset


    Hi!

    I'm encountering an error with a simple use case of ProtoDash to get prototypes of a given dataset. Here's an example that triggers the error:

    import pandas as pd
    from sklearn import datasets
    from aix360.algorithms.protodash import PDASH
    
    # Load Iris
    X, y = datasets.load_iris(True)
    df = pd.DataFrame(X, columns=range(X.shape[1]))
    df['y'] = y
    
    tmp = df[df['y'] == 0].drop('y', axis=1).values
    X_1 = PDASH.HeuristicSetSelection(X=tmp, Y=tmp, m=10, kernelType='gaussian', sigma=2)
    
    # This generates an error:
    # ---------------------------------------------------------------------------
    # ValueError                                Traceback (most recent call last)
    # <ipython-input-48-e631ba33f62a> in <module>
    #      1 tmp = df[df['y'] == 0].drop('y', axis=1).values
    # ----> 2 X_1 = PDASH.HeuristicSetSelection(X=tmp, Y=tmp, m=10, kernelType='gaussian', sigma=2)
    #
    # c:\users\pc\aix\aix360\aix360\algorithms\protodash\PDASH_utils.py in HeuristicSetSelection(X, Y, m, kernelType, sigma)
    #    267             currK = K2
    #    268             if maxGradient <= 0:
    #--> 269                 newCurrOptw = np.vstack((currOptw[:], np.array([0])))
    #    270                 newCurrSetValue = currSetValue
    #    271             else:
    #
    #~\AppData\Local\Continuum\anaconda3\envs\aix360\lib\site-packages\numpy\core\shape_base.py in vstack(tup)
    #    281     """
    #    282     _warn_for_nonsequence(tup)
    #--> 283     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    #    284 
    #    285 
    #
    #ValueError: all the input array dimensions except for the concatenation axis must match exactly
    

    Interestingly, the error does not pop up for m < 10.

    Is this a bug or am I using it incorrectly?

    Thanks,

    opened by hadrianpaulo 4
  • ModuleNotFoundError: No module named 'aix360.algorithms.rule_induction'


    Hi,

    I tried to run an example notebook via Docker.

    I followed the instructions to build the Docker image and run the Jupyter server here

    However, when I run the notebook examples/rule_induction/brcg_demo.ipynb, importing the dependent libraries simply gives:

    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    <ipython-input-3-33e99aef874b> in <module>
          3 from sklearn.model_selection import train_test_split
          4 from sklearn.metrics import precision_score, recall_score, accuracy_score, balanced_accuracy_score
    ----> 5 from aix360.algorithms.rule_induction.rbm.boolean_rule_cg import BooleanRuleCG as BRCG
          6 from aix360.algorithms.rbm import FeatureBinarizer
          7 import time
    
    ModuleNotFoundError: No module named 'aix360.algorithms.rule_induction'
    
    opened by xiaohan2012 3
  • CEMExplainer scikit-learn compatibility & "predict_long" method error

    Hello! I've got a problem you might be able to help me with.

    I'm trying to use CEM to explain scikit-learn binary classification models. I'm using RandomForestClassifier & KNeighborsClassifier to be more exact.

    Here's a snippet using K-Neighbors:

    from sklearn.neighbors import KNeighborsClassifier
    
    model_kneighbors = KNeighborsClassifier()
    
    fit_kneighbors = model_kneighbors.fit(X_train, y_train_)
    y_pred_kneighbors = model_kneighbors.predict(X_test)
    
    cem_kneighbors_clas = CEMExplainer(model_kneighbors)
    
    arg_mode = 'PN'             # Find pertinent negatives
    arg_max_iter = 1000         # Maximum number of iterations to search for the optimal PN for given parameter settings
    arg_init_const = 10.0       # Initial coefficient value for main loss term that encourages class change
    arg_b = 9                   # No. of updates to the coefficient of the main loss term
    arg_kappa = 0.2             # Minimum confidence gap between the PNs (changed) class probability and original class' probability
    arg_beta = 1e-1             # Controls sparsity of the solution (L1 loss)
    arg_gamma = 100             # Controls how much to adhere to a (optionally trained) auto-encoder
    my_AE_model = None          # Pointer to an auto-encoder
    arg_alpha = 0.01            # Penalizes L2 norm of the solution
    arg_threshold = 1.          # Automatically turn off features <= arg_threshold if arg_threshold < 1
    arg_offset = 0.5            # the model assumes classifier trained on data normalized
                                # in [-arg_offset, arg_offset] range, where arg_offset is 0 or 0.5
    
    (adv_pn, delta_pn, info_pn) = cem_kneighbors_clas.explain_instance(
        input_X=X_to_explain_clas,      # input_X (numpy.ndarray) – input instance to be explained
        arg_mode=arg_mode,              # arg_mode (str) – ‘PP’ or ‘PN’
        AE_model=my_AE_model,           # AE_model – Auto-encoder model
        arg_kappa=arg_kappa,            # arg_kappa (double) – Confidence gap between desired class and other classes
        arg_b=arg_b,                    # arg_b (double) – Number of different weightings of loss function to try
        arg_max_iter=arg_max_iter,      # arg_max_iter (int) – For each weighting of loss function number of iterations to search
        arg_init_const=arg_init_const,  # arg_init_const (double) – Initial weighting of loss function
        arg_beta=arg_beta,              # arg_beta (double) – Weighting of L1 loss
        arg_gamma=arg_gamma             # arg_gamma (double) – Weighting of auto-encoder                  
    )
    

    When I try to run it I get: 'KNeighborsClassifier' object has no attribute 'predict_long'. I checked the CEM implementation and found the predict_long call. I also tried the HELOC tutorial example and it all works well, but with a classifier named KerasClassifier, which I think has been deprecated? I can't find documentation for it anywhere.

    Based on this, I've got a few questions:

    1. What estimators are compatible with CEM? The other classes are compatible with scikit-learn, but perhaps CEM is for NNs or TensorFlow only? I skimmed the paper but didn't quite catch that. I know there's a variation that deals with images, but "regular" CEM should work for tabular data, correct? Since ProtoDash is compatible with these estimators, I assumed CEM was too.
    2. Perhaps I simply need to downgrade the package? I didn't install anything in particular in my Jupyter Notebook, just aix360 (didn't specify a version either, just ran pip install aix360).
    3. If I were to "tweak" the implementation and change predict_long for the usual scikit-learn predict method, would it work? I don't fully understand the code, to be honest.
    4. If scikit-learn estimators aren't supported for CEM at the moment, is there an implementation planned for the future?
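
    Regarding question 3, a purely hypothetical shim (our illustration, not a supported AIX360 API) could forward predict_long to a scikit-learn classifier's predict_proba. Note, however, that CEM's search is gradient-based and was built around differentiable TensorFlow/Keras models, so adding the missing method alone is unlikely to make a non-differentiable estimator work:

        # Hypothetical adapter: supplies the predict_long interface CEM looks for.
        # This only removes the AttributeError; CEM still expects a differentiable
        # (Keras/TensorFlow) model underneath, so results are not guaranteed.
        class PredictLongAdapter:
            def __init__(self, sk_model):
                self.sk_model = sk_model

            def predict(self, X):
                return self.sk_model.predict_proba(X)

            def predict_long(self, X):
                return self.sk_model.predict_proba(X)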

    Superb work with the library, by the way. I'm a huge fan of IBM Research and the amazing work you all do.

    Regards from a fellow IBMer!

    Kindly, Josefina

    opened by josefinarcasanova 3
  • PMML export enhancements and 3.8-3.6 compatibility of rule induction code


    • Categorical datafields with str type now include the list of possible/legal values in the PMML file.
    • Fixed one test that was failing in python 3.8 in rule induction

    More details in the individual commit messages.

    Edit: It was 3.8 not 3.7

    opened by kmyusk 2
  • Add Matching Explainer Algorithm


    Summary

    Adding a White Box Explainer for Matchings, as described in the upcoming ICML 2022 publication.

    • the algorithm below is implemented in a Python package, which is imported into AIX360

    Fabian Lim, Laura Wynter, Shiau Hong Lim. 2022. "Order Constraints in Optimal Transport". https://arxiv.org/abs/2110.07275.

    Algorithm

    Given a matching, provide an explanation in terms of alternate matchings that each focus on a sparse set of salient matches.

    • inherit from LocalWBExplainer, since the method requires access to the internal coefficients of a matching (white box) and does not require retraining over a dataset (local)

    Package Dependencies

    In setup.py we have the following dependency that is installed as an egg

    In examples/matching/matching-pairs-of-sentences.ipynb we ask the user to install the packages below in order to execute the demo

    • POT==0.7.0

    Example

    An NLP-based example inspired by one of the figures in the paper is provided.

    TODO

    • [x] algorithm has been added in here.
    • [x] semantic dataset used in examples has been included in here.
    • [x] examples have been included in here, with the following subdirectories:
      • data: storing NLP embeddings that will be downloaded for the demo
      • models: contains code for an NLP embedding model used in the demo
      • utils: utility functions for the demo
    • [x] docs have been updated. Sphinx has been run locally and tested.
    • [x] READMEs have been updated in examples, the top-level directory, etc.
    • [x] tests have been included in here.
      • data: each test case is specified in a .json file and stored here.
    • [x] setup.py has been updated. The Python package is hosted publicly at https://github.com/IBM/otoc.
      • [x] This repo is placed in install_requires as an egg link
      • [x] The GitHub link in the top-level README has been updated
    • [ ] Discuss with @vijay-arya where to move the items below:
      • examples/matching/data
      • examples/matching/models
      • tests/matching/data
    opened by fabianlim 2
  • Error: "elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison"

    When trying to execute ProtoDash, I get the error: "elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison". Any reason why?

    opened by survivebycoding 0
  • Add CodeQL workflow for GitHub code scanning


    Hi Trusted-AI/AIX360!

    This is a one-off automatically generated pull request from LGTM.com :robot:. You might have heard that we’ve integrated LGTM’s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository — take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ


    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request — to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches — this keeps the analysis results on your repository’s Security tab up to date.
    • Once a week at a fixed time — to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn’t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 0
  • TRXF pmml scorecard reader


    Implemented the reader portion of the scorecard PMML export functionality. Together with the writer portion (https://github.com/Trusted-AI/AIX360/pull/166), exporting TRXF scorecards to PMML is now possible.

    opened by kmyusk 1
  • Questions about the results obtained by XAI method


    I found a strange phenomenon. For the same model, the same training samples, and the same test samples, with all other operations identical, the values obtained by using an XAI method (like Saliency) to evaluate the interpretability of the model should in theory be the same. However, when I retrained a new model, the interpretability values obtained were completely different from those of the previous model. The interpretability value is completely unstable, and the results cannot be reproduced, unless I save the model after training and then reload its parameters, in which case the results are the same. Does anyone know why this happens?
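
    One plausible cause (a guess on our part, not a confirmed answer) is that each training run starts from different random seeds, so the two models are genuinely different functions even when their accuracy is similar, and attribution methods like Saliency then legitimately produce different values. Pinning the seeds before training, e.g.:

        # Sketch: fix the usual sources of randomness so two training runs yield
        # the same model (framework-specific determinism flags, e.g. for cuDNN,
        # may also be required for full reproducibility).
        import random
        import numpy as np
        import torch

        random.seed(0)
        np.random.seed(0)
        torch.manual_seed(0)

    should make the explanations reproducible without saving and reloading the model.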

    opened by 9527-ly 0
  • Ripper rule induction algorithm treats timestamp type features as categorical


    The Ripper algorithm recognizes timestamp features (e.g. 2022-06-14-19.39.35.929641) as integers and thus encodes them as categorical features. The resulting rules use equality predicates (e.g. timestamp == 2022-06-14-19.39.35.929641) instead of the intervals/inequalities one would expect.

    Proper timestamp type support for Ripper would be nice.
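
    As a stopgap, a hypothetical preprocessing step (ours, not part of AIX360) could convert timestamps to numeric epoch values before fitting, so that interval predicates become possible; the format string below matches the example timestamp above and is an assumption:

        # Hypothetical workaround: map timestamp columns to Unix epoch seconds so
        # rule induction sees an ordered numeric feature instead of category codes.
        import pandas as pd

        def timestamps_to_epoch(df, columns, fmt='%Y-%m-%d-%H.%M.%S.%f'):
            out = df.copy()
            for col in columns:
                out[col] = pd.to_datetime(out[col], format=fmt).astype('int64') // 10**9
            return out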

    opened by kmyusk 0
Releases(v0.2.1)
  • v0.2.1(Oct 28, 2020)

    • Minor update to CEM parameters
    • FeatureBinarizerFromTrees for directly interpretable explainers
    • Minor updates to BRCG due to a Pandas update
    • Updates to the HELOC tutorial
    • Abstraction class for global black box
    • Comment updates to ProtoDash
    • Minor bug fixes
    • License updates
  • v0.2.0(Dec 9, 2019)

Owner
This GitHub org hosts LF AI Foundation projects in the category of Trusted and Responsible AI.