Detecting silent model failure. NannyML estimates performance with an algorithm called Confidence-based Performance Estimation (CBPE), developed by core contributors. It is the only open-source algorithm capable of fully capturing the impact of data drift on performance.

Overview



Website • Docs • Community Slack


💡 What is NannyML?

NannyML is an open-source Python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface and interactive visualizations, is completely model-agnostic, and currently supports all tabular classification use cases.

The core contributors of NannyML have researched and developed a novel algorithm for estimating model performance: confidence-based performance estimation (CBPE). The Nansters also invented a new approach to detect multivariate data drift using PCA-based data reconstruction.

If you like what we are working on, be sure to become a Nanster yourself, join our community Slack and support us with a GitHub star ⭐.

☔ Why use NannyML?

NannyML closes the loop with performance monitoring and post-deployment data science, empowering data scientists to quickly understand and automatically detect silent model failure. By using NannyML, data scientists can finally maintain complete visibility and trust in their deployed machine learning models. It brings you the following benefits:

  • End sleepless nights caused by not knowing your model performance 😴
  • Analyse data drift and model performance over time
  • Discover the root cause of why your models are not performing as expected
  • No alert fatigue! React only when model performance is actually impacted
  • Painless setup in any environment

🧠 GO DEEP

NannyML resources:

  • ☎️ NannyML 101: New to NannyML? Start here!
  • 🔮 Performance estimation: How the magic works.
  • 🌍 Real world example: Take a look at a real-world example of NannyML.
  • 🔑 Key concepts: Glossary of key concepts we use.
  • 🔬 Technical reference: Monitor the performance of your ML models.
  • 🔎 Blog: Thoughts on post-deployment data science from the NannyML team.
  • 📬 Newsletter: All things post-deployment data science. Subscribe to see the latest papers and blogs.
  • 💎 New in v0.4.0: New features, bug fixes.
  • 🧑‍💻 Contribute: How to contribute to the NannyML project and codebase.
  • Join Slack: Need help with your specific use case? Say hi on Slack!

🔱 Features

1. Performance estimation and monitoring

When the actual outcome of your deployed prediction models is delayed, or even when post-deployment target labels are completely absent, you can use NannyML's CBPE algorithm to estimate model performance. This algorithm requires the predicted probabilities of your machine learning model and leverages probability calibration to estimate any traditional binary classification metric (ROC AUC, precision, recall, F1, etc.). Rather than estimating the performance of future model predictions, CBPE estimates the expected performance of the predictions made at inference time.
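
For intuition, here is a minimal sketch of the idea behind CBPE, not NannyML's implementation: assuming the predicted probabilities are well calibrated, every prediction contributes its expected value to the confusion matrix, so some metrics can be estimated without any ground-truth labels.

import numpy as np

def estimate_precision_recall(y_pred: np.ndarray, proba: np.ndarray):
    """Estimate precision/recall from calibrated probabilities alone.

    If proba[i] is a well-calibrated P(y=1), each prediction contributes
    its expected value to the confusion matrix, so no targets are needed.
    """
    pos = y_pred == 1
    tp = proba[pos].sum()          # expected true positives
    fp = (1 - proba[pos]).sum()    # expected false positives
    fn = proba[~pos].sum()         # expected false negatives
    return tp / (tp + fp), tp / (tp + fn)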

NannyML can also track the realised performance of your machine learning model once targets are available.

2. Data drift detection

To detect multivariate feature drift NannyML uses PCA-based data reconstruction. Changes in the resulting reconstruction error are monitored over time and data drift alerts are logged when the reconstruction error in a certain period exceeds a threshold. This threshold is calculated based on the reconstruction error observed in the reference period.
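
As a rough sketch of that mechanism using scikit-learn (the synthetic data, the 65% variance cut-off and the 3-sigma threshold below are illustrative assumptions, not NannyML internals):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
reference = rng.normal(size=(5000, 8))            # stand-in reference features
analysis_chunk = rng.normal(0.3, 1.1, (500, 8))   # stand-in analysis chunk

# Fit the compressor on the reference period only.
scaler = StandardScaler().fit(reference)
pca = PCA(n_components=0.65).fit(scaler.transform(reference))  # keep 65% of variance

def reconstruction_error(X: np.ndarray) -> float:
    """Mean Euclidean distance between rows and their PCA reconstruction."""
    Z = scaler.transform(X)
    Z_hat = pca.inverse_transform(pca.transform(Z))
    return float(np.linalg.norm(Z - Z_hat, axis=1).mean())

# Alert when a chunk's error leaves the range observed on reference chunks.
ref_errors = [reconstruction_error(c) for c in np.array_split(reference, 10)]
threshold = np.mean(ref_errors) + 3 * np.std(ref_errors)
print(reconstruction_error(analysis_chunk) > threshold)  # drift alert?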

NannyML utilises statistical tests to detect univariate feature drift. The Kolmogorov–Smirnov test is used for continuous features and the 2-sample chi-squared test for categorical features. The results of these tests are tracked over time, properly corrected to counteract multiplicity, and overlaid on the temporal feature distributions. (It is also possible to visualise the test statistics over time, to get a notion of the drift magnitude.)
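
A compact sketch of those two tests with SciPy (illustrative only; reference and chunk are per-feature samples from the reference and analysis periods):

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, ks_2samp

def drift_p_value(reference: pd.Series, chunk: pd.Series) -> float:
    """Kolmogorov-Smirnov for continuous features, chi-squared for categorical."""
    if pd.api.types.is_numeric_dtype(reference):
        _, p = ks_2samp(reference, chunk)
    else:
        counts = pd.concat(
            [reference.value_counts(), chunk.value_counts()], axis=1
        ).fillna(0)
        _, p, _, _ = chi2_contingency(counts.T)
    # Compare p against e.g. alpha / n_features (Bonferroni) to counteract multiplicity.
    return p

ref = pd.Series(np.random.default_rng(1).normal(size=5000))
cur = pd.Series(np.random.default_rng(2).normal(0.2, 1.0, size=5000))
print(drift_p_value(ref, cur))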

NannyML uses the same statistical tests to detect model output drift.

Target distribution drift is monitored by calculating the mean occurrence of positive events in combination with the 2-sample chi-squared test. Bear in mind that this operation requires the presence of actuals.
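
A small sketch of that check (illustrative, not NannyML's code); note that it consumes the actual target values:

import numpy as np
from scipy.stats import chi2_contingency

def target_drift(ref_y: np.ndarray, chunk_y: np.ndarray):
    """Track the positive-event rate of a chunk and test it against reference."""
    table = [
        [(ref_y == 1).sum(), (ref_y == 0).sum()],
        [(chunk_y == 1).sum(), (chunk_y == 0).sum()],
    ]
    _, p, _, _ = chi2_contingency(table)
    return chunk_y.mean(), p  # mean occurrence of positives, drift p-value

ref_y = np.random.default_rng(3).binomial(1, 0.5, size=5000)
chunk_y = np.random.default_rng(4).binomial(1, 0.6, size=1000)
print(target_drift(ref_y, chunk_y))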

3. Intelligent alerting

Because NannyML can estimate performance, it is possible to weed out data drift alerts that do not impact expected performance, combating alert fatigue. Besides linking data drift issues to drops in performance, it is also possible to prioritise alerts according to other criteria using NannyML's Ranker.

🚀 Getting started

Install NannyML

From PyPI:

pip install nannyml

Here be dragons! Use the latest development version of NannyML at your own risk:

python -m pip install git+https://github.com/NannyML/nannyml

Quick Start

The following snippet is based on our latest release.

import pandas as pd
import nannyml as nml

# Load dummy data
reference, analysis, analysis_target = nml.load_synthetic_binary_classification_dataset()
data = pd.concat([reference, analysis], ignore_index=True)

# Extract meta data
metadata = nml.extract_metadata(data=reference, model_type='classification_binary', exclude_columns=['identifier'])
metadata.target_column_name = 'work_home_actual'

# Choose a chunker or set a chunk size
chunk_size = 5000

# Estimate model performance
estimator = nml.CBPE(model_metadata=metadata, metrics=['roc_auc'], chunk_size=chunk_size)
estimator.fit(reference)
estimated_performance = estimator.estimate(data=data)

figure = estimated_performance.plot(metric='roc_auc', kind='performance')
figure.show()

# Detect multivariate feature drift
multivariate_calculator = nml.DataReconstructionDriftCalculator(model_metadata=metadata, chunk_size=chunk_size)
multivariate_calculator.fit(reference_data=reference)
multivariate_results = multivariate_calculator.calculate(data=data)

figure = multivariate_results.plot(kind='drift')
figure.show()

# Detect univariate feature drift
univariate_calculator = nml.UnivariateStatisticalDriftCalculator(model_metadata=metadata, chunk_size=chunk_size)
univariate_calculator.fit(reference_data=reference)
univariate_results = univariate_calculator.calculate(data=data)

# Rank features based on number of alerts
ranker = nml.Ranker.by('alert_count')
ranked_features = ranker.rank(univariate_results, model_metadata=metadata, only_drifting=False)

for feature in ranked_features.feature:
    figure = univariate_results.plot(kind='feature_distribution', feature_label=feature)
    figure.show()

📖 Documentation

🦸 Contributing and Community

We want to build NannyML together with the community! The easiest way to contribute at the moment is to propose new features or log bugs under issues. For more information, have a look at how to contribute.

🙋 Get help

The best place to ask for help is in the community Slack. Feel free to join and ask questions or raise issues. Someone will definitely respond to you.

🥷 Stay updated

If you want to stay up to date with recent changes to the NannyML library, you can subscribe to our release notes. For thoughts on post-deployment data science from the NannyML team, feel free to visit our blog. You can also sign up for our newsletter, which brings together the best papers, articles, news, and open-source libraries highlighting the ML challenges after deployment.

📄 License

NannyML is distributed under an Apache License Version 2.0. A complete version can be found here. All contributions will be distributed under this license.

Comments
  • Can't load dataset


    • nannyml version: 0.4.0
    • Python version: 3.8.0
    • Operating System: 5.10.102.1-microsoft-standard-WSL2 ; Ubuntu 18.04.6 LTS

    Description

    I'm trying to walk through the Quickstart guide and getting the following error: module 'nannyml' has no attribute 'load_synthetic_binary_classification_dataset'

    What I Did

    import pandas as pd
    import nannyml as nml
    from IPython.display import display
    reference, analysis, analysis_target = nml.load_synthetic_binary_classification_dataset()
    display(analysis.head())
    display(reference.head())
    
    opened by nlp-sid 10
  • Suggested changes to the documentation


    Some minor corrections / suggestions for the documentation:

    (0) A general comment: as far as I can tell, nowhere in the documentation is it explicitly stated that the model is not required to do the performance prediction (although it is implicit). Also, it's always assumed that the model is a machine learning one, but in theory the software is model-agnostic, as long as the model outputs conform to the expected formats, right?

    (1) https://docs.nannyml.com/latest/quick.html

    • replace it's with its (2 occurrences)
    • "This is why on the synthetic dataset it is provided in a separate object." --> "This is why in the synthetic dataset it is provided in a separate object."
    • Some words / phrases are capitalized in the middle of a sentence, seemingly at random, e.g. "Model Monitoring" or "Machine Learning".
    • In the first plot, I feel it would be important to also show the actual model performance (ROC AUC in this case). This is probably THE most crucial thing a potential user wants to see here: are the predictions correct?
    • Final sentence: "This drift is responsible for the potential negative impact in performance that we observed." The actual ROC AUC is never actually shown, so the reader has no idea what the change in performance really is (just has to trust it is true).

    (2) Glossary (https://docs.nannyml.com/latest/glossary.html#glossary)

    • In the entry for concept drift, it should probably be stated that the term is sometimes used (by others) with a broader definition that also includes things like label shift (?).
    • In the predicted scores entry, "calues" should be "values".
    • Univariate Drift Detection and Multivariate Drift Detection entries: "of our model" is superfluous and probably misleading in this context.
    • The entry for Model "Definition of a model." is a circular definition.
    • CBPE (Confidence-Based Perofmance Estimation): change "Perofmance" to "Performance"

    (3) https://docs.nannyml.com/latest/guides/data_drift.html

    • "instannce" --> "instance"
    • "consice" --> "concise"
    • The plot "Distribution over time for y_pred_proba" has the x and y axis labels swapped.
    • "occurance" --> "occurrence"
    • The section "Drift detection for model targets" seems to just end without much of a conclusion.

    (4) https://docs.nannyml.com/latest/guides/performance_estimation.html

    • The y-axis title on last plot is partially cropped; it would also be useful to change the x-axis to time, and add the analysis period line to make this plot easier to compare to the previous one that has the performance prediction.
    documentation 
    opened by humphrey-and-the-machine 9
  • Add bootstrapping options to chunk methods


    It would be nice if you'd allow bootstrapping (resampling with replacement) instead of non-overlapping chunks for the CBPE estimate. Ideally, something like this: https://stats.stackexchange.com/questions/96739/what-is-the-632-rule-in-bootstrapping
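
    For illustration, plain bootstrapping over the analysis rows could look roughly like this (bootstrap_metric and metric_fn are hypothetical names, not NannyML API; the .632 variant would additionally blend in-sample and out-of-sample estimates):

    import numpy as np
    import pandas as pd

    def bootstrap_metric(df: pd.DataFrame, metric_fn, n_boot=200, sample_size=5000):
        """Resample rows with replacement instead of slicing fixed, non-overlapping chunks."""
        estimates = [
            metric_fn(df.sample(n=sample_size, replace=True, random_state=i))
            for i in range(n_boot)
        ]
        return np.mean(estimates), np.percentile(estimates, [2.5, 97.5])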

    enhancement stale 
    opened by ai-noahdolev 8
  • Confidence bounds on CBPE plot go above 1.0 when ROC-AUC is 1.0


    • nannyml version: 0.2.0

    Description The confidence bounds of the CBPE plot go above 1.0 when the ROC-AUC is 1.0. They should be cut off at 1.0, as a ROC-AUC above 1.0 is impossible.


    bug good first issue 
    opened by hakimelakhrass 8
  • Pandas data type 'string' not understood


    Describe the bug Running the Quickstart results in an error

    To Reproduce Steps to reproduce the behavior: Running:

    import pandas as pd
    import nannyml as nml
    from IPython.display import display
    
    # Load synthetic data
    reference, analysis, analysis_target = nml.load_synthetic_binary_classification_dataset()
    display(reference.head())
    display(analysis.head())
    
    # Choose a chunker or set a chunk size
    chunk_size = 5000
    
    # initialize, specify required data columns, fit estimator and estimate
    estimator = nml.CBPE(
       y_pred_proba='y_pred_proba',
       y_pred='y_pred',
       y_true='work_home_actual',
       timestamp_column_name='timestamp',
       metrics=['roc_auc'],
       chunk_size=chunk_size,
    )
    estimator = estimator.fit(reference)
    estimated_performance = estimator.estimate(analysis)
    
    # Show results
    figure = estimated_performance.plot(kind='performance', metric='roc_auc', plot_reference=True)
    figure.show()
    
    # Define feature columns
    feature_column_names = [
        col for col in reference.columns if col not in [
            'timestamp', 'y_pred_proba', 'period', 'y_pred', 'work_home_actual', 'identifier'
        ]]
    
    # Let's initialize the object that will perform the Univariate Drift calculations
    univariate_calculator = nml.UnivariateStatisticalDriftCalculator(
        feature_column_names=feature_column_names,
        timestamp_column_name='timestamp',
        chunk_size=chunk_size
    )
    univariate_calculator = univariate_calculator.fit(reference)
    univariate_results = univariate_calculator.calculate(analysis)
    # Plot drift results for all model inputs
    for feature in univariate_calculator.feature_column_names:
        figure = univariate_results.plot(
            kind='feature_drift',
            metric='statistic',
            feature_column_name=feature,
            plot_reference=True
        )
        figure.show()
    
    # Rank features based on number of alerts
    ranker = nml.Ranker.by('alert_count')
    ranked_features = ranker.rank(univariate_results, only_drifting = False)
    display(ranked_features)
    
    calc = nml.StatisticalOutputDriftCalculator(
        y_pred='y_pred',
        y_pred_proba='y_pred_proba',
        timestamp_column_name='timestamp'
    )
    calc.fit(reference)
    results = calc.calculate(analysis)
    
    figure = results.plot(kind='prediction_drift', plot_reference=True)
    figure.show()
    
    # Let's initialize the object that will perform Data Reconstruction with PCA
    rcerror_calculator = nml.DataReconstructionDriftCalculator(feature_column_names=feature_column_names, timestamp_column_name='timestamp', chunk_size=chunk_size).fit(reference_data=reference)
    # let's see Reconstruction error statistics for all available data
    rcerror_results = rcerror_calculator.calculate(analysis)
    figure = rcerror_results.plot(kind='drift', plot_reference=True)
    figure.show()
    

    Gives the following error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    ~\anaconda3\lib\site-packages\nannyml\base.py in fit(self, reference_data, *args, **kwargs)
         94             self._logger.debug(f"fitting {str(self)}")
    ---> 95             return self._fit(reference_data, *args, **kwargs)
         96         except InvalidArgumentsException:
    
    ~\anaconda3\lib\site-packages\nannyml\drift\model_inputs\univariate\statistical\calculator.py in _fit(self, reference_data, *args, **kwargs)
        105         self.previous_reference_data = reference_data.copy()
    --> 106         self.previous_reference_results = self._calculate(self.previous_reference_data).data
        107 
    
    ~\anaconda3\lib\site-packages\nannyml\drift\model_inputs\univariate\statistical\calculator.py in _calculate(self, data, *args, **kwargs)
        116 
    --> 117         self.continuous_column_names, self.categorical_column_names = _split_features_by_type(
        118             data, self.feature_column_names
    
    ~\anaconda3\lib\site-packages\nannyml\base.py in _split_features_by_type(data, feature_column_names)
        229 
    --> 230     categorical_column_names = [col for col in feature_column_names if _column_is_categorical(data[col])]
        231 
    
    ~\anaconda3\lib\site-packages\nannyml\base.py in <listcomp>(.0)
        229 
    --> 230     categorical_column_names = [col for col in feature_column_names if _column_is_categorical(data[col])]
        231 
    
    ~\anaconda3\lib\site-packages\nannyml\base.py in _column_is_categorical(column)
        235 def _column_is_categorical(column: pd.Series) -> bool:
    --> 236     return column.dtype in ['object', 'string', 'category', 'bool']
        237 
    
    TypeError: data type 'string' not understood
    
    During handling of the above exception, another exception occurred:
    
    CalculatorException                       Traceback (most recent call last)
    <ipython-input-1-9ae82d7fa4d4> in <module>
         39     chunk_size=chunk_size
         40 )
    ---> 41 univariate_calculator = univariate_calculator.fit(reference)
         42 univariate_results = univariate_calculator.calculate(analysis)
         43 # Plot drift results for all model inputs
    
    ~\anaconda3\lib\site-packages\nannyml\base.py in fit(self, reference_data, *args, **kwargs)
         99             raise
        100         except Exception as exc:
    --> 101             raise CalculatorException(f"failed while fitting {str(self)}.\n{exc}")
        102 
        103     def calculate(self, data: pd.DataFrame, *args, **kwargs) -> AbstractCalculatorResult:
    
    CalculatorException: failed while fitting <nannyml.drift.model_inputs.univariate.statistical.calculator.UnivariateStatisticalDriftCalculator object at 0x0000022BBF196A30>.
    data type 'string' not understood
    

    Expected behavior The quickstart code runs without a problem.

    Additional context

    The user who had that issue was running Python 3.8 on Windows through a PyCharm environment.

    I couldn't reproduce the error when I tried on my machine. Moreover, when I guided the user to set up a new conda environment, the error went away.

    However, maybe the way the string type check is defined here could be changed, similar to suggestions such as these, to cover more cases? I'd hold off on that until we see more users having the issue, since in this case a misconfigured environment is more likely the problem than a library compatibility issue.
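
    For reference, one possible rewrite using pandas' public dtype predicates instead of comparing the dtype against raw strings (a sketch, not necessarily the fix that would be shipped):

    import pandas as pd

    def _column_is_categorical(column: pd.Series) -> bool:
        # The public predicates understand extension dtypes such as 'string',
        # which the raw `dtype in [...]` comparison can choke on.
        return (
            pd.api.types.is_object_dtype(column)
            or pd.api.types.is_string_dtype(column)
            or pd.api.types.is_categorical_dtype(column)
            or pd.api.types.is_bool_dtype(column)
        )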

    bug stale 
    opened by nikml 6
  • Example plots not showing


    Description

    I was trying to reproduce the example on the main readme. Plots are not showing. See below:

    [screenshots omitted]

    I tried this with both versions 0.4.0/0.4.1

    If it helps, I'm using:

    jupyterlab==3.4.2
    pandas==1.4.1
    

    What I Did

    Example on readme, as is:

    import pandas as pd
    import nannyml as nml
    
    # Load synthetic data
    reference, analysis, analysis_target = nml.load_synthetic_binary_classification_dataset()
    data = pd.concat([reference, analysis], ignore_index=True)
    
    # Extract meta data
    metadata = nml.extract_metadata(data = reference, model_name='wfh_predictor', model_type='classification_binary', exclude_columns=['identifier'])
    metadata.target_column_name = 'work_home_actual'
    
    # Choose a chunker or set a chunk size
    chunk_size = 5000
    
    # Estimate model performance
    estimator = nml.CBPE(model_metadata=metadata, metrics=['roc_auc'], chunk_size=chunk_size)
    estimator.fit(reference)
    estimated_performance = estimator.estimate(data=data)
    
    figure = estimated_performance.plot(metric='roc_auc', kind='performance')
    figure.show()
    
    # Detect multivariate feature drift
    multivariate_calculator = nml.DataReconstructionDriftCalculator(model_metadata=metadata, chunk_size=chunk_size)
    multivariate_calculator.fit(reference_data=reference)
    multivariate_results = multivariate_calculator.calculate(data=data)
    
    figure = multivariate_results.plot(kind='drift')
    figure.show()
    
    # Detect univariate feature drift
    univariate_calculator = nml.UnivariateStatisticalDriftCalculator(model_metadata=metadata, chunk_size=chunk_size)
    univariate_calculator.fit(reference_data=reference)
    univariate_results = univariate_calculator.calculate(data=data)
    
    # Rank features based on number of alerts
    ranker = nml.Ranker.by('alert_count')
    ranked_features = ranker.rank(univariate_results, model_metadata=metadata, only_drifting = False)
    
    for feature in ranked_features.feature:
        figure = univariate_results.plot(kind='feature_distribution', feature_label=feature)
        figure.show()
    

    Thank you!

    opened by IgnacioPascale 6
  • extra dependencies not being installed


    Hi!

    When following the instructions and running poetry install -E test -E doc -E dev, extra packages such as tox are not installed. I think this is related to closed issue 67.

    This is related to this Poetry behaviour. Extra packages should come from dependencies and not dev-dependencies, otherwise they are not installed.

    I have tried to move the dev-dependencies packages within the dependencies section in pyproject.toml and after having updated the poetry.lock it works fine.

    Do you want me to send a PR with this? If you think it would be better to somewhat separate "dev" dependencies from other dependencies, an option would be to use Poetry dependency groups with something such as a [tool.poetry.group.test.dependencies] section. I can do that in the PR if you think it is better.

    question 
    opened by rfrenoy 6
  • Update univariate comparison


    • Changed confidence intervals to 95% instead of 99.99%
    • Changed figure sizes and font sizes to make the figures more readable
    • Added plots that zoom into the behaviour of methods on smaller shifts for the shifting mean and shifting SD experiments
    • Split apart uniform distribution in the categorical section
    opened by cartgr 4
  • Running nml.CBPE getting typeerror


    I got TypeError new() missing 1 required positional argument: 'model_metadata' when running this chunk of code:

    estimator = nml.CBPE(
        y_pred_proba='y_pred_proba',
        y_pred='y_pred',
        y_true='y',
        timestamp_column_name='timestamp',
        metrics=['roc_auc', 'f1'],
        chunk_size=5000,
        problem_type='classification_binary',
    )

    nannyml version is 0.4.1

    What does the error mean here? Thanks for the help.

    opened by langmusi 4
  • nannyml can confuse months with days on some rows of a dataset


    Describe the bug NannyML gets the month and day values wrong and can make a date of 01-09-2018 (1st September 2018) be interpreted as 09-01-2018 (9th January 2018).

    To Reproduce Install NannyML and run the following code in a jupyter notebook:

    import wget
    from pathlib import Path
    import pandas as pd
    import nannyml as nml
    
    
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00618/Steel_industry_data.csv'
    download_folder = Path.home().joinpath("Downloads")
    filename = wget.download(url, out=str(download_folder))
    
    data = pd.read_csv(filename, header=0)
    display(data.head())
    
    features_selected = list(data.columns)[2:]
    
    data['partition'] = 'train'
    ind_ = data.shape[0]//3
    data.loc[ind_:2*ind_, 'partition'] = 'reference'
    data.loc[2*ind_: , 'partition'] = 'analysis'
    
    reference = data.loc[data.partition == 'reference', :].reset_index(drop=True)
    analysis = data.loc[data.partition == 'analysis', :].reset_index(drop=True)
    
    calc = nml.UnivariateStatisticalDriftCalculator(
        feature_column_names=[features_selected[0]],
        timestamp_column_name='date'
    )
    calc.fit(reference)
    results = calc.calculate(analysis)
    
    drift_fig = results.plot(kind='feature_drift', feature_column_name=features_selected[0], plot_reference=True)
    drift_fig.show()
    
    display(results.data)
    display(analysis)
    
    

    A partial screenshot of the results.data can be seen below:

    [screenshot of results.data]

    A partial screenshot of the first and last line of the analysis dataframe can be seen below:

    [screenshot of the analysis dataframe]

    By comparing the two screenshots you can see that some dates have been altered.

    Expected behavior The dates would be recognized correctly. The misidentification of dates can be seen by comparing the start and end dates of the results data and the start and end dates of the analysis dataset.

    Screenshots & scripts The problem is also visible on the following univariate drift plot:

    [univariate drift plot]

    stale 
    opened by nikml 4
  • The example from Quick Start does not work


    • nannyml version: 0.3.2
    • Python version: 3.10.4
    • Operating System: macos 12.0.1

    Hello

    1. reference, analysis, analysis_target = nml.load_synthetic_binary_classification_dataset() does not work. Error msg: AttributeError: module 'nannyml' has no attribute 'load_synthetic_binary_classification_dataset'. Looks like the method load_synthetic_sample() must be used instead.

    2. metadata = nml.extract_metadata(data = reference, model_name='wfh_predictor', model_type='classification_binary', exclude_columns=['identifier']) also does not work. Error msg: TypeError: extract_metadata() got an unexpected keyword argument 'model_type'

    opened by cat-zeppelin 4
  • Improve readability of sequence checking conditionals


    Issue here

    Summary

    • Improve readability of sequence checking conditionals by making it simpler

    Testing Approach

    • Since this is basically a refactor, I just ran poetry run tox to check if there were any regressions.

    Note

    Trying to make contributing to open source stick. Feel free to disregard.

    opened by jrggementiza 1
  • Improve readability of sequence checking conditionals


    Motivation

    • Was poking around the codebase when I noticed sequence checking could use improvement in terms of code readability

    Solution

    Recommended Changes

    • We can simplify the following code fragments:
    - if len(some_sequence) > 0:
    + if some_sequence:
          do_something()
    
    - if len(some_other_sequence) == 0:
    + if not some_other_sequence:
          do_something_else()
    

    Additional Context

    Note

    Trying to make contributing to open source stick. Feel free to disregard.

    enhancement 
    opened by jrggementiza 0
  • Use np.histogram_bin_edges to compute bin edges for ECE


    What is this PR for ?

    This PR is intended to improve the binning used in the calibration step by using np.histogram_bin_edges instead of the current _get_bin_index_edges function.

    It also adds a new calibration_bin_count parameter to CBPE to be able to tune the number of bins to use in needs_calibration function.

    Please note that the default value for bin_count is still 10. However, it would probably be better to switch to 'auto' or 'fd', as these are generally better.

    How was it tested?

    New tests added in tests/test_calibration.py to validate the new behaviour. poetry run tox is passing.
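
    For context, a small illustration of the NumPy call in question (synthetic scores, illustrative only):

    import numpy as np

    scores = np.random.default_rng(0).uniform(size=10_000)

    # Current default behaviour: a fixed number of equal-width bins.
    edges_fixed = np.histogram_bin_edges(scores, bins=10, range=(0.0, 1.0))

    # Data-driven alternatives mentioned above.
    edges_auto = np.histogram_bin_edges(scores, bins='auto')
    edges_fd = np.histogram_bin_edges(scores, bins='fd')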

    opened by Jebq 2
  • NannyML (quickstart code) silently fails to create a univariate drift plot when too many features are selected


    Describe the bug

    When we pass MANY features into the univariate drift calculation and use code that tries to plot them all at once, it fails silently. The machine keeps computing but, presumably, does nothing.

    To Reproduce

    Posting example code since a full reproducible example relies on internal compute resources.

    
    # load data and create reference and analysis dataframes
    # in this case the fabert openml dataset
    
    chunker = nml.SizeBasedChunker(chunk_size=_suggested_chunk_size)
    
    
    univariate_calculator = nml.UnivariateDriftCalculator(
        column_names=feature_column_names,
        continuous_methods=['jensen_shannon'],
        categorical_methods=['jensen_shannon'],
        chunker=chunker,
    )
    univariate_calculator = univariate_calculator.fit(reference)
    univariate_results = univariate_calculator.calculate(analysis)
    

    In our case the number of selected features is 800. Trying to create one plot with:

    figure2 = univariate_results.filter(
        period='all',
        column_names=univariate_results.column_names, 
        methods=['jensen_shannon']).plot(kind='drift')
    

    will fail, and the JupyterLab notebook keeps running while, presumably, doing nothing.

    Expected behavior A plot would be created.

    Additional context If instead we try to create plots one by one, things work fine:

    for ftr in univariate_results.column_names:
        _fgr = univariate_results.filter(
            period='all',
            column_names=[ftr], 
            methods=['jensen_shannon']
        ).plot(kind='drift')
        _fgr.write_image(f"{_figure_folder}/drift-{ftr}.svg")
    
    bug 
    opened by nikml 1
  • Inappropriate Error when a string is given for drift methods


    # missing code that creates relevant dataframes
    feature_column_names = train.drop(['fraud_reported', 'timestamp'],axis = 1).columns
    
    univariate_calculator = nml.UnivariateDriftCalculator(
        column_names=list(feature_column_names),
        continuous_methods='jensen_shannon')
    
    univariate_calculator = univariate_calculator.fit(test)
    univariate_results = univariate_calculator.calculate(val)
    
    ---------------------------------------------------------------------------
    InvalidArgumentsException                 Traceback (most recent call last)
    Cell In[23], line 7
          1 feature_column_names = train.drop(['fraud_reported', 'timestamp'],axis = 1).columns
          3 univariate_calculator = nml.UnivariateDriftCalculator(
          4     column_names=list(feature_column_names),
          5     continuous_methods='kolmogorov_smirnov')
    ----> 7 univariate_calculator = univariate_calculator.fit(test)
          8 univariate_results = univariate_calculator.calculate(val)
         10 # for column_name in univariate_calculator.continuous_column_names:
         11 #     figure = univariate_results.plot(
         12 #         kind='drift',
       (...)
         16 #     )
         17 #     figure.show()
    
    File ~/.conda/envs/nanny/lib/python3.10/site-packages/nannyml/base.py:138, in AbstractCalculator.fit(self, reference_data, *args, **kwargs)
        136 try:
        137     self._logger.debug(f"fitting {str(self)}")
    --> 138     return self._fit(reference_data, *args, **kwargs)
        139 except InvalidArgumentsException:
        140     raise
    
    File ~/.conda/envs/nanny/lib/python3.10/site-packages/nannyml/usage_logging.py:189, in log_usage.<locals>.logging_decorator.<locals>.logging_wrapper(*args, **kwargs)
        187 finally:
        188     if runtime_exception is not None:
    --> 189         raise runtime_exception
        190     else:
        191         return res
    
    File ~/.conda/envs/nanny/lib/python3.10/site-packages/nannyml/usage_logging.py:142, in log_usage.<locals>.logging_decorator.<locals>.logging_wrapper(*args, **kwargs)
        139 runtime_exception, res = None, None
        140 try:
        141     # run original function
    --> 142     res = func(*args, **kwargs)
        143 except BaseException as exc:
        144     runtime_exception = exc
    
    File ~/.conda/envs/nanny/lib/python3.10/site-packages/nannyml/drift/univariate/calculator.py:109, in UnivariateDriftCalculator._fit(self, reference_data, *args, **kwargs)
        104 self.continuous_column_names, self.categorical_column_names = _split_features_by_type(
        105     reference_data, self.column_names
        106 )
        108 for column_name in self.continuous_column_names:
    --> 109     self._column_to_models_mapping[column_name] += [
        110         MethodFactory.create(key=method, feature_type=FeatureType.CONTINUOUS, chunker=self.chunker).fit(
        111             reference_data[column_name]
        112         )
        113         for method in self.continuous_method_names
        114     ]
        116 for column_name in self.categorical_column_names:
        117     self._column_to_models_mapping[column_name] += [
        118         MethodFactory.create(key=method, feature_type=FeatureType.CATEGORICAL, chunker=self.chunker).fit(
        119             reference_data[column_name]
        120         )
        121         for method in self.categorical_method_names
        122     ]
    
    File ~/.conda/envs/nanny/lib/python3.10/site-packages/nannyml/drift/univariate/calculator.py:110, in <listcomp>(.0)
        104 self.continuous_column_names, self.categorical_column_names = _split_features_by_type(
        105     reference_data, self.column_names
        106 )
        108 for column_name in self.continuous_column_names:
        109     self._column_to_models_mapping[column_name] += [
    --> 110         MethodFactory.create(key=method, feature_type=FeatureType.CONTINUOUS, chunker=self.chunker).fit(
        111             reference_data[column_name]
        112         )
        113         for method in self.continuous_method_names
        114     ]
        116 for column_name in self.categorical_column_names:
        117     self._column_to_models_mapping[column_name] += [
        118         MethodFactory.create(key=method, feature_type=FeatureType.CATEGORICAL, chunker=self.chunker).fit(
        119             reference_data[column_name]
        120         )
        121         for method in self.categorical_method_names
        122     ]
    
    File ~/.conda/envs/nanny/lib/python3.10/site-packages/nannyml/drift/univariate/methods.py:152, in MethodFactory.create(cls, key, feature_type, **kwargs)
        149     raise InvalidArgumentsException(f"cannot create method given a '{type(key)}'. Please provide a string.")
        151 if key not in cls.registry:
    --> 152     raise InvalidArgumentsException(
        153         f"unknown method key '{key}' given. "
        154         "Should be one of ['kolmogorov_smirnov', 'jensen_shannon', 'wasserstein', 'chi2', "
        155         "'jensen_shannon', 'l_infinity', 'hellinger']."
        156     )
        158 if feature_type not in cls.registry[key]:
        159     raise InvalidArgumentsException(f"method {key} does not support {feature_type.value} features.")
    
    InvalidArgumentsException: unknown method key 'k' given. Should be one of ['kolmogorov_smirnov', 'jensen_shannon', 'wasserstein', 'chi2', 'jensen_shannon', 'l_infinity', 'hellinger'].
    
    

    The problem seems to be that the user gave a string instead of a list of strings. However, instead of getting a proper error saying that a list is expected, they get an error about an unknown method 'k'.

    I'm assuming (because I didn't dig into the code) the string is iterated over as if it were a list, so each character gets matched against our string options.

    P.S. Alternatively, we could also accept a string here and handle it appropriately. Nice to have, but less important than an error that correctly points out what went wrong.
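
    For illustration, a tiny normalisation helper along those lines (_normalize_methods is a hypothetical name):

    from typing import List, Union

    def _normalize_methods(methods: Union[str, List[str]]) -> List[str]:
        # A bare string would otherwise be iterated character by character,
        # producing errors like "unknown method key 'k'".
        if isinstance(methods, str):
            return [methods]
        return list(methods)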

    bug 
    opened by nikml 1
Releases (v0.8.1)
  • v0.8.1(Dec 1, 2022)

    Changed

    • Thorough refactor of the nannyml.drift.ranker module. The abstract base class and factory have been dropped in favor of a more flexible approach.
    • Thorough refactor of our Plotly-based plotting modules. These have been rewritten from scratch to make them more modular and composable. This will allow us to deliver more powerful and meaningful visualizations faster.

    Added

    • Added a new univariate drift method. The Hellinger distance, used for continuous variables.
    • Added an extensive write-up on when to use which univariate drift method.
    • Added a new way to rank the results of univariate drift calculation. The CorrelationRanker ranks columns based on the correlation between the drift value and the change in realized or estimated performance. Read all about it in the ranking documentation

    Fixed

    • Disabled usage logging for our GitHub workflows
    • Allow passing a single string to the metrics parameter of the result.filter() function, as per special request.
    Source code(tar.gz)
    Source code(zip)
    nannyml-0.8.1-py3-none-any.whl(14.77 MB)
  • v0.8.0(Nov 24, 2022)

    Changed

    • Updated mypy to a new version, immediately resulting in some new checks that failed.

    Added

    • Added new univariate drift methods. The Wasserstein distance for continuous variables, and the L-Infinity distance for categorical variables.
    • Added usage logging to our key functions. Check out the docs to find out more on what, why, how, and how to disable it if you want to.

    Fixed

    • Fixed and updated various parts of the docs, reported at warp speed! Thanks @NeoKish!
    • Fixed mypy issues concerning 'implicit optionals'.
    Source code(tar.gz)
    Source code(zip)
    nannyml-0.8.0-py3-none-any.whl(14.77 MB)
  • v0.7.0(Nov 7, 2022)

    Changed

    • Updated the handling of "leftover" observations when using the SizeBasedChunker and CountBasedChunker. Renamed the parameter for tweaking that behavior to incomplete, which can be set to keep, drop or append. Default behavior for both is now to append leftover observations to the last full chunk.
    • Refactored the nannyml.drift module. The intermediate structural level (model_inputs, model_outputs, targets) has been removed and turned into a single unified UnivariateDriftCalculator. The old built-in statistics have been re-implemented as Methods, allowing us to add new methods to detect univariate drift.
    • Simplified a lot of the codebase (but also complicated some bits) by storing results internally as multilevel-indexed DataFrames. This means we no longer have to 'convey information' by encoding data column names and method names in the names of result columns. We've introduced a new paradigm to deal with results: drill down to the data you really need by using the filter method, which returns a new Result instance with a smaller 'scope', then turn this Result into a DataFrame using the to_df method (a short sketch follows this list).
    • Changed the structure of the pyproject.toml file due to a Poetry upgrade to version 1.2.1.
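
    A minimal sketch of that paradigm (results, the column name and the period are illustrative):

    # Drill down to a smaller scope, then materialise it as a DataFrame.
    subset = results.filter(period='analysis', column_names=['feature_1'])
    df = subset.to_df()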

    Added

    • Expanded the nannyml.io module with new Writer implementations: DatabaseWriter that exports data into multiple tables in a relational database and the PickleFileWriter which stores the pickled Results on local/remote/cloud disk.
    • Added a new univariate drift detection method based on the Jensen-Shannon distance. Used within the UnivariateDriftCalculator.

    Fixed

    • Added lightgbm installation instructions to our installation guide.
    Source code(tar.gz)
    Source code(zip)
    nannyml-0.7.0-py3-none-any.whl(14.77 MB)
  • v0.6.3(Sep 22, 2022)

    Changed

    • dependencybot dependency updates
    • stalebot setup

    Fixed

    • CBPE now uses uncalibrated y_pred_proba values to calculate realized performance. Fixed for both binary and multiclass use cases (#98)
    • Fix an issue where reference data was rendered incorrectly on joy plots
    • Updated the 'California Housing' example docs, thanks for the help @NeoKish
    • Fix lower confidence bounds and thresholds under zero for regression cases. When the lower limit is set to 0, the lower threshold will not be plotted. (#127)
    Source code(tar.gz)
    Source code(zip)
    nannyml-0.6.3-py3-none-any.whl(14.76 MB)
  • v0.6.2(Sep 16, 2022)

    Changed

    • Made the timestamp_column_name, previously required by all calculators and estimators, optional. The main consequences are that plots now have a chunk-index-based x-axis when no timestamp column name is given, and that you cannot chunk by period when the timestamp column name is not specified.

    Fixed

    • Added missing s3fs dependency
    • Fixed outdated plotting kind constants in the runner (used by CLI)
    • Fixed some missing images and incorrect version numbers in the README, thanks @NeoKish!

    Added

    • Added a lot of additional tests, mainly concerning plotting and the Runner class
    Source code(tar.gz)
    Source code(zip)
    nannyml-0.6.2-py3-none-any.whl(14.76 MB)
  • v0.6.1(Sep 9, 2022)

    Changed

    • Use the problem_type parameter to determine the correct graph to output when plotting model output drift

    Fixed

    • Showing the wrong plot title for DLE estimation result plots, thanks @NeoKish
    • Fixed incorrect plot kinds in some error feedback for the model output drift calculator
    • Fixed missing problem_type argument in the Quickstart guide
    • Fix incorrect visualization of confidence bands on reference data in DLE and CBPE result plots
    Source code(tar.gz)
    Source code(zip)
    nannyml-0.6.1-py3-none-any.whl(14.76 MB)
  • v0.6.0(Sep 8, 2022)

    Added

    • Added support for regression problems across all calculators and estimators. In some cases a problem_type parameter is now required during calculator/estimator initialization; this is a breaking change. Read more about using regression in our tutorials and about our new performance estimation for regression using the Direct Loss Estimation (DLE) algorithm.

    Changed

    • Improved tox running speed by skipping some unnecessary package installations. Thanks @baskervilski!

    Fixed

    • Fixed an issue where some Pandas column datatypes were not recognized as continuous by NannyML, causing them to be dropped in calculations. Thanks for reporting @Dbhasin1!
    • Fixed an issue where some helper columns for visualization crept into the stored reference results. Good catch @Dbhasin1!
    • Fixed an issue where a Reader instance would raise a WriteException. Thanks for those eagle eyes @baskervilski!
    Source code(tar.gz)
    Source code(zip)
    nannyml-0.6.0-py3-none-any.whl(14.76 MB)
  • v0.5.3(Aug 30, 2022)

    Changed

    • We've completely overhauled the way we determine the "stability" of our estimations. We've moved on from determining a minimum Chunk size to estimating the sampling error for an operation on a Chunk.
      • A sampling error value will be provided per metric per Chunk in the result data for the reconstruction error multivariate drift calculator, all performance calculation metrics and all performance estimation metrics.
      • Confidence bounds are now also based on this sampling error and will display a range around an estimation of +/- 3 times the sampling error in CBPE and the reconstruction error multivariate drift calculator. Be sure to check out our in-depth documentation on how it works or dive right into the implementation.
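
    As a toy illustration of how such a band is formed (and clipped, cf. the earlier ROC-AUC > 1.0 issue):

    def confidence_band(estimate, sampling_error, lower_limit=0.0, upper_limit=1.0):
        """Band = estimate +/- 3 * sampling error, clipped to the metric's valid range."""
        return (
            max(estimate - 3 * sampling_error, lower_limit),
            min(estimate + 3 * sampling_error, upper_limit),
        )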

    Fixed

    • Fixed issue where an outdated version of Numpy caused Pandas to fail reading string columns in some scenarios (#93). Thank you, @Bernhard and @Gabriel for the investigative work!
    Source code(tar.gz)
    Source code(zip)
    nannyml-0.5.3-py3-none-any.whl(12.42 MB)
  • v0.5.2(Aug 17, 2022)

  • v0.5.1(Aug 16, 2022)

    Added

    • Added simple CLI implementation to support automation and MLOps toolchain use cases. Supports reading/writing to cloud storage using S3, GCS, ADL, ABFS and AZ protocols. Containerized version available at dockerhub.

    Changed

    • make clean now also clears __pycache__
    • Fixed some inconsistencies in docstrings (they still need some additional love though)
    Source code(tar.gz)
    Source code(zip)
    nannyml-0.5.1-py3-none-any.whl(12.42 MB)
  • v0.5.0(Jul 7, 2022)

  • v0.4.1(May 19, 2022)

    Added

    • Added limited support for regression use cases: create or extract RegressionMetadata and use it for drift detection. Performance estimation and calculation require more research.

    Changed

    • DefaultChunker splits into 10 chunks of equal size.
    • SizeBasedChunker no longer drops incomplete last chunk by default, but this is now configurable behavior.
    Source code(tar.gz)
    Source code(zip)
    nannyml-0.4.1-py3-none-any.whl(9.10 MB)
  • v0.4.0(May 13, 2022)

    Added

    • Added support for new metrics in the Confidence Based Performance Estimator (CBPE). It now estimates roc_auc, f1, precision, recall, specificity and accuracy.
    • Added support for multiclass classification. This includes
      • Specifying multiclass classification metadata + support in automated metadata extraction (by introducing a model_type parameter).
      • Support for all CBPE metrics.
      • Support for realized performance calculation using the PerformanceCalculator.
      • Support for all types of drift detection (model inputs, model output, target distribution).
      • A new synthetic toy dataset.

    Changed

    • Removed the identifier property from the ModelMetadata class. Joining analysis data and analysis target values should be done upfront or index-based.
    • Added an exclude_columns parameter to the extract_metadata function. Use it to specify the columns that should not be considered as model metadata or features.
    • All fit methods now return the fitted object. This allows chaining Calculator/Estimator instantiation and fitting into a single line.
    • Custom metrics are no longer supported in the PerformanceCalculator. Only the predefined metrics remain supported.
    • Big documentation revamp: we've tweaked overall structure, page structure and incorporated lots of feedback.
    • Improvements to consistency and readability for the 'hover' visualization in the step plots, including consistent color usage, conditional formatting, icon usage etc.
    • Improved indication of "realized" and "estimated" performance in all CBPE step plots (changes to hover, axes and legends)

    Fixed

    • Updated homepage in project metadata
    • Added missing metadata modification to the quickstart
    • Perform some additional check on reference data during preprocessing
    • Various documentation suggestions (#58)
    Source code(tar.gz)
    Source code(zip)
    nannyml-0.4.0-py3-none-any.whl(9.10 MB)
  • v0.3.2(May 3, 2022)

  • v0.3.0(Apr 8, 2022)

    Added

    • Added support for both predicted labels and predicted probabilities in ModelMetadata.
    • Support for monitoring model performance metrics using the PerformanceCalculator.
    • Support for monitoring target distribution using the TargetDistributionCalculator

    Changed

    • Plotting will default to using step plots.
    • Restructured the nannyml.drift package and subpackages. Breaking changes!
    • Metadata completeness check will now fail when there are features of FeatureType.UNKNOWN.
    • Chunk date boundaries are now calculated differently for a PeriodBasedChunker, using the theoretical period for boundaries as opposed to the observed boundaries within the chunk observations.
    • Updated version of the black pre-commit hook due to breaking changes in its click dependency.
    • The minimum chunk size will now be provided by each individual calculator / estimator / metric, allowing for each of them to warn the end user when chunk sizes are suboptimal.

    Fixed

    • Restrict version of the scipy dependency to be >=1.7.3, <1.8.0. Planned to be relaxed ASAP.
    • Deal with missing values in chunks causing NaN values when concatenating.
    • Crash when estimating CBPE without a target column present
    • Incorrect label in ModelMetadata printout
    Source code(tar.gz)
    Source code(zip)
    nannyml-0.3.0-py3-none-any.whl(5.48 MB)