
Overview

Evidently

Interactive reports and JSON profiles to analyze, monitor and debug machine learning models.

Docs | Join Discord | Newsletter | Blog | Twitter

What is it?

Evidently helps analyze machine learning models during validation or production monitoring. The tool generates interactive visual reports and JSON profiles from pandas DataFrames or CSV files. Six reports are currently available.

1. Data Drift

Detects changes in feature distribution. Dashboard example

2. Numerical Target Drift

Detects changes in numerical target and feature behavior. Dashboard example

3. Categorical Target Drift

Detects changes in categorical target and feature behavior. Dashboard example

4. Regression Model Performance

Analyzes the performance of a regression model and model errors. Dashboard example

5. Classification Model Performance

Analyzes the performance and errors of a classification model. Works both for binary and multi-class models. Dashboard example

6. Probabilistic Classification Model Performance

Analyzes the performance of a probabilistic classification model, quality of model calibration, and model errors. Works both for binary and multi-class models. Dashboard example

Installing from PyPI

macOS and Linux

Evidently is available as a PyPI package. To install it using pip package manager, run:

$ pip install evidently

The tool allows building interactive reports both inside a Jupyter notebook and as a separate HTML file. If you only want to generate reports as HTML files or export them as JSON profiles, the installation is now complete.

To enable building interactive reports inside a Jupyter notebook, we use jupyter nbextension. If you want to create reports inside a Jupyter notebook, then after installing evidently you should run the following two commands in the terminal from the evidently directory.

To install the jupyter nbextension, run:

$ jupyter nbextension install --sys-prefix --symlink --overwrite --py evidently

To enable it, run:

$ jupyter nbextension enable evidently --py --sys-prefix

That's it!

Note: a single run after the installation is enough. No need to repeat the last two commands every time.

Note 2: if you use Jupyter Lab, you may experience difficulties exploring reports inside a Jupyter notebook. However, report generation as a separate .html file will work correctly.

Windows

Evidently is available as a PyPI package. To install it using pip package manager, run:

$ pip install evidently

The tool allows building interactive reports both inside a Jupyter notebook and as a separate HTML file. Unfortunately, building reports inside a Jupyter notebook is not yet possible on Windows. The reason is that Windows requires administrator privileges to create symlinks. We will address this issue in later versions.

Getting started

Jupyter Notebook

To start, prepare your data as two pandas DataFrames. The first should include your reference data, the second your current production data. The structure of both datasets should be identical.

  • For Data Drift report, include the input features only.
  • For Target Drift reports, include the column with Target and/or Prediction.
  • For Model Performance reports, include the columns with Target and Prediction.

Calculation results are available in one of two formats:

  • Option 1: an interactive Dashboard displayed inside the Jupyter notebook or exported as an HTML report.
  • Option 2: a JSON Profile that includes the values of metrics and the results of statistical tests.

Option 1: Dashboard

After installing the tool, import the Evidently Dashboard and the required tabs:

import pandas as pd
from sklearn import datasets

from evidently.dashboard import Dashboard
from evidently.tabs import DataDriftTab, CatTargetDriftTab

iris = datasets.load_iris()
iris_frame = pd.DataFrame(iris.data, columns = iris.feature_names)

To generate the Data Drift report, run:

iris_data_drift_report = Dashboard(tabs=[DataDriftTab])
iris_data_drift_report.calculate(iris_frame[:100], iris_frame[100:], column_mapping = None)
iris_data_drift_report.save("reports/my_report.html")
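
If you have the Jupyter nbextension enabled, you can also display the dashboard inline instead of saving it. A minimal sketch, assuming the show() method described in the release notes below:

iris_data_drift_report.show()

If the report does not render automatically, try an explicit mode, for example show(mode="inline") for Google Colab, Kaggle Kernels and Deepnote.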

To generate the Data Drift and the Categorical Target Drift reports, run:

iris_data_and_target_drift_report = Dashboard(tabs=[DataDriftTab, CatTargetDriftTab])
iris_data_and_target_drift_report.calculate(iris_frame[:100], iris_frame[100:], column_mapping = None)
iris_data_and_target_drift_report.save("reports/my_report_with_2_tabs.html")

If you get a security alert, press "trust html". The HTML report does not open automatically; to explore it, open it from the destination folder.

To generate the Regression Model Performance report, run:

regression_model_performance = Dashboard(tabs=[RegressionPerformanceTab])
regression_model_performance.calculate(reference_data, current_data, column_mapping = column_mapping) 
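
Here reference_data and current_data are your own pandas DataFrames, and column_mapping tells Evidently which columns hold the target, the prediction, and the features. A minimal sketch of such a mapping, using hypothetical column names (the exact keys you need depend on the report):

column_mapping = {}
column_mapping['target'] = 'cnt'                     # hypothetical target column
column_mapping['prediction'] = 'prediction'          # hypothetical prediction column
column_mapping['datetime'] = 'datetime'              # optional datetime column
column_mapping['numerical_features'] = ['temp', 'humidity', 'windspeed']
column_mapping['categorical_features'] = ['season', 'holiday']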

You can also generate a Regression Model Performance report for a single DataFrame. In this case, run:

regression_single_model_performance = Dashboard(tabs=[RegressionPerformanceTab])
regression_single_model_performance.calculate(reference_data, None, column_mapping=column_mapping)

To generate the Classification Model Performance report, run:

classification_performance_report = Dashboard(tabs=[ClassificationPerformanceTab])
classification_performance_report.calculate(reference_data, current_data, column_mapping = column_mapping)

To generate the Probabilistic Classification Model Performance report, run:

classification_performance_report = Dashboard(tabs=[ProbClassificationPerformanceTab])
classification_performance_report.calculate(reference_data, current_data, column_mapping = column_mapping)

You can also generate either of the Classification reports for a single DataFrame. In this case, run:

classification_single_model_performance = Dashboard(tabs=[ClassificationPerformanceTab])
classification_single_model_performance.calculate(reference_data, None, column_mapping=column_mapping)

or

prob_classification_single_model_performance = Dashboard(tabs=[ProbClassificationPerformanceTab])
prob_classification_single_model_performance.calculate(reference_data, None, column_mapping=column_mapping)

Option 2: Profile

After installing the tool, import the Evidently Profile and the required sections:

import pandas as pd
from sklearn import datasets

from evidently.model_profile import Profile
from evidently.profile_sections import DataDriftProfileSection, CatTargetDriftProfileSection

iris = datasets.load_iris()
iris_frame = pd.DataFrame(iris.data, columns = iris.feature_names)

To generate the Data Drift profile, run:

iris_data_drift_profile = Profile(sections=[DataDriftProfileSection])
iris_data_drift_profile.calculate(iris_frame, iris_frame, column_mapping = None)
iris_data_drift_profile.json() 
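
The json() call returns the profile as a JSON string. To keep it for later analysis, you can simply write it to a file; a minimal sketch:

data_drift_profile_json = iris_data_drift_profile.json()
with open("reports/iris_data_drift_profile.json", "w") as profile_file:
    profile_file.write(data_drift_profile_json)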

To generate the Data Drift and the Categorical Target Drift profile, run:

iris_target_and_data_drift_profile = Profile(sections=[DataDriftProfileSection, CatTargetDriftProfileSection])
iris_target_and_data_drift_profile.calculate(iris_frame[:75], iris_frame[75:], column_mapping = None) 
iris_target_and_data_drift_profile.json() 

You can also generate a Regression Model Performance profile for a single DataFrame. In this case, run:

regression_single_model_performance = Profile(sections=[RegressionPerformanceProfileSection])
regression_single_model_performance.calculate(reference_data, None, column_mapping=column_mapping)

To generate the Classification Model Performance profile, run:

classification_performance_profile = Profile(sections=[ClassificationPerformanceProfileSection])
classification_performance_profile.calculate(reference_data, current_data, column_mapping = column_mapping)

To generate the Probabilistic Classification Model Performance profile, run:

prob_classification_performance_profile = Profile(sections=[ProbClassificationPerformanceProfileSection])
prob_classification_performance_profile.calculate(reference_data, current_data, column_mapping = column_mapping)

You can also generate either of the Classification profiles for a single DataFrame. In this case, run:

classification_single_model_performance = Profile(sections=[ClassificationPerformanceProfileSection])
classification_single_model_performance.calculate(reference_data, None, column_mapping=column_mapping)

or

prob_classification_single_model_performance = Profile(sections=[ProbClassificationPerformanceProfileSection])
prob_classification_single_model_performance.calculate(reference_data, None, column_mapping=column_mapping)
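
Because a profile is plain JSON, you can also parse it and use individual values programmatically, for example inside a prediction pipeline. A minimal sketch, assuming the iris data drift profile from above; the exact keys in the resulting dictionary depend on the sections you included:

import json

profile_json = iris_data_drift_profile.json()
profile_dict = json.loads(profile_json)   # plain Python dict with metrics and test results
print(list(profile_dict.keys()))          # inspect which sections were calculated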

Terminal

You can generate HTML reports or JSON profiles directly from the bash shell. To do this, prepare your data as two CSV files (if you run one of the performance reports for a single dataset, one file is enough). The first should include your reference data, the second your current production data. The structure of both datasets should be identical.

To generate a HTML report, run the following command in bash:

python -m evidently calculate dashboard --config config.json \
--reference reference.csv --current current.csv --output output_folder --report_name output_file_name

To generate a JSON profile, run the following command in bash:

python -m evidently calculate profile --config config.json \
--reference reference.csv --current current.csv --output output_folder --report_name output_file_name

Here:

  • reference is the path to the reference data,
  • current is the path to the current data,
  • output is the path to the output folder,
  • report_name is the name of the output file,
  • config is the path to the configuration file,
  • pretty_print prints the JSON profile with indents (for profiles only).

Currently, you can choose the following Tabs or Sections:

  • data_drift to estimate the data drift,
  • num_target_drift to estimate target drift for numerical target,
  • cat_target_drift to estimate target drift for categorical target,
  • classification_performance to explore the performance of a classification model,
  • prob_classification_performance to explore the performance of a probabilistic classification model,
  • regression_performance to explore the performance of a regression model.

To configure a report or a profile, create a config.json file. It specifies how your input data should be read and which report or profile to generate.

Here is an example of a simple configuration for a report, where we have comma-separated CSV files with headers and no date column in the data.

Dashboard:

{
  "data_format": {
    "separator": ",",
    "header": true,
    "date_column": null
  },
  "column_mapping" : {},
  "dashboard_tabs": ["cat_target_drift"]
}

Profile:

{
  "data_format": {
    "separator": ",",
    "header": true,
    "date_column": null
  },
  "column_mapping" : {},
  "profile_sections": ["data_drift"],
  "pretty_print": true
}

Here is an example of a more complicated configuration, where we have comma-separated CSV files with headers and a datetime column. We also specify the column_mapping dictionary to add information about the datetime, target and numerical_features.

Dashboard:

{
  "data_format": {
    "separator": ",",
    "header": true,
    "date_column": "datetime"
  },
  "column_mapping" : {
    "datetime":"datetime",
    "target":"target",
    "numerical_features": ["mean radius", "mean texture", "mean perimeter", 
      "mean area", "mean smoothness", "mean compactness", "mean concavity", 
      "mean concave points", "mean symmetry"]},
  "dashboard_tabs": ["cat_target_drift"],
  "sampling": {
      "reference": {
      "type": "none"
    },
      "current": {
      "type": "nth",
      "n": 2
    }
  }
}

Profile:

{
  "data_format": {
    "separator": ",",
    "header": true,
    "date_column": null
  },
  "column_mapping" : {
    "target":"target",
    "numerical_features": ["mean radius", "mean texture", "mean perimeter", 
      "mean area", "mean smoothness", "mean compactness", "mean concavity", 
      "mean concave points", "mean symmetry"]},
  "profile_sections": ["data_drift", "cat_target_drift"],
  "pretty_print": true,
  "sampling": {
    "reference": {
      "type": "none"
    },
    "current": {
      "type": "random",
      "ratio": 0.8
    }
  }
}

Telemetry

When you use Evidently in the command-line interface, we collect basic telemetry (starting from 0.1.21.dev0 version). It includes data on the environment (e.g. Python version) and usage (type of report or profile generated). You can read more about what we collect here.

You can opt out of telemetry collection by setting the environment variable EVIDENTLY_DISABLE_TELEMETRY=1.

Large datasets

As the examples above show, you can specify sampling parameters for large files. You can use different sampling strategies for reference and current data, or apply sampling to only one of the files. Three sampling types are currently available (a pandas equivalent for the Python API is sketched after this list):

  • none - no sampling is applied to the file,
  • nth - every Nth row of the file is taken. This option works together with the n parameter (see the Dashboard example above),
  • random - random sampling is applied. This option works together with the ratio parameter (see the Profile example above).
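
These sampling options apply to the command-line workflow. If you call Evidently from Python on a large DataFrame, you can get the same effect by downsampling with pandas before calling calculate; a minimal sketch, assuming large reference_data and current_data frames:

from evidently.dashboard import Dashboard
from evidently.tabs import DataDriftTab

current_nth_sample = current_data.iloc[::2]                               # every 2nd row, like "nth" sampling with n=2
current_random_sample = current_data.sample(frac=0.8, random_state=42)    # like "random" sampling with ratio 0.8

data_drift_report = Dashboard(tabs=[DataDriftTab])
data_drift_report.calculate(reference_data, current_random_sample, column_mapping=None)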

Documentation

For more information, refer to the complete Documentation.

Examples

  • See Data Drift Dashboard and Profile generation to explore the results both inside a Jupyter notebook and as a separate .html file: Iris, Boston

  • See Numerical Target and Data Drift Dashboard and Profile generation to explore the results both inside a Jupyter notebook and as a separate file: Iris, Breast Cancer

  • See Categorical Target and Data Drift Dashboard and Profile generation to explore the results both inside a Jupyter notebook and as a separate file: Boston

  • See Regression Performance Dashboard and Profile generation to explore the results both inside a Jupyter notebook and as a separate file: Bike Sharing Demand

  • See Classification Performance Dashboard and Profile generation to explore the results both inside a Jupyter notebook and as a separate file: Iris

  • See Probabilistic Classification Performance Dashboard and Profile generation to explore the results both inside a Jupyter notebook and as a separate .html file: Iris, Breast Cancer

Stay updated

We will be releasing more reports soon. If you want to receive updates, follow us on Twitter, or sign up for our newsletter. You can also find more tutorials and explanations in our Blog. If you want to chat and connect, join our Discord community!

Issues
  • Integrating in Prediction pipeline

    This package really helps the data science team find data drift. Please let me know whether there is a way to integrate it into a prediction pipeline to generate the metrics. Is there a way to generate a JSON file with these metrics?

    enhancement 
    opened by rambabusure 9
  • Tests for `CatTargetDriftAnalyzer`, `chi_stat_test` and code simplifications

    • [x] Add tests for CatTargetDriftAnalyzer
    • [x] Refactor
    • [x] Ask a bunch of questions :-)
      • [x] What about target_type ?
      • [x] Tests with options

    closes #96

    opened by burkovae 8
  • data missing in the report

    Hi

    I was trying to generate a classification dashboard. The "reference" dataframe has 128770 rows, but the dashboard shows only 122 rows (number of objects). The "current" dataframe has 632337 rows, but the dashboard shows only 1079 rows. I am not sure what went wrong.

    Yang

    opened by superyang713 7
  • AttributeError when trying to save report

    When saving a report using

    data_drift_report.save("/dbfs/FileStore/drift/drift_report.html")

    I get

    AttributeError: 'DataDriftTableWidget' object has no attribute 'wi'

    caused by

    /databricks/python/lib/python3.8/site-packages/evidently/widgets/data_drift_table_widget.py in get_info(self)
         25
         26 def get_info(self) -> BaseWidgetInfo:
    ---> 27     if self.wi:
         28         return self.wi
         29     raise ValueError("no widget info provided")

    I'm using azure databricks, and can save the json profile, and get the same error if I use data_drift_report.show(), though I never installed nbextension.

    I can save the iris report, and am using numerical and categorical feature mappings.

    opened by lrumanes 7
  • ProbClassificationPerformanceTab - Report not getting displayed

    Hey,

    I was following the documentation to try out Evidently. I save the reports as html and open them up to analyze. I recently tried out the "ProbClassificationPerformanceTab" as it proved to be useful for my problem. The code runs fine but when I open the output HTML file, I am just able to see "Loading..." and not any Visualizations. But I have used the drift report and cat target drift report and both of them worked fine.

    Can you please help with this?

    bug 
    opened by kn1507 7
  • Incorrect AUC ROC score and log loss.

    The probability classification report gives wrong values for the above-mentioned metrics. Also, when I looked into the code in prob_class_ref_quality_metrics_widget.py, the lines where AUC and log_loss are calculated are marked in the comments as a problem.

    bug 
    opened by vikrant-sahu 6
  • DQ Dashboard: Plotly ValueError thrown when `len(columns) > 15:`

    I'm running a DQ Dashboard as follows:

    dq_dashboard = Dashboard(tabs=[DataQualityTab()])
    dq_dashboard.calculate(ref_data_sample, prod_data_sample, column_mapping=column_mapping)
    

    When the number of columns in my dataframe is greater than 15 I get the following Plotly error:

    ValueError:
        Invalid value of type 'builtins.str' received for the 'text' property of heatmap
            Received value: ''

        The 'text' property is an array that may be specified as a tuple,
        list, numpy array, or pandas Series
    

    Plotly expects an array for text here but a str is passed.

    I suspect it's due to this line in data_quality_correlations.py:

    https://github.com/evidentlyai/evidently/blob/ed2558c1becfce6bed0379211ea15aaa3aaa666a/src/evidently/dashboard/widgets/data_quality_correlations.py#L121

    versions: evidently = "0.1.47.dev1" plotly = "5.6.0"

    stack trace:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    /home/.../09-dq-checks.ipynb Cell 15' in <cell line: 2>()
          1 dq_dashboard = Dashboard(tabs=[DataQualityTab()])
    ----> 2 dq_dashboard.calculate(ref_data_sample, prod_data_sample, column_mapping=column_mapping)
          3 dq_dashboard.save("dq_dashboard_trip_score.html")
    
    File ~/.cache/pypoetry/virtualenvs/ds-model-monitoring-GQktnPqD-py3.9/lib/python3.9/site-packages/evidently/dashboard/dashboard.py:152, in Dashboard.calculate(self, reference_data, current_data, column_mapping)
        147 def calculate(self,
        148               reference_data: pandas.DataFrame,
        149               current_data: Optional[pandas.DataFrame] = None,
        150               column_mapping: Optional[ColumnMapping] = None):
        151     column_mapping = column_mapping or ColumnMapping()
    --> 152     self.execute(reference_data, current_data, column_mapping)
    
    File ~/.cache/pypoetry/virtualenvs/ds-model-monitoring-GQktnPqD-py3.9/lib/python3.9/site-packages/evidently/pipeline/pipeline.py:49, in Pipeline.execute(self, reference_data, current_data, column_mapping)
         47 for stage in self.stages:
         48     stage.options_provider = self.options_provider
    ---> 49     stage.calculate(
         50         rdata.copy(), None if cdata is None else cdata.copy(), column_mapping, self.analyzers_results
         51     )
    
    File ~/.cache/pypoetry/virtualenvs/ds-model-monitoring-GQktnPqD-py3.9/lib/python3.9/site-packages/evidently/dashboard/tabs/base_tab.py:63, in Tab.calculate(self, reference_data, current_data, column_mapping, analyzers_results)
         61 for widget in self._widgets:
         62     widget.options_provider = self.options_provider
    ---> 63     self._widget_results.append(widget.calculate(reference_data,
         64                                                  current_data,
         65                                                  column_mapping,
         66                                                  analyzers_results))
    
    File ~/.cache/pypoetry/virtualenvs/ds-model-monitoring-GQktnPqD-py3.9/lib/python3.9/site-packages/evidently/dashboard/widgets/data_quality_correlations.py:43, in DataQualityCorrelationsWidget.calculate(self, reference_data, current_data, column_mapping, analyzers_results)
         41 for kind in ['pearson', 'spearman', 'kendall', 'cramer_v']:
         42     if reference_correlations[kind].shape[0] > 1:
    ---> 43         correlation_figure = self._plot_correlation_figure(kind, reference_correlations, current_correlations)
         44         additional_graphs.append(
         45             AdditionalGraphInfo(
         46                 kind,
       (...)
         51             )
         52         )
         54         parts.append(
         55             {
         56                 "title": kind,
         57                 "id": kind
         58             }
         59         )
    
    File ~/.cache/pypoetry/virtualenvs/ds-model-monitoring-GQktnPqD-py3.9/lib/python3.9/site-packages/evidently/dashboard/widgets/data_quality_correlations.py:108, in DataQualityCorrelationsWidget._plot_correlation_figure(self, kind, reference_correlations, current_correlations)
        106     text = ""
        107     texttemplate = ""
    --> 108 trace = go.Heatmap(
        109     z=reference_correlations[kind],
        110     x=columns,
        111     y=columns,
        112     text=text,
        113     texttemplate=texttemplate,
        114     coloraxis="coloraxis")
        115 fig.append_trace(trace, 1, 1)
        116 if current_correlations is not None:
    
    File ~/.cache/pypoetry/virtualenvs/ds-model-monitoring-GQktnPqD-py3.9/lib/python3.9/site-packages/plotly/graph_objs/_heatmap.py:2949, in Heatmap.__init__(self, arg, autocolorscale, coloraxis, colorbar, colorscale, connectgaps, customdata, customdatasrc, dx, dy, hoverinfo, hoverinfosrc, hoverlabel, hoverongaps, hovertemplate, hovertemplatesrc, hovertext, hovertextsrc, ids, idssrc, legendgroup, legendgrouptitle, legendrank, meta, metasrc, name, opacity, reversescale, showlegend, showscale, stream, text, textfont, textsrc, texttemplate, transpose, uid, uirevision, visible, x, x0, xaxis, xcalendar, xgap, xhoverformat, xperiod, xperiod0, xperiodalignment, xsrc, xtype, y, y0, yaxis, ycalendar, ygap, yhoverformat, yperiod, yperiod0, yperiodalignment, ysrc, ytype, z, zauto, zhoverformat, zmax, zmid, zmin, zsmooth, zsrc, **kwargs)
       2947 _v = text if text is not None else _v
       2948 if _v is not None:
    -> 2949     self["text"] = _v
       2950 _v = arg.pop("textfont", None)
       2951 _v = textfont if textfont is not None else _v
    
    File ~/.cache/pypoetry/virtualenvs/ds-model-monitoring-GQktnPqD-py3.9/lib/python3.9/site-packages/plotly/basedatatypes.py:4827, in BasePlotlyType.__setitem__(self, prop, value)
       4823         self._set_array_prop(prop, value)
       4825     # ### Handle simple property ###
       4826     else:
    -> 4827         self._set_prop(prop, value)
       4828 else:
       4829     # Make sure properties dict is initialized
       4830     self._init_props()
    
    File ~/.cache/pypoetry/virtualenvs/ds-model-monitoring-GQktnPqD-py3.9/lib/python3.9/site-packages/plotly/basedatatypes.py:5171, in BasePlotlyType._set_prop(self, prop, val)
       5169         return
       5170     else:
    -> 5171         raise err
       5173 # val is None
       5174 # -----------
       5175 if val is None:
       5176     # Check if we should send null update
    
    File ~/.cache/pypoetry/virtualenvs/ds-model-monitoring-GQktnPqD-py3.9/lib/python3.9/site-packages/plotly/basedatatypes.py:5166, in BasePlotlyType._set_prop(self, prop, val)
       5163 validator = self._get_validator(prop)
       5165 try:
    -> 5166     val = validator.validate_coerce(val)
       5167 except ValueError as err:
       5168     if self._skip_invalid:
    
    File ~/.cache/pypoetry/virtualenvs/ds-model-monitoring-GQktnPqD-py3.9/lib/python3.9/site-packages/_plotly_utils/basevalidators.py:405, in DataArrayValidator.validate_coerce(self, v)
        403     v = to_scalar_or_list(v)
        404 else:
    --> 405     self.raise_invalid_val(v)
        406 return v
    
    File ~/.cache/pypoetry/virtualenvs/ds-model-monitoring-GQktnPqD-py3.9/lib/python3.9/site-packages/_plotly_utils/basevalidators.py:289, in BaseValidator.raise_invalid_val(self, v, inds)
        286             for i in inds:
        287                 name += "[" + str(i) + "]"
    --> 289         raise ValueError(
        290             """
        291     Invalid value of type {typ} received for the '{name}' property of {pname}
        292         Received value: {v}
        293 
        294 {valid_clr_desc}""".format(
        295                 name=name,
        296                 pname=self.parent_name,
        297                 typ=type_str(v),
        298                 v=repr(v),
        299                 valid_clr_desc=self.description(),
        300             )
        301         )
    
    ValueError: 
        Invalid value of type 'builtins.str' received for the 'text' property of heatmap
            Received value: ''
    
        The 'text' property is an array that may be specified as a tuple,
        list, numpy array, or pandas Series
    

    Thanks

    opened by chrisjclarke 5
  • Input for ProbClassificationPerformanceAnalyzer

    I posted the same question on Discord first, but this may be a more appropriate place. We'll see ;-)

    I'm writing some tests for dashboards and profiles but stumbled upon a probable bug in ProbClassificationPerformanceAnalyzer. https://github.com/burkovae/evidently/blob/ab-fix-classification-bugs/evidently/analyzers/test_classification_analyzers.py

    Since I'm not sure what a correct input should look like, I hope that someone can post it here. I would be happy with very basic data for an analysis of a single dataframe. With a correct input, I can narrow down what's happening and which component fails.

    In particular, I would like to know what the ProbClassificationPerformanceAnalyzer computes exactly (@emeli-dral it seems you wrote this piece). I would be happy with a few inputs and expected outputs.

    If you already know the fix to

    Error
    Traceback (most recent call last):
      File "...\evidently\evidently\analyzers\test_classification_analyzers.py", line 21, in test_single_dataset_with_two_classes
        blubb = analyzer.calculate(df, None, ColumnMapping())
      File "...\evidently\analyzers\prob_classification_performance_analyzer.py", line 32, in calculate
        prediction_labels = [prediction_column[x] for x in prediction_ids]
    TypeError: 'numpy.intc' object is not iterable
    

    then I urge you to write the tests with inputs and expected outputs first, please. It helps down the line and improves robustness and safeguard against unwanted changes.

    If you can point me in the right direction, I will fix any issues along the road. I just need to know what you compute specifically with some explicit examples.

    opened by burkovae 5
  • Evaluating the classification quality of model

    Mathematically, the classification quality of the model is known as Matthews Correlation Coefficient.

    The metric is a reliable measure for classification problems with an imbalanced set of classes.

    https://en.wikipedia.org/wiki/Matthews_correlation_coefficient

    enhancement 
    opened by xsansha 5
  • Attribute error when trying to pass column mapping

    Here's what I was trying:

    data_drift_report = Dashboard(tabs=[DataDriftTab()])
    data_drift_report.calculate(reference, production_data, 
    column_mapping = {"target": "target", "prediction": None, "datetime": "datetime"})
    
    data_drift_report.show(mode="inline")
    

    The error: AttributeError: 'dict' object has no attribute 'datetime'

    Traceback

    ~/evidently/analyzers/utils.py in process_columns(dataset, column_mapping)
         85         # data mapping should not be empy in this step
         86         raise ValueError("column_mapping should be present")
    ---> 87     date_column = column_mapping.datetime if column_mapping.datetime in dataset else None
         88     # index column name
         89     id_column = column_mapping.id
         
    AttributeError: 'dict' object has no attribute 'datetime'
    

    Package versions: evidently-0.1.49.dev0 dataclasses-0.6

    bug 
    opened by aakash-dusane 4
  • Grafana dashboard detect data drift even though both reference and production data are same

    I was experimenting with Evidently's real-time ML monitoring with Grafana and got data drift detection even when the reference and production datasets are exactly the same (the production data is just a copy of the reference data).

    Please find the example below (dashboard screenshot attached as an image).

    Previews of the Reference and Production data were attached as images.

    opened by alokrajg 4
  • added dash to docker compose to README.md and run_example.py script

    Submitting a pull request to add a dash to the README.md under the Grafana integration examples as well as to run_example.py. A dash is needed between docker and compose for docker-compose; without it, run_example.py errors out before building the Grafana and Prometheus dashboards.

    opened by jeff-ridgeway 0
  • fix: Drop NaNs only in used columns

    • Select columns used in CatTargetDriftAnalyzer and then filter Nans and infinities.
    • Stop doing the replace and drop of these values inplace.
    • Check that columns are not empty and raise a more informative error if so.

    Closes #241

    Signed-off-by: Daniel J. Morales Velásquez [email protected]

    opened by danieljmv01 2
  • Not used columns should not affect calculating Categorical Target Drift

    CatTargetDriftAnalyzer only uses the target or the prediction columns, but it drops rows that contain NaNs in any column, even if that column is not used in this analyzer.

    Also if any used column is empty after dropping invalid values, the exception raised is not informative at all.

    I added a test to replicate the problem, here is the pytest output:

    ______________________ test_structure_no_drift_with_nulls ______________________
    
    analyzer = <evidently.analyzers.cat_target_drift_analyzer.CatTargetDriftAnalyzer object at 0x7f88ccf6af98>
    
        def test_drift_with_null_colums(analyzer: CatTargetDriftAnalyzer) -> None:
            """Test drift with columns with nulls.
        
            Test that not used columns with nulls does not change
            target drift.
            """
            data = {
                "target": ["a"] * 10 + ["b"] * 10,
                "foo": [1]*10 + [np.nan] * 10,
                "bar": [np.nan] * 10 + [1] * 10,
            }
            df1 = DataFrame(data)
            df2 = DataFrame(data)
        
    >       result = analyzer.calculate(df1, df2, ColumnMapping())
    
    tests/analyzers/test_categorical_target_drift_analyzer.py:217: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    src/evidently/analyzers/cat_target_drift_analyzer.py:154: in calculate
        reference_data, current_data, feature_type, target_column, target_test, threshold
    src/evidently/analyzers/cat_target_drift_analyzer.py:38: in _compute_statistic
        return stattest(reference_data[column_name], current_data[column_name], feature_type, threshold)
    src/evidently/analyzers/stattests/registry.py:28: in __call__
        self.default_threshold if threshold is None else threshold)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    reference_data = Series([], Name: target, dtype: object)
    current_data = Series([], Name: target, dtype: object), feature_type = 'cat'
    threshold = 0.05
    
        def _z_stat_test(
                reference_data: pd.Series,
                current_data: pd.Series,
                feature_type: str,
                threshold: float) -> Tuple[float, bool]:
            #  TODO: simplify ignoring NaN values here, in chi_stat_test and data_drift_analyzer
            if (reference_data.nunique() == 1
                    and current_data.nunique() == 1
                    and reference_data.unique()[0] == current_data.unique()[0]):
                p_value = 1
            else:
                keys = set(list(reference_data.unique()) + list(current_data.unique())) - {np.nan}
                ordered_keys = sorted(list(keys))
                p_value = proportions_diff_z_test(
                    proportions_diff_z_stat_ind(
    >                   reference_data.apply(lambda x, key=ordered_keys[0]: 0 if x == key else 1),
                        current_data.apply(lambda x, key=ordered_keys[0]: 0 if x == key else 1)
                    )
                )
    E           IndexError: list index out of range
    
    opened by danieljmv01 0
  • probability classification report - does not show the entire size of data

    Hello, I have been investigating your great library. However, after updating the tool and running a classification report, I am noticing that the count of rows in the report is inconsistent with the data used for the calculation. I am not in a position to share the error, but I hope this is a quick thing to check on your side. I tried all the possible changes and debugging. I know for sure that the reference data has 2500 rows, but the report only shows 480 records. I am really not sure what else to check and any help would be appreciated.

    bug 
    opened by Guidosalimbeni 3
  • Error in DataDrift dashboard show() on databricks notebook

    Hi Evidently team,

    I'm trying to show the data drift dashboard I created on a Databricks notebook. Using show(mode="inline") works, but it also has a 20MB limit; in my case I can only use one integer column with up to 75 reference samples and 75 current samples out of my 200k dataset: data_drift_dashboard.calculate(df[:75], df[75:150], column_mapping=None). The error (see screenshot): Command result size exceeds limit: 20971520 bytes (current 20974461)

    I tried using mode="auto" and mode="nbextension" but I get this error (see screenshot): Uncaught ReferenceError: requirejs is not defined

    Using the colab method is not an option for me as I need to have it done on a databricks notebook. Thank you and please let me know if there's an existing answer for this already in the documentation or previous issues.

    Kind regards

    opened by shebna-kumu 1
Releases (v0.1.51.dev0)
  • v0.1.51.dev0(May 31, 2022)

    Updates:

    • Updated DataDriftTab: added target and prediction rows in DataDrift Table widget
    • Updated CatTargetDriftTab: added additional widgets for probabilistic cases in both binary and multi-class probabilistic classification, in particular widgets for label drift and class probability distributions.

    Fixes:

    • #233
    • fixed previews in the DataDrift Table widget. Histogram previews for reference and current data now share an x-axis, so the bin order is the same in the reference and current histograms, which makes visual distribution comparison easier.
    Source code(tar.gz)
    Source code(zip)
  • v0.1.50.dev0(May 19, 2022)

    Release scope:

    1. Stat test auto selection algorithm update: https://docs.evidentlyai.com/reports/data-drift#how-it-works

    For small data with <= 1000 observations in the reference dataset:

    • For numerical features (n_unique > 5): two-sample Kolmogorov-Smirnov test.
    • For categorical features or numerical features with n_unique <= 5: chi-squared test.
    • For binary categorical features (n_unique <= 2), we use the proportion difference test for independent samples based on Z-score. All tests use a 0.95 confidence level by default.

    For larger data with > 1000 observations in the reference dataset:

    2. Added options for setting custom statistical test for Categorical and Numerical Target Drift Dashboard/Profile: cat_target_stattest_func: Defines a custom statistical test to detect target drift in CatTargetDrift. num_target_stattest_func: Defines a custom statistical test to detect target drift in NumTargetDrift.

    3. Added options for setting custom threshold for drift detection for Categorical and Numerical Target Drift Dashboard/Profile: cat_target_threshold: Optional[float] = None num_target_threshold: Optional[float] = None. These thresholds depend strongly on the selected stattest; generally it is either a threshold for the p_value or a threshold for a distance.

    Fixes:
    #207

    Source code(tar.gz)
    Source code(zip)
  • v0.1.49.dev0(Apr 30, 2022)

    StatTests: The following statistical tests can now be used for both numerical and categorical features:

    • 'jensenshannon'
    • 'kl_div'
    • 'psi'

    Grafana monitoring example

    • Updated the example to be used with several ML models
    • Added monitors for NumTargetDrift, CatTargetDrift
    Source code(tar.gz)
    Source code(zip)
  • v0.1.48.dev0(Apr 13, 2022)

    Colour Scheme: Support for custom colours in the Dashboards:

    • primary_color
    • secondary_color
    • current_data_color
    • reference_data_color
    • color_sequence
    • fill_color
    • zero_line_color
    • non_visible_color
    • underestimation_color
    • overestimation_color
    • majority_color

    Statistical Tests: Support for user-implemented and custom statistical tests in Dashboards and Profiles. Available tests:

    • 'ks'
    • 'z'
    • 'chisquare'
    • 'jensenshannon'
    • 'kl_div'
    • 'psi'
    • 'wasserstein' (more info: docs)

    Fixes: #193

    Source code(tar.gz)
    Source code(zip)
  • v0.1.47.dev0(Mar 23, 2022)

    Custom Text Comments in Dashboards

    • Added type="text" for BaseWidgetInfo (for text widget implementation)
    • Markdown syntax is supported

    see the example: https://github.com/evidentlyai/evidently/blob/main/examples/how_to_questions/text_widget_usage_iris.ipynb

    Source code(tar.gz)
    Source code(zip)
  • v0.1.46.dev0(Mar 12, 2022)

    • Data Quality Dashboard: add dataset overview widget
    • Data Quality Dashboard: add correlations widget
    • Sped up loading via preview plots optimisation
    • Paging in Data Quality feature table widget
    Source code(tar.gz)
    Source code(zip)
  • v0.1.45.dev0(Feb 22, 2022)

    • DataQualityTab() is now available for Dashboards! The Tab helps to observe data columns, explore their properties and compare datasets.
    • DataQualityProfileSection() is available for Profiles as well.
    • ColumnMapping update: added the task parameter to specify the type of machine learning problem. This parameter is used by DataQualityAnalyzer and some data quality widgets to calculate metrics and visuals with respect to the task.
    Source code(tar.gz)
    Source code(zip)
  • v0.1.44.dev0(Feb 14, 2022)

    • Added monitors for NumTargetDrift, CatTargetDrift, ClassificationPerformance, ProbClassificationPerformance
    • Fixed RegressionPerformance monitor
    • Supported strings as categorical features in DataDrift and CatTargetDrift dashboards
    • Supported boolean features in DataDrift dashboard
    Source code(tar.gz)
    Source code(zip)
  • v0.1.43.dev0(Jan 31, 2022)

    Analyzers Refactoring: analyzer result became a structured object instead of a dictionary for all Analyzers

    The following Quality Metrics Options are added:

    • conf_interval_n_sigmas (the width of confidence intervals ): int = DEFAULT_CONF_INTERVAL_SIZE
    • classification_treshold (the threshold for true labels): float = DEFAULT_CLASSIFICATION_TRESHOLD
    • cut_quantile (cut the data by right, left and two-sided quantiles): Union[None, Tuple[str, float], Dict[str, Tuple[str, float]]] = None
    Source code(tar.gz)
    Source code(zip)
  • v0.1.42.dev0(Jan 24, 2022)

    Added backward compatibility for imports:

    • Widgets and Tabs can be imported from evidently directly, but this is deprecated behavior and causes a warning
    • Sections can be imported from evidently directly, but this is deprecated behavior and causes a warning
    Source code(tar.gz)
    Source code(zip)
  • v0.1.41.dev0(Jan 19, 2022)

    • Library source code is moved to the src/evidently folder
    • Docs, Tests, and Examples are moved to the top level of the repo
    • Widgets and Tabs are moved inside of the src/evidently/dashboard folder, as those are parts of the Dashboard
    • Sections are moved inside of the src/evidently/model_profile folder, as those are parts of the Model_profiles
    • Docs are stored in the repo: docs/book folder
    • DataDriftAnalyzer refactoring: analyzer results became a structured object instead of a dictionary
    Source code(tar.gz)
    Source code(zip)
  • v0.1.40.dev0(Dec 30, 2021)

    • fixed: input DataFrames cannot be changed during any calculations (fixed by making shallow copies)
    • fixed: chi-square statistical test uses normalized frequencies (with respect to the latest scipy version)
    • current dataset is optional for Performance Tabs and Sections calculation (a None value can be passed)
    • improved readme
    Source code(tar.gz)
    Source code(zip)
  • v0.1.39.dev0(Dec 23, 2021)

    Data Drift Options:

    • Created confidence: Union[float, Dict[str, float]] - the option can take a float or a dict as an argument. If a float is passed, this confidence level is used for all features. If a dict is passed, the specified features get custom confidence levels (all the rest use the default confidence level = 0.95)
    • Updated nbinsx: Union[int, Dict[str, int]] - the option can take an int or a dict as an argument. If an int is passed, this number of bins is used for all features. If a dict is passed, the specified features get a custom number of bins (all the rest use the default number of bins = 10)
    • Updated feature_stattest_func: Union[None, Callable, Dict[str, Callable]] - the option can take a function or a dict as an argument. If a function is passed, it is used to measure drift for all features. If a dict is passed, custom functions are used for the specified features (all the rest are processed by the internal drift detection algorithm)

    Package building:

    • Fixed dependencies
    Source code(tar.gz)
    Source code(zip)
  • v0.1.35.dev0(Dec 9, 2021)

    • Support widgets order for include_widgets parameter
    • Support an ability to add a custom widget to Tabs with include_widgets parameter
    • Moved options to a separate module
    • Added options to specify statistical tests for DataDrift and TargetDrift Dashboards: stattest_func - sets a custom statistical test for all features; feature_stattest_func - sets custom statistical tests for individual features; cat_target_stattest_func - sets a custom statistical test for a categorical target; num_target_stattest_func - sets a custom statistical test for a numerical target
    • Refactored Widgets and Tabs for simpler customisation
    Source code(tar.gz)
    Source code(zip)
  • v0.1.33.dev0(Dec 1, 2021)

    • Supported custom list of Widgets for Tabs in Dashboard with help of verbose_level and include_widgets parameters
    • Added parameter verbose_level: 0 - to create a Tab with the shortest list of Widgets, 1 - to create a full Tab
    • Added parameter include_widgets: ["Widget Name 1", "Widget Name 2", etc]. This parameter overwrites verbose_level (if both are specified) and allows to set a custom list of Widgets
    • Added Tab.list_widgets() method to list all the available Widgets for the current Tab
    • Created Options entity to specify Widgets and Tabs customisable settings
    • Created the ColumnMapping entity instead of the column_mapping python dictionary
    Source code(tar.gz)
    Source code(zip)
  • v0.1.32.dev0(Nov 25, 2021)

  • v0.1.31.dev0(Nov 21, 2021)

  • v0.1.30.dev0(Nov 12, 2021)

    1. Supported dashboard visualization in Google Colab
    2. Supported dashboard visualization in python Pylab
    3. Added a parameter mode for dashboard.show(), which can take the following options:
    • auto - the default option. Ideally, you will not need to specify the value for mode and use the default. But, if it does not work (in case we failed to determine the environment automatically), consider setting the correct value explicitly.
    • nbextension - to show the UI using nbextension. Use this option to display dashboards in Jupyter notebooks (should work automatically).
    • inline - to insert the UI directly into the cell. Use this option for Google Colab, Kaggle Kernels and Deepnote. For Google Colab this should work automatically, for Kaggle Kernels and Deepnote option should be specified explicitly.
    Source code(tar.gz)
    Source code(zip)
  • v0.1.28.dev0(Nov 10, 2021)

    • Supported dashboard visualization in Google Colab
    • Supported dashboard visualization in python Pylab
    • Added a parameter to switch on pylab visualization model: dashboard.show(mode='pylab')
    Source code(tar.gz)
    Source code(zip)
  • v0.1.27.dev0(Oct 13, 2021)

    • Added a way to calculate metrics with moving window
    • Added metrics logging to Prometheus
    • Added an example of Data Drift Monitoring with Prometheus and Grafana
    • Added a config for Data drift dashboard at Grafana
    Source code(tar.gz)
    Source code(zip)
  • v0.1.26.dev0(Oct 6, 2021)

  • v0.1.25.dev0(Sep 24, 2021)

    • Added the source code for the UI (now it can be built from the source)
    • Created utils.py with helper functions
    • Added config for Pylint
    • Added some unit tests
    Source code(tar.gz)
    Source code(zip)
  • 0.1.23.dev0(Sep 6, 2021)

    Added the following options to configure data drift report:

    • 'drift_conf_level' confidence level for the individual features (default value = 0.95)
    • 'drift_features_share' - share of the drifted features to detect dataset drift (default value = 0.5)
    • 'xbins' - the custom bins to plot in the datadrift table
    • 'nbinsx' - the custom number of bins to plot in the datadrift table

    If the share of drifted features at the 'drift_conf_level' confidence level is higher than the 'drift_features_share' threshold, Dataset Drift is detected. Otherwise, Dataset Drift is not detected.

    Source code(tar.gz)
    Source code(zip)
  • 0.1.22.dev0(Aug 2, 2021)

    • When you use Evidently in the command-line interface, we collect basic telemetry. It includes data on the environment (e.g. Python version) and usage (type of report or profile generated).
    • Our telemetry is intentionally limited in scope. We DO NOT collect any sensitive information and never see the data, its structure, or column names.
    • You can read more about what we collect here.
    • You can opt-out from telemetry collection by setting the environment variable EVIDENTLY_DISABLE_TELEMETRY=1
    Source code(tar.gz)
    Source code(zip)
  • 0.1.20.dev0(Jul 20, 2021)

    • Added proportion difference test for binary categorical target/prediction drift
    • Added proportion difference test for data drift (categorical features)
    Source code(tar.gz)
    Source code(zip)
  • 0.1.19.dev0(Jun 29, 2021)

    • Fixed Regression Performance Analyzer (underperformance segments table)
    • Fixed Prob Classification Performance Analyzer (precision-recall table)
    • Fixed Classification Performance Analyzer (Classification Confusion Based Feature Distribution Table)
    • Updated CatTargetDriftTab to use analyzers
    • Updated NumTargetDriftTab to use analyzers
    • Updated RegressionPerformanceTab to use analyzers
    • Updated ClassificationPerformanceTab to use analyzers
    • Updated ProbClassificationPerformanceTab to use Analyzers
    Source code(tar.gz)
    Source code(zip)
  • 0.1.18.dev0(Jun 18, 2021)

    • Sampling for large datasets: Sequential Sampling and Random Sampling
    • Changed names: Production -> Current literally everywhere
    • Fixed json serialisation issue
    • Updated Ranges for plots inside of the “Classification Quality By Feature” table in Probabilistic Classification Performance Dashboard
    Source code(tar.gz)
    Source code(zip)
  • v0.1.17.dev0(Jul 2, 2021)

    • Released Profiles
    • Added “calculate” step to separate Dashboard/Profile object creation from heavy calculation process.
    • Changed name: DriftTab -> DataDriftTab
    Source code(tar.gz)
    Source code(zip)
  • v0.1.15.dev0(Jul 2, 2021)

    • Bug fixes and dependency updates
    • Regression Performance Report: added a new plot to detect error bias per feature
    • Probabilistic Classification Report: added a new plot to help choose the decision threshold
    Source code(tar.gz)
    Source code(zip)