Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.

Overview
Comments
  • Quickstart clustering

    Quickstart clustering

    Adds quick start for clustering. Note that I had to make some changes to the tests and the elbow curve implementation since I found minor issues: hardcoded figure size, missing n_clusters in the title and hardcoded random seed.

    opened by edublancas 10
  • new ROC api added to plot

    new ROC api added to plot

    Describe your changes

    • New ROC API (inherits from Plot)
    • plot.ROC.__add__ added for generating overlapping curves
    • The old roc API is still supported

    Issue ticket number and link

    Closes #84

    Checklist before requesting a review

    • [x] I have performed a self-review of my code
    • [x] I have added thorough tests (when necessary).
    • [x] I have added the right documentation (when needed). Product update? If yes, write one line about this update.
    opened by yafimvo 8
  • minor changes to silhouette_plot

    minor changes to silhouette_plot

    I was going to release a new version with the silhouette_plot @neelasha23 but noticed a few things.

    Our convention is not to include the word plot in the function names (since they're all in the plot, module, can you rename them?

    silhouette_plot -> silhouette silhouette_plot_from_results -> silhouette_from_results

    Also, please include 0.8.3 as the version when this plots became available, in case anyone is using an older version. This way they'll know they have to update, you can add a .. versionadded:: in a Notes section in the plot's docstring

    https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-versionadded

    FYI: @idomic

    opened by edublancas 7
  • Inconsistency in image comparison

    Inconsistency in image comparison

    The results of matplotlib's @image_comparison are a bit inconsistent sometimes (behaving differently in local vs CI). Maybe we can aim to build a custom utility for comparing images from plots.

    opened by neelasha23 7
  • Bug: Missing colab flag

    Bug: Missing colab flag

    on some of the stats calls, the colab flag is missing. This makes it difficult to understand how many of the users are actually in colab or just plain docker.

    opened by idomic 6
  • docs broken

    docs broken

    looks like @neelasha23's last PR broke the documentation because of a change in sklearn:

    PapermillExecutionError: 
    ---------------------------------------------------------------------------
    Exception encountered at "In [1]":
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    Cell In[1], line 3
          1 import importlib
    ----> 3 from sklearn.datasets import load_boston
          4 from sklearn.model_selection import train_test_split
          5 from sklearn import metrics
    
    File ~/checkouts/readthedocs.org/user_builds/sklearn-evaluation/conda/latest/lib/python3.8/site-packages/sklearn/datasets/__init__.py:156, in __getattr__(name)
        105 if name == "load_boston":
        106     msg = textwrap.dedent(
        107         """
        108         `load_boston` has been removed from scikit-learn since version 1.2.
       (...)
        154         """
        155     )
    --> 156     raise ImportError(msg)
        157 try:
        158     return globals()[name]
    
    ImportError: 
    `load_boston` has been removed from scikit-learn since version 1.2.
    
    

    FYI @idomic

    opened by edublancas 5
  • Installing sklearn_evaluation

    Installing sklearn_evaluation

    I used "pip install sklearn-evaluation" to install this library in anaconda. All requirements exit but it does not install. When I want to import it, there is no library. When I run pip command to install it, it does not access to install, nor install anything.

    opened by AminShah69 5
  • Incompatibility with sklearn 0.20.0

    Incompatibility with sklearn 0.20.0

    Hi. I was trying to use this package with the up-to-dated version of scikit (0.20.0) but I did not understand how to do it. In particular, I was trying to use

    from sklearn_evaluation import plot plot.grid_search(gridCV.grid_scores_, change=change,kind='bar')

    but the member grid_scores_ does not exist any more (present till scikit 0.17) and has been substituted by cv_results_, which returns an object of different data type with respect to the former member. Is there an easy way to go on using this function by using the new cv_results_ in place of grid_scores_? Thank you.

    opened by mfaggin 5
  • refactor plots for better integration with tracker

    refactor plots for better integration with tracker

    In sklearn-evaluation 0.8.2, I introduced two new methods to the SQL experiment tracker: log_confusion_matrix and log_classification_report. These two methods allow users to store plots in the SQLite database and retrieve them later.

    However, unlike previous versions, we're not storing the actual plot in the database, but the statistics we need to re-create the plot. For example, to re-create a confusion matrix, we can store the numbers on each quadrant. The benefit of this approach is that we can serialize and unserialize the plots as objects and allow the user to combine them for better comparison. See this example.

    Enabling this involves several changes in the plotting code since we need to split the part that computes the statistics to display from the code that generates the plot, and this has to be performed for each plot (so far, only confusion matrix and classification report have been refactored)

    The purpose of this issue is to start refactoring other popular plots. We still need to support the old API (e.g., plot.confusion_matrix), but it should use the object-oriented API under the hood (e.g., plot.ConfusionMatrix)

    The next one we can implement is the ROC curve. All classes should behave similarly; here are some pointers:

    • the class constructor should take the data needed to generate the plot (fpr and tpr as returned by roc_curve)
    • No need to implement __sub__ - not applicable for ROC. just raise a NotImplementedError with an appropriate error message
    • __add__ should create a new plot with overlapping ROC curves. This translates into users being able to do roc1 + roc2 to generated the overlapping plot
    • the _get_data method should return the data needed to re-create the plot (example)
    • the from_dump class method should re-create a plot from a dumped json file (note that the dump method is implemented in the parent class
    opened by edublancas 4
  • SKLearnEvaluationLogger added

    SKLearnEvaluationLogger added

    Describe your changes

    SKLearnEvaluationLogger decorator wraps telemetry log_api functionality and allows to generate logs for sklearn-evaluation as follows:

    @SKLearnEvaluationLogger.log(feature='plot')
    def confusion_matrix(
            y_true,
            y_pred,
            target_names=None,
            normalize=False,
            cmap=None,
            ax=None,
            **kwargs):
    pass
    

    this will generate the following log:

            {
              "metadata": {
              "action": "confusion_matrix"
              "feature": "plot",
              "args": {
                            "target_names": "None",
                            "normalize": "False",
                            "cmap": "None",
                            "ax": "None"
                        }
              }
            }
    

    ** since y_true and y_pred are positional arguments without default values it won't log them

    we can also use pre-defined flags when calling a function

            return plot.confusion_matrix(self.y_true, self.y_pred, self.target_names, ax=_gen_ax())
    

    which will generate the following log:

            "metadata": {
                "action": "confusion_matrix"
                "feature": "plot",
                "args": {
                    "target_names": "['setosa', 'versicolor', 'virginica']",
                    "normalize": "False",
                    "cmap": "None",
                    "ax": "AxesSubplot(0.125,0.11;0.775x0.77)"
                }
            },
    

    Queries

    Run queries and filter out sklearn-evaluation events by the event name: sklearn-evaluation Break these events by feature ('plot', 'report', 'SQLiteTracker', 'NotebookCollection') Break events by actions (i.e: 'confusion_matrix', 'roc', etc...) and/or flags ('is_report')

    Errors

    Failing runnings will be named: sklearn-evaluation-error

    Checklist before requesting a review

    • [X] I have performed a self-review of my code
    • [X] I have added thorough tests (when necessary).
    • [] I have added the right documentation (when needed). Product update? If yes, write one line about this update.
    opened by yafimvo 4
  • GridSearch heatmap for 'None' parameter

    GridSearch heatmap for 'None' parameter

    When I try to generate a heatmap for GridSearchCV results, if the parameter has 'None' type, it gives error: TypeError: '<' not supported between instances of 'NoneType' and 'int'

    The parameter can be, for e.g. max_depth_for_decision_trees = [3, 5, 10, None].

    Is there any workaround for this?

    opened by shrsulav 4
  • doc intro is empty

    doc intro is empty

    our intro page is empty: https://sklearn-evaluation.ploomber.io/en/latest/intro.html

    we should briefly describe the features in the library (possibly with some short examples) and add links to our quick starts

    opened by edublancas 1
  • ConfusionMatrix fix.

    ConfusionMatrix fix.

    Adresses #145

    Restructured ConfusionMatrix class to include a plot method that plots data and axes to a matplotlib figure and returns a ConfusionMatrix class object. An object is returned so as to not break the addition and subtraction functions in the class. The figure is a matplotlib object and can be resized using matplotlib methods. The figure is accessed by the figure attribute of the class instance.

    Example:

    tree_cm = plot.ConfusionMatrix.from_raw_data(y_test, tree_pred, normalize=False) # Creates a ConfusionMatrix class instance tree_cm.figure.set_size_inches(5,5) # Resizes the figure to 5 by 5 inches tree_cm.figure # Outputs the figure contained in class instance

    opened by digithed 1
  • documenting alternatives to elbow curve

    documenting alternatives to elbow curve

    I came across this paper, which suggests that the elbow method isn't the best for choosing the number of clusters. We should give it a read, look for other sources and incorporate some of this advice in our elbow curve documentation. We could implement the alternatives.

    opened by edublancas 0
  • Prediction error plot - issue in logic

    Prediction error plot - issue in logic

    The prediction error piece has this logic: model.fit(y_reshaped, y_pred). This looks incorrect. It's trying to fit 2 sets of y values whereas it should fit (X,y). Need to understand why this statement is here and rectify accordingly.

    opened by neelasha23 0
Releases(0.5.6)
Owner
Eduardo Blancas
Developing tools for reproducible Data Science.
Eduardo Blancas
LILLIE: Information Extraction and Database Integration Using Linguistics and Learning-Based Algorithms

LILLIE: Information Extraction and Database Integration Using Linguistics and Learning-Based Algorithms Based on the work by Smith et al. (2021) Query

5 Aug 06, 2022
A Python toolkit for rule-based/unsupervised anomaly detection in time series

Anomaly Detection Toolkit (ADTK) Anomaly Detection Toolkit (ADTK) is a Python package for unsupervised / rule-based time series anomaly detection. As

Arundo Analytics 888 Dec 30, 2022
Polyglot Machine Learning example for scraping similar news articles.

Polyglot Machine Learning example for scraping similar news articles In this example, we will see how we can work with Machine Learning applications w

MetaCall 15 Mar 28, 2022
dirty_cat is a Python module for machine-learning on dirty categorical variables.

dirty_cat dirty_cat is a Python module for machine-learning on dirty categorical variables.

637 Dec 29, 2022
A collection of Scikit-Learn compatible time series transformers and tools.

tsfeast A collection of Scikit-Learn compatible time series transformers and tools. Installation Create a virtual environment and install: From PyPi p

Chris Santiago 0 Mar 30, 2022
A flexible CTF contest platform for coming PKU GeekGame events

Project Guiding Star: the Backend A flexible CTF contest platform for coming PKU GeekGame events Still in early development Highlights Not configurabl

PKU GeekGame 14 Dec 15, 2022
hgboost - Hyperoptimized Gradient Boosting

hgboost is short for Hyperoptimized Gradient Boosting and is a python package for hyperparameter optimization for xgboost, catboost and lightboost using cross-validation, and evaluating the results o

Erdogan Taskesen 34 Jan 03, 2023
This is the code repository for LRM Stochastic watershed model.

LRM-Squannacook Input data for generating stochastic streamflows are observed and simulated timeseries of streamflow. their format needs to be CSV wit

1 Feb 14, 2022
Merlion: A Machine Learning Framework for Time Series Intelligence

Merlion is a Python library for time series intelligence. It provides an end-to-end machine learning framework that includes loading and transforming data, building and training models, post-processi

Salesforce 2.8k Jan 05, 2023
Provide an input CSV and a target field to predict, generate a model + code to run it.

automl-gs Give an input CSV file and a target field you want to predict to automl-gs, and get a trained high-performing machine learning or deep learn

Max Woolf 1.8k Jan 04, 2023
Fourier-Bayesian estimation of stochastic volatility models

fourier-bayesian-sv-estimation Fourier-Bayesian estimation of stochastic volatility models Code used to run the numerical examples of "Bayesian Approa

15 Jun 20, 2022
Python bindings for MPI

MPI for Python Overview Welcome to MPI for Python. This package provides Python bindings for the Message Passing Interface (MPI) standard. It is imple

MPI for Python 604 Dec 29, 2022
Esse é o meu primeiro repo tratando de fim a fim, uma pipeline de dados abertos do governo brasileiro relacionado a compras de contrato e cronogramas anuais com spark, em pyspark e SQL!

Olá! Esse é o meu primeiro repo tratando de fim a fim, uma pipeline de dados abertos do governo brasileiro relacionado a compras de contrato e cronogr

Henrique de Paula 10 Apr 04, 2022
Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

Little Ball of Fur is a graph sampling extension library for Python. Please look at the Documentation, relevant Paper, Promo video and External Resour

Benedek Rozemberczki 619 Dec 14, 2022
Code for the TCAV ML interpretability project

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) Been Kim, Martin Wattenberg, Justin Gilmer, C

552 Dec 27, 2022
MLflow App Using React, Hooks, RabbitMQ, FastAPI Server, Celery, Microservices

Katana ML Skipper This is a simple and flexible ML workflow engine. It helps to orchestrate events across a set of microservices and create executable

Tom Xu 8 Nov 17, 2022
Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets

Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets Datasets Used: Iris dataset,

Samrat Mitra 2 Nov 18, 2021
Auto updating website that tracks closed & open issues/PRs on scikit-learn/scikit-learn.

Repository Status for Scikit-learn Live webpage Auto updating website that tracks closed & open issues/PRs on scikit-learn/scikit-learn. Running local

Thomas J. Fan 6 Dec 27, 2022
Markov bot - A Writing bot based on Markov Chain for Data Structure Lab

基于马尔可夫链的写作机器人 前端 用html/css完成 Demo展示(已给出文本的相应展示) 用户提供相关的语料库后训练的成果 后端 要完成的几个接口 解析文

DysprosiumDy 9 May 05, 2022
Code base of KU AIRS: SPARK Autonomous Vehicle Team

KU AIRS: SPARK Autonomous Vehicle Project Check this link for the blog post describing this project and the video of SPARK in simulation and on parkou

Mehmet Enes Erciyes 1 Nov 23, 2021