Model Agnostic Confidence Estimator (MACEst) - A Python library for calibrating Machine Learning models' confidence scores

Overview

MACEst (Model Agnostic Confidence Estimator)

What is MACEst?

MACEst is a confidence estimator that can be used alongside any model (regression or classification) which uses previously seen data (i.e. any supervised learning model) to produce a point prediction.

In the regression case, MACEst produces a confidence interval about the point prediction, e.g. "the point prediction is 10 and I am 90% confident that the prediction lies between 8 and 12."

In the classification case, MACEst produces a confidence score for the point prediction, e.g. "the point prediction is class 0 and I am 90% sure that the prediction is correct."

MACEst produces well-calibrated confidence estimates, i.e. 90% confidence means that you will on average be correct 90% of the time. It is also aware of the model's limitations, i.e. when a model is being asked to predict a point for which it does not have the necessary knowledge (data) to predict confidently. In these cases MACEst is able to incorporate the resulting (epistemic) uncertainty and return a very low confidence prediction (in regression this means a large prediction interval).
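
One way to check the calibration claim on held-out data is to bin predictions by their stated confidence and compare each bin's mean confidence with its empirical accuracy. The sketch below is illustrative only: it uses plain NumPy, the names `calibration_table` and `expected_calibration_error` are assumptions rather than part of the MACEst API, and the data is synthetic.

```python
import numpy as np


def calibration_table(confidence, correct, n_bins=10):
    """Return (mean confidence, empirical accuracy, count) for each non-empty bin."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin; a confidence of exactly 1.0 goes in the last bin.
    bin_ids = np.clip(np.digitize(confidence, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            rows.append((confidence[mask].mean(), correct[mask].mean(), int(mask.sum())))
    return rows


def expected_calibration_error(confidence, correct, n_bins=10):
    """Count-weighted mean gap between stated confidence and observed accuracy."""
    rows = calibration_table(confidence, correct, n_bins)
    total = sum(count for _, _, count in rows)
    return sum(count / total * abs(conf - acc) for conf, acc, count in rows)


# Synthetic predictions that are calibrated by construction:
# each one is correct with probability equal to its stated confidence.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=20_000)
hit = rng.uniform(size=conf.size) < conf
ece = expected_calibration_error(conf, hit)
print(f"expected calibration error: {ece:.3f}")
```

A small value means that stated confidence and observed accuracy agree closely; a model that claims 90% confidence but is never right would score about 0.9.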

Why use MACEst?

Machine learning has become an integral part of many of the tools that are used every day. There has been a huge amount of progress on improving the global accuracy of machine learning models but calculating how likely a single prediction is to be correct has seen considerably less progress.

Most algorithms will still produce a prediction, even if this is in a part of the feature space the algorithm has no information about. This could be because the feature vector is unlike anything seen during training, or because the feature vector falls in a part of the feature space where there is a large amount of uncertainty, for example where two classes overlap near their border. In cases like this the prediction may well be meaningless. In most models, it is impossible to distinguish this sort of meaningless prediction from a sensible prediction. MACEst addresses this situation by providing an additional confidence estimate.

In some areas such as Finance, Infrastructure, or Healthcare, making a single bad prediction can have major consequences. It is important in these situations that a model is able to understand how likely any prediction it makes is to be correct before acting upon it. It is often even more important in these situations that any model knows what it doesn't know so that it will not blindly make bad predictions.

Summary of the Methodology

TL;DR

MACEst produces confidence estimates for a given point x by considering two factors:

  1. How accurate is the model when predicting previously seen points that are similar to x? Less confident if the model is less accurate in the region close to x.
  2. How similar is x to the points that we have seen previously? Less confident if x is not similar to the data used to train the model.

Longer Explanation

MACEst seeks to provide reliable confidence estimates for both regression and classification. It draws from ideas present in trust scores, conformal learning, Gaussian processes, and Bayesian modelling.

The general idea is that confidence is a local quantity. Even when the model is accurate globally, there are likely still some predictions about which it should not be very confident. Similarly, if the model is not accurate globally, there may still be some predictions about which the model can be very confident.

To model this local confidence for a given prediction on a point x, we define the local neighbourhood by finding the k nearest neighbours to x. We then attempt to directly model the two causes of uncertainty:

  1. Aleatoric Uncertainty: Even with lots of (possibly infinite) data there will be some variance/noise in the predictions. Our local approximation to this is a local accuracy estimate, i.e. how accurate were the predictions for the k nearest neighbours?
  2. Epistemic Uncertainty: The model can only know relationships learnt from the training data. If the model has not seen any data point similar to x then it does not have as much knowledge about points like x, therefore the confidence estimate should be lower. MACEst estimates this by calculating how similar x is to the k nearest (most similar) points that it has previously seen.

We define a simple parametric function of these two quantities and calibrate this function so that our confidence estimates approximate the empirical accuracy, i.e. 90% confident -> 90% correct on average. By directly modelling these two effects, MACEst estimates are able to encapsulate the local variance accurately whilst also being aware of when the model is being asked to predict a point that is very different to what it has been trained on. This will make it robust to problems such as overconfident extrapolations and out of sample predictions.
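
The two quantities can be sketched with a k-nearest-neighbour search. This is a minimal illustration under stated assumptions, not MACEst's implementation: it uses scikit-learn's `NearestNeighbors`, the function names are hypothetical, and `toy_confidence` is an arbitrary decreasing function whose parameters `alpha` and `beta` are fixed by hand here, whereas MACEst calibrates its parametric function on held-out data.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def local_quantities(X_seen, was_correct, X_query, k=10):
    """For each query point, return (local error rate, mean distance) over its k nearest seen points."""
    index = NearestNeighbors(n_neighbors=k).fit(X_seen)
    distances, neighbour_ids = index.kneighbors(X_query)
    # Aleatoric proxy: how often the model was wrong on the neighbours.
    local_error = 1.0 - was_correct[neighbour_ids].mean(axis=1)
    # Epistemic proxy: how far away the neighbours are.
    mean_distance = distances.mean(axis=1)
    return local_error, mean_distance


def toy_confidence(local_error, mean_distance, alpha=1.0, beta=1.0):
    """Hypothetical parametric form: confidence falls as either quantity grows."""
    return np.exp(-(alpha * local_error + beta * mean_distance))


rng = np.random.default_rng(0)
X_seen = rng.normal(size=(500, 2))
# Pretend the point prediction model was only right in the half-plane x0 > 0.
was_correct = (X_seen[:, 0] > 0).astype(float)

# A point in the accurate region, one in the inaccurate region, one far from all data.
X_query = np.array([[1.0, 0.0], [-1.0, 0.0], [50.0, 50.0]])
local_error, mean_distance = local_quantities(X_seen, was_correct, X_query, k=10)
confidence = toy_confidence(local_error, mean_distance)
print(confidence)
```

The first query gets the highest confidence, the second is penalised by the local error term, and the third is penalised by the distance term even though its nearest neighbours were predicted correctly.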

Example

If a model has been trained to classify images of cats and dogs, and we want to predict an image of a poodle, we find the k most poodle-like cats and the k most poodle-like dogs. We then calculate how accurate the model was on these sets of images, and how similar the poodle is to each of these k cats and k dogs. We combine these two to produce a confidence estimate for each class.

As the poodle-like cats will likely be strange cats, they will be harder to classify, so the accuracy will be lower for these than for the poodle-like dogs. Combined with the fact that the image will be considerably more similar to the poodle-like dogs, this means the confidence of the dog prediction will be high.

If we now try to classify an image of a horse, we find that the new image is very dissimilar to both cats and dogs, so the similarity term dominates and the model will return an approximately uniform distribution. This can be interpreted as MACEst saying "I don't know what this is because I've never seen an image of a horse!".
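
The poodle and horse cases can be mimicked with a toy per-class score. The sketch below is an assumption made for illustration, not the function MACEst fits: each class gets a score that shrinks towards zero as its local error and distance grow, and a softmax turns the scores into a distribution. The numbers fed in are invented.

```python
import numpy as np


def class_confidences(local_error, mean_distance, alpha=1.0, beta=1.0, temperature=1.0):
    """Illustrative per-class confidence from per-class local error and distance."""
    local_error = np.asarray(local_error, dtype=float)
    mean_distance = np.asarray(mean_distance, dtype=float)
    # Scores approach zero for every class when the query is far from all of them.
    scores = 1.0 / (alpha * local_error + beta * mean_distance + 1e-12)
    scaled = scores / temperature
    scaled = scaled - scaled.max()  # subtract the maximum for numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()


# Classes are ordered (cat, dog); all input values are made up.
# Poodle: the nearest cats are far away and poorly classified, the nearest dogs are close.
poodle = class_confidences(local_error=[0.4, 0.05], mean_distance=[3.0, 0.5])
# Horse: far from both classes, so both scores are close to zero.
horse = class_confidences(local_error=[0.5, 0.5], mean_distance=[40.0, 42.0])
print("poodle:", poodle)
print("horse: ", horse)
```

Because both horse scores are nearly zero, the softmax output is close to uniform, which matches the "I don't know" behaviour described above.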

Getting Started

To install MACEst run the following command:

pip install macest

Or add macest to your project's requirements.txt file as a dependency.

Software Prerequisites

To import and use MACEst we recommend Python version >= 3.6.8.

Basic Usage

The examples below show how to use MACEst for classification and regression. For more examples, and advanced usage, please see the example notebooks.

Classification

To use MACEst for a classification task, the following example can be used:

   import numpy as np
   from macest.classification import models as cl_mod
   from sklearn.ensemble import RandomForestClassifier
   from sklearn import datasets
   from sklearn.model_selection import train_test_split

   X,y = datasets.make_circles(n_samples= 2 * 10**4, noise = 0.4, factor =0.001)

   X_pp_train, X_conf_train, y_pp_train, y_conf_train  = train_test_split(X,
                                                                          y,
                                                                          test_size=0.66,
                                                                          random_state=10)

   X_conf_train, X_cal, y_conf_train, y_cal = train_test_split(X_conf_train,
                                                               y_conf_train,
                                                               test_size=0.5,
                                                               random_state=0)

   X_cal, X_test, y_cal,  y_test, = train_test_split(X_cal,
                                                     y_cal,
                                                     test_size=0.5,
                                                     random_state=0)

   point_pred_model = RandomForestClassifier(random_state =0,
                                             n_estimators =800,
                                             n_jobs =-1)

   point_pred_model.fit(X_pp_train,
                        y_pp_train)

   macest_model = cl_mod.ModelWithConfidence(point_pred_model,
                                          X_conf_train,
                                          y_conf_train)

   macest_model.fit(X_cal, y_cal)

   conf_preds = macest_model.predict_confidence_of_point_prediction(X_test)
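
A common use of per-prediction confidence is to act only on predictions above a threshold. The following sketch is an illustration with stand-in arrays and assumes nothing about MACEst beyond a one-dimensional array of confidences in [0, 1]; the helper `accuracy_and_coverage` is hypothetical.

```python
import numpy as np


def accuracy_and_coverage(point_preds, conf_preds, y_true, threshold=0.8):
    """Accuracy on the predictions kept at `threshold`, and the fraction kept."""
    keep = np.asarray(conf_preds) >= threshold
    coverage = keep.mean()
    if not keep.any():
        return float("nan"), coverage
    accuracy = (np.asarray(point_preds)[keep] == np.asarray(y_true)[keep]).mean()
    return accuracy, coverage


# Stand-in arrays. With the example above these would be
# point_pred_model.predict(X_test), conf_preds and y_test.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=5_000)
conf_preds = rng.uniform(0.5, 1.0, size=y_true.size)
is_right = rng.uniform(size=y_true.size) < conf_preds
point_preds = np.where(is_right, y_true, 1 - y_true)

acc_all, cov_all = accuracy_and_coverage(point_preds, conf_preds, y_true, threshold=0.0)
acc_kept, cov_kept = accuracy_and_coverage(point_preds, conf_preds, y_true, threshold=0.8)
print(f"all predictions:  accuracy={acc_all:.2f}, coverage={cov_all:.2f}")
print(f"confidence>=0.8:  accuracy={acc_kept:.2f}, coverage={cov_kept:.2f}")
```

When the confidences are well calibrated, raising the threshold trades coverage for accuracy on the predictions that remain.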

Regression

To use MACEst for a regression task, the following example can be used:

   import numpy as np
   from macest.regression import models as reg_mod
   from sklearn.linear_model import LinearRegression
   from sklearn.model_selection import train_test_split

   X = np.linspace(0,1,10**3)
   y = np.zeros(10**3)
   y = 2*X*np.sin(2 *X)**2 + np.random.normal(0 , 1 , len(X))

   X_pp_train, X_conf_train, y_pp_train, y_conf_train  = train_test_split(X,
                                                                          y,
                                                                          test_size=0.66,
                                                                          random_state=0)

   X_conf_train, X_cal, y_conf_train, y_cal = train_test_split(X_conf_train, y_conf_train,
                                                            test_size=0.5, random_state=1)

   X_cal, X_test, y_cal,  y_test, =  train_test_split(X_cal,
                                                      y_cal,
                                                      test_size=0.5,
                                                      random_state=1)

   point_pred_model = LinearRegression()
   point_pred_model.fit(X_pp_train[:,None], y_pp_train)

   # Absolute errors of the point predictions on the confidence-training split
   preds = point_pred_model.predict(X_conf_train[:,None])
   test_error = abs(preds - y_conf_train)
   y_conf_train_var = np.var(test_error)

   macest_model = reg_mod.ModelWithPredictionInterval(point_pred_model,
                                                    X_conf_train[:,None],
                                                    test_error)

   macest_model.fit(X_cal[:,None], y_cal)
   conf_preds = macest_model.predict_interval(X_test[:,None], conf_level=90)
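
Prediction intervals can be sanity-checked by measuring how often the true value falls inside them. The sketch below is illustrative and independent of MACEst: `empirical_coverage` is a hypothetical helper that takes separate lower and upper bounds, and the data is synthetic. How to split the array returned by `predict_interval` into lower and upper bounds depends on its shape, which is not assumed here.

```python
import numpy as np


def empirical_coverage(lower, upper, y_true):
    """Fraction of true values that fall inside [lower, upper]."""
    lower, upper, y_true = (np.asarray(a, dtype=float) for a in (lower, upper, y_true))
    return np.mean((y_true >= lower) & (y_true <= upper))


# Synthetic check: unit-variance Gaussian noise around a point prediction of 0.
rng = np.random.default_rng(0)
y_true = rng.normal(loc=0.0, scale=1.0, size=10_000)
half_width = 1.645  # two-sided 90% interval for a unit-variance Gaussian
lower = np.full(y_true.size, -half_width)
upper = np.full(y_true.size, half_width)

coverage = empirical_coverage(lower, upper, y_true)
print(f"empirical coverage of the nominal 90% interval: {coverage:.3f}")
```

For well-calibrated 90% intervals the empirical coverage on held-out data should be close to 0.9.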

MACEst with sparse data (see notebooks for more details)

import numpy as np
from scipy.sparse import csr_matrix 
from scipy.sparse import random as sp_rand
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from macest.classification import models as clmod
import nmslib 

n_rows = 10**3
n_cols = 5 * 10**3
X = csr_matrix(sp_rand(n_rows, n_cols))
y = np.random.randint(0, 2, n_rows)

X_pp_train, X_conf_train, y_pp_train, y_conf_train = train_test_split(X, y, test_size=0.66, random_state=10)
X_conf_train, X_cal, y_conf_train, y_cal = train_test_split(X_conf_train, y_conf_train,
                                                            test_size=0.5, random_state=0)
X_cal, X_test, y_cal,  y_test, = train_test_split(X_cal, y_cal, test_size=0.5, random_state=0)

model = RandomForestClassifier(random_state=0,
                               n_estimators=800,
                               n_jobs=-1)

model.fit(csr_matrix(X_pp_train), y_pp_train)

param_bounds = clmod.SearchBounds(alpha_bounds=(0, 500), k_bounds=(5, 15))
neighbour_search_params = clmod.HnswGraphArgs(query_args=dict(ef=1100),
                                              init_args=dict(method="hnsw",
                                                             space="cosinesimil_sparse",
                                                             data_type=nmslib.DataType.SPARSE_VECTOR))
macest_model = clmod.ModelWithConfidence(model,
                                       X_conf_train,
                                       y_conf_train,
                                       search_method_args=neighbour_search_params)

macest_model.fit(X_cal, y_cal)

macest_point_prediction_conf = macest_model.predict_confidence_of_point_prediction(X_test)

Contributing

See the CONTRIBUTING.md file for information about contributing to MACEst.

Related Publications

For more information about the methodology behind MACEst, please refer to our accompanying research paper, which has been shared on arXiv.

License

Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.

This library is licensed under Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl

See LICENSE.txt for more details.

Comments
  • Pandas deprecation error

    I have installed MACEst in an ad hoc Anaconda environment to make sure all the requirements meet the exact version required. This includes pandas==1.0.3.

    However, I am not able to make it work with pandas DataFrames due to a deprecation error:

       File "test_macest.py", line 48, in <module>
         macest_model.fit(X_cal, y_cal)
       File "/home/mirix/anaconda3/envs/macest/lib/python3.8/site-packages/macest/classification/models.py", line 397, in fit
         train_helper.fit(optimiser_args=optimiser_args)
       File "/home/mirix/anaconda3/envs/macest/lib/python3.8/site-packages/macest/classification/models.py", line 625, in fit
         point_preds[conflicts] == self.y_cal[conflicts]
       File "/home/mirix/anaconda3/envs/macest/lib/python3.8/site-packages/pandas/core/series.py", line 910, in __getitem__
         return self._get_with(key)
       File "/home/mirix/anaconda3/envs/macest/lib/python3.8/site-packages/pandas/core/series.py", line 943, in _get_with
         return self.loc[key]
       File "/home/mirix/anaconda3/envs/macest/lib/python3.8/site-packages/pandas/core/indexing.py", line 1768, in __getitem__
         return self._getitem_axis(maybe_callable, axis=axis)
       File "/home/mirix/anaconda3/envs/macest/lib/python3.8/site-packages/pandas/core/indexing.py", line 1954, in _getitem_axis
         return self._getitem_iterable(key, axis=axis)
       File "/home/mirix/anaconda3/envs/macest/lib/python3.8/site-packages/pandas/core/indexing.py", line 1595, in _getitem_iterable
         keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
       File "/home/mirix/anaconda3/envs/macest/lib/python3.8/site-packages/pandas/core/indexing.py", line 1552, in _get_listlike_indexer
         self._validate_read_indexer(
       File "/home/mirix/anaconda3/envs/macest/lib/python3.8/site-packages/pandas/core/indexing.py", line 1654, in _validate_read_indexer
         raise KeyError(
       KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'

    opened by mirix 1
  • Epistemic and aleatoric uncertainty sample code

    Hi all,

    I was trying the example codes, but I'm struggling to find the piece where epistemic and aleatoric uncertainty are outputted somehow from the Macest model. I tried the method 'macest_model.predict_confidence_of_point_prediction(X_test)' but it just returns the class probability of the predicted class.

    Can you provide some code examples on how to extract epistemic and aleatoric uncertainty?

    Thanks

    Edit:

    Ok, I solved it :D 👍

    opened by federicocau 0
  • How to use MACEst in problem of face similarity?

    Hi. I'm working on a face recognition problem. We inspect selfie-with-passport pictures and detect the faces in them. The task is to estimate the probability that two faces detected in an image belong to the same person (or not).

    The trivial solution is to encode each face into some vector (I use feature extracting network that outputs vectors of 512 values for each face) and then calculate cosine similarity along these vectors. This metric usually provides values around 0.35-0.45 for the same person's faces. When comparing these faces with faces of other people we get lower similarity, as expected, but not much lower.

    The point is that I want to move from such obscure values of ~0.35 to any kind of probability with ~0.9+ for faces of the same person and with significantly lower values for different persons. Obviously, this is not a regular classification problem because we have (potentially) infinite number of "classes" (persons). This is closer to regression problem but I don't know how to handle it.

    At this moment I have a trained face encoder and a database of ~6k face crops from ~4.5k persons. Some persons have more than 1 face in this db, but some - just one. And each new picture I get almost always contains a new person.

    1. Is MACEst applicable at this problem?
    2. How to use it if so?

    Any suggestions would be appreciated! Thanks

    P.S. I understand that cosine similarity produces values in -1..1 range, but simple translation to 0..1 range just shifts 0.35->0.675 which is not enough.

    opened by Ivan-basis 0
  • feat: ability to use only precomputed point predictions

    This PR adds the ability to train and use a MACEst model from precomputed point predictions only. The MACEst model then needs no reference to the point prediction model, which makes loading, saving and using these models easier. The following changes are introduced:

    • the model or point_pred_model argument in __init__ becomes optional
    • an optional parameter prec_point_preds is added to ModelWithConfidence.fit, ModelWithConfidence.predict_confidence_of_point_prediction, ModelWithPredictionInterval.predict_interval and ModelWithPredictionInterval.fit
    • an optional parameter update_empirical_conflict_constant is added to _TrainingHelper.fit. It allows the find_conflicting_predictions step to be skipped

    Signed-off-by: Florent Rambaud

    opened by FlorentRamb 2
  • fix: choose lighter dtype for distance and error stored values

    This PR tries to solve the following problem: when training MACEst on a large dataset, the step that precomputes distances and errors requires a lot of memory. A simple solution is to use lighter dtypes: float32 for distances and bool for errors.

    Signed-off-by: Florent Rambaud

    opened by FlorentRamb 0
Releases(1.0.0)
  • 1.0.0(Aug 17, 2021)

    This release includes:

    • Functionality for calibrating classification and regression models
    • Notebooks to demonstrate calibration functionality and usage
    • Unit tests for the calibration code
Owner
Oracle
Open Source at Oracle