onelearn: Online learning in Python

Last update: Nov 06, 2022

Overview


onelearn stands for ONE-shot LEARNing. It is a small Python package for online learning. It provides:

online (or one-shot) learning algorithms: each sample is processed once, in a single pass over the data
multi-class classification and regression algorithms
for now, only ensemble methods, namely Random Forests
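To make the single-pass idea concrete, here is a minimal sketch in pure Python. The class OnlineCentroidClassifier and its methods partial_fit_one and predict_one are hypothetical names invented for this illustration; it is a toy nearest-centroid learner, not onelearn's forest algorithm or API.

```python
from collections import defaultdict


class OnlineCentroidClassifier:
    """Toy single-pass classifier: keeps one running mean per class.

    Hypothetical illustration only; this is not how onelearn's forests work.
    """

    def __init__(self):
        self.counts = defaultdict(int)
        self.means = {}

    def partial_fit_one(self, x, label):
        # Update the running mean of this class with a single sample.
        self.counts[label] += 1
        n = self.counts[label]
        mean = self.means.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            mean[j] += (value - mean[j]) / n

    def predict_one(self, x):
        # Return the label whose centroid is closest in squared distance.
        def sq_dist(mean):
            return sum((a - b) ** 2 for a, b in zip(x, mean))

        return min(self.means, key=lambda label: sq_dist(self.means[label]))


stream = [
    ([0.0, 0.1], "a"),
    ([1.0, 0.9], "b"),
    ([0.2, 0.0], "a"),
    ([0.8, 1.0], "b"),
]

model = OnlineCentroidClassifier()
for x, label in stream:  # each sample is seen exactly once
    model.partial_fit_one(x, label)

print(model.predict_one([0.1, 0.1]))
```

The model never stores the stream itself, only per-class counts and means, which is what lets it learn from data that arrives one sample at a time.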

Installation

The easiest way to install onelearn is with pip:

pip install onelearn

You can also install the latest development version directly from GitHub with:

pip install git+https://github.com/onelearn/onelearn.git

References

@article{mourtada2019amf,
  title={AMF: Aggregated Mondrian Forests for Online Learning},
  author={Mourtada, Jaouad and Ga{\"\i}ffas, St{\'e}phane and Scornet, Erwan},
  journal={arXiv preprint arXiv:1906.10529},
  year={2019}
}

Comments

Unable to pickle AMFClassifier.
I would like to save the AMFClassifier, but I am unable to pickle it. I have also tried dill and joblib, but they don't seem to work either.

Is there another way to export the AMFClassifier so that I can save it and load it in another kernel?

Below is a snippet of code that reproduces the error. Note that the error occurs only when pickling after partial_fit has been called. Before the AMFClassifier has been fit, pickling works without problems, but exporting an empty model is of little use.

Any help or tips are much appreciated.

from onelearn import AMFClassifier
import dill as pickle
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Pickling the unfitted model works
amf = AMFClassifier(n_classes=3)
dump = pickle.dumps(amf)
amf = pickle.loads(dump)

# Pickling after partial_fit triggers the error
amf.partial_fit(X, y)
dump = pickle.dumps(amf)
amf = pickle.loads(dump)
opened by w-feijen 1
Move the experiments of the paper into an experiments folder
Update the documentation

Explain that we must clone the repo

Also move the short experiments to an examples folder and build a sphinx gallery with it
enhancement
opened by stephanegaiffas 1
Add some extra tests
Test that batch and online training lead to exactly the same forest

Test the behavior of reserve_samples, with several calls to partial_fit to check that memory is correctly allocated and

tests
opened by stephanegaiffas 1
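A batch-versus-online test of the kind proposed above could take roughly the following shape. RunningMean is a hypothetical stand-in estimator written for this sketch (it is not part of onelearn); a real test would compare the forests themselves rather than a vector of means.

```python
import numpy as np


class RunningMean:
    """Hypothetical stand-in for an online estimator with partial_fit."""

    def __init__(self, n_features):
        self.n_samples = 0
        self.total = np.zeros(n_features)

    def partial_fit(self, X):
        # Accumulate row by row so the order of floating-point operations
        # is the same whether rows arrive in one call or in many.
        for row in np.atleast_2d(X):
            self.total += row
            self.n_samples += 1
        return self

    @property
    def mean(self):
        return self.total / self.n_samples


def test_batch_equals_online():
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))

    # One call with the whole dataset
    batch = RunningMean(3).partial_fit(X)

    # One call per sample, in the same order
    online = RunningMean(3)
    for row in X:
        online.partial_fit(row)

    assert batch.n_samples == online.n_samples == 50
    # Exact equality, not approximate: both paths do the same additions
    assert np.array_equal(batch.mean, online.mean)


test_batch_equals_online()
```

Exact equality is only a reasonable expectation when both code paths perform the same operations in the same order and, for a randomized model, use the same random state.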
What if predict_proba receives a single sample

get_amf_decision_online
    amf.partial_fit(X_train[iteration - 1], y_train[iteration - 1])
File "/Users/stephanegaiffas/Code/onelearn/onelearn/forest.py", line 259, in partial_fit
    n_samples, n_features = X.shape

opened by stephanegaiffas 1
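The traceback above ends at an unpacking of X.shape into two names, which fails when X is a single 1-D sample. A minimal caller-side sketch of one way to handle it with NumPy follows; as_2d_samples is a hypothetical helper written for this example, and treating a 1-D input as one sample is an assumption of the sketch, not documented library behaviour.

```python
import numpy as np


def as_2d_samples(X):
    """Return X as a 2-D array of shape (n_samples, n_features).

    Assumption of this sketch: a 1-D input is treated as a single sample.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(1, -1)
    if X.ndim != 2:
        raise ValueError(f"expected a 1-D or 2-D array, got {X.ndim} dimensions")
    return X


X_train = np.arange(12, dtype=float).reshape(4, 3)

single = X_train[0]  # shape (3,): unpacking its shape into two names fails
n_samples, n_features = as_2d_samples(single).shape
print(n_samples, n_features)
```

Slicing instead of indexing, for example X_train[0:1], also keeps the sample two-dimensional without a helper.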
Improve coverage

A problem is that @jit functions don't work with coverage. A workaround is to disable JIT compilation with the NUMBA_DISABLE_JIT environment variable, but this breaks the code that uses @jitclass and the .class_type.instance_type attributes.
enhancement bug fix

opened by stephanegaiffas 1
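As a rough sketch of the workaround mentioned above, the variable can be set from Python before numba is first imported, since numba reads its configuration at import time. The add function is a made-up example, and the fallback decorator exists only so the snippet also runs where numba is not installed. This illustrates the @jit case only; it does not address the @jitclass breakage.

```python
import os

# Set the variable before any module imports numba, because numba reads
# its configuration when it is first imported.
os.environ["NUMBA_DISABLE_JIT"] = "1"

try:
    from numba import jit
except ImportError:
    # Fallback so the sketch runs without numba: a no-op decorator that
    # accepts both the @jit and the @jit(...) forms.
    def jit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@jit(nopython=True)
def add(a, b):
    # With JIT disabled this body runs as ordinary Python, so a coverage
    # tool can trace it.
    return a + b


print(add(2, 3))
```

In a test suite the same effect is usually obtained by exporting the variable in the shell or CI configuration before launching the test runner.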

Releases (v0.3)

v0.3 (Sep 29, 2021)
This release adds the following improvements:

AMFClassifier and AMFRegressor can be serialized to files with the save and load methods, which use pickle internally
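The following sketch shows one general way a save and load pair can wrap pickle when an object holds a member that cannot be pickled directly. TinyModel, its attributes, and the file name are all hypothetical; this is not onelearn's implementation, and the signatures of its save and load methods should be checked in the library's documentation.

```python
import os
import pickle
import tempfile


class TinyModel:
    """Hypothetical model whose fitted state holds an unpicklable member."""

    def __init__(self):
        self.weights = []
        # A lambda cannot be pickled; it stands in for a compiled object.
        self._kernel = lambda w, x: sum(a * b for a, b in zip(w, x))

    def partial_fit(self, x):
        # Toy update: accumulate the samples seen so far.
        current = self.weights or [0.0] * len(x)
        self.weights = [w + v for w, v in zip(current, x)]
        return self

    def predict(self, x):
        return self._kernel(self.weights, x)

    def save(self, filename):
        # Persist only plain-Python state, not the unpicklable member.
        with open(filename, "wb") as f:
            pickle.dump({"weights": self.weights}, f)

    @classmethod
    def load(cls, filename):
        with open(filename, "rb") as f:
            state = pickle.load(f)
        model = cls()  # the constructor rebuilds the unpicklable member
        model.weights = state["weights"]
        return model


model = TinyModel().partial_fit([1.0, 2.0]).partial_fit([0.5, 0.5])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    model.save(path)
    restored = TinyModel.load(path)

print(restored.predict([1.0, 1.0]))
```

The design choice is to separate plain data, which pickle handles, from runtime objects that are cheaper to rebuild than to serialize.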

v0.2.0 (Apr 6, 2020)
This release adds the following improvements:

SampleCollection pre-allocates more samples instead of the bare minimum for faster computation

The playground can be launched from the library

Documentation on Read the Docs

Faster computations and a lot of code cleaning

Unit tests for Python 3.6-3.8


Documentation: https://onelearn.readthedocs.io

