InfiniteBoost: building infinite ensembles with gradient descent

Last update: Jan 03, 2023

Overview

InfiniteBoost

Code for the paper
InfiniteBoost: building infinite ensembles with gradient descent (arXiv:1706.01109).
A. Rogozhnikov, T. Likhomanenko

Description

InfiniteBoost is an approach to building ensembles that combines the best sides of random forest and gradient boosting.

Trees in the ensemble account for the mistakes made by previous trees (as in gradient boosting), but, due to a modified scheme of accounting for contributions, the ensemble converges to a limit, thus avoiding overfitting (just as random forest does).

Figure. Left: InfiniteBoost with automated capacity search vs. gradient boosting with different learning rates (shrinkages). Right: random forest vs. InfiniteBoost with small capacities.

More comparison plots are available in the research notebooks and in the research/plots directory.
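The idea described above (each tree corrects the current ensemble, but contributions are averaged rather than summed) can be illustrated with a rough sketch. It is not the code from this repository: it assumes squared loss, a fixed capacity, tree weights proportional to the tree index, bootstrap sampling of rows, and scikit-learn's DecisionTreeRegressor. The names fit_infinite_ensemble and predict_infinite_ensemble are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_infinite_ensemble(X, y, n_trees=100, capacity=2.0, max_depth=3, seed=0):
    """Illustrative sketch only; NOT the implementation from this repository."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.RandomState(seed)
    prediction = np.zeros(len(y))
    trees, weights = [], []
    for m in range(1, n_trees + 1):
        # Negative gradient of squared loss: the current mistakes of the ensemble.
        residual = y - prediction
        # Assumption: bootstrap rows, in the spirit of random forest.
        rows = rng.randint(0, len(y), size=len(y))
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=seed + m)
        tree.fit(X[rows], residual[rows])
        # Assumption: tree number m gets weight m. The ensemble output is
        # capacity * (weighted average of trees), so it stays bounded
        # instead of growing with every added tree.
        old_total = sum(weights)
        new_total = old_total + m
        prediction = (old_total * prediction
                      + m * capacity * tree.predict(X)) / new_total
        trees.append(tree)
        weights.append(m)
    return trees, np.array(weights, dtype=float)


def predict_infinite_ensemble(trees, weights, X, capacity=2.0):
    """Capacity times the weighted average of the individual tree outputs."""
    X = np.asarray(X, dtype=float)
    normalized = weights / weights.sum()
    averaged = sum(w * tree.predict(X) for w, tree in zip(normalized, trees))
    return capacity * averaged
```

In this sketch the capacity plays the role that the number of trees and the learning rate play in gradient boosting: because the trees are averaged, adding more of them refines the average instead of increasing the total contribution.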

Reproducing research

The research was performed in Jupyter notebooks (if you're not familiar with them, read why Jupyter notebooks are awesome).

You can use the docker image arogozhnikov/pmle:0.01 from Docker Hub. The Dockerfile is stored in this repository (Ubuntu 16 + basic sklearn stuff).

To run the environment (sudo is needed on Linux):

sudo docker run -it --rm -v /YourMountedDirectory:/notebooks -p 8890:8890 arogozhnikov/pmle:0.01

(and open localhost:8890 in your browser).

InfiniteBoost package

The package contains a self-written, minimalistic implementation of trees, which was used in the experiments comparing against gradient boosting.

A separate implementation, based on the trees from the scikit-learn package, was used for the comparison with random forest.

The code is written in Python 2 (it is expected to work with Python 3, but this has not been tested). Some critical functions are written in Fortran, so you need gfortran and OpenMP installed before installing the package (or simply use the docker image).

pip install numpy
pip install .
# testing (optional)
cd tests && nosetests .

You can use the implementation of trees from the package in your own experiments; in this case, please cite the InfiniteBoost paper.

Owner

Alex Rogozhnikov