Stochastic Gradient Trees implementation in Python

Last update: Nov 18, 2022

Overview

Stochastic Gradient Trees - Python

A Python implementation of Stochastic Gradient Trees¹ by Henry Gouk, Bernhard Pfahringer, and Eibe Frank, based on the code in the paper's accompanying repository.

Requires Python 3.7 or later.

Required Python libraries:

numpy>=1.20.2
scipy>=1.6.2
pandas>=1.3.3
scikit-learn>=0.24.2

Usage:

    from StochasticGradientTree import StochasticGradientTreeClassifier

    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import confusion_matrix, accuracy_score, log_loss

    def train(X, y):
        # Hold out 34% of the samples for evaluation.
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.34)

        tree = StochasticGradientTreeClassifier()
        tree.fit(X_train, y_train)

        # Hard class labels and per-class probabilities for the held-out set.
        y_pred = tree.predict(X_test)
        proba = tree.predict_proba(X_test)

        acc_test = accuracy_score(y_test, y_pred)
        print(confusion_matrix(y_test, y_pred))
        print('Acc test: ', acc_test)
        print('Cross entropy loss: ', log_loss(y_test, proba))

        return tree, acc_test

    if __name__ == "__main__":
        breast = load_breast_cancer(as_frame=True)

        # Separate the feature columns from the target column.
        X = breast.frame.drop(columns=['target'])
        y = breast.frame.target

        tree, _ = train(X, y)
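The usage script reports a cross entropy loss through scikit-learn's `log_loss`. As a rough illustration of what that number measures in the binary case, here is a minimal pure-Python sketch. The function name `binary_log_loss` and the clipping constant `eps` are assumptions made for this example; they are not part of this package, and the clipping need not match scikit-learn's exactly.

```python
import math

def binary_log_loss(y_true, p_pos, eps=1e-15):
    """Mean negative log-likelihood for binary labels (illustrative helper).

    y_true: sequence of 0/1 labels.
    p_pos:  predicted probability of class 1 for each sample.
    eps:    assumed clipping constant that keeps log() finite.
    """
    if len(y_true) == 0 or len(y_true) != len(p_pos):
        raise ValueError("y_true and p_pos must be non-empty and equally long")
    total = 0.0
    for label, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(y_true)

# Confident, correct predictions give a lower loss than hesitant ones.
print(binary_log_loss([1, 0, 1], [0.9, 0.2, 0.6]))
```

With a two-column `predict_proba` output, the probability of class 1 would typically be the second column, assuming the classes are ordered as 0 and 1.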

Binary classification example:

    python classification_breast.py

Multiclass classification (using the One-vs-the-rest multiclass strategy):

    python classification_iris.py
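The one-vs-the-rest strategy trains one binary model per class (that class against all others) and predicts the class whose model gives the highest score. The sketch below shows only the strategy itself in plain Python; it is not the code in `classification_iris.py`. `OneVsRest` and `CentroidBinaryScorer` are hypothetical names, and the centroid-based scorer is a toy stand-in for a real binary learner such as a stochastic gradient tree.

```python
def _mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def _sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

class CentroidBinaryScorer:
    """Toy binary learner (assumption): higher score = closer to positives."""

    def fit(self, X, y):
        self.pos_ = _mean([x for x, label in zip(X, y) if label == 1])
        self.neg_ = _mean([x for x, label in zip(X, y) if label == 0])
        return self

    def score(self, x):
        return _sqdist(x, self.neg_) - _sqdist(x, self.pos_)

class OneVsRest:
    """Fit one binary model per class and predict the best-scoring class."""

    def __init__(self, make_binary):
        self.make_binary = make_binary  # factory returning a fresh binary learner

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.models_ = {}
        for c in self.classes_:
            # Relabel: the current class is 1, every other class is 0.
            binary_y = [1 if label == c else 0 for label in y]
            self.models_[c] = self.make_binary().fit(X, binary_y)
        return self

    def predict(self, X):
        return [max(self.classes_, key=lambda c: self.models_[c].score(x))
                for x in X]

# Three well-separated clusters, one per class.
X = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (1, 5), (0, 6)]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
model = OneVsRest(CentroidBinaryScorer).fit(X, y)
predictions = model.predict([(0.2, 0.1), (4.8, 0.3), (0.1, 5.2)])
print(predictions)
```

Passing a factory rather than a fitted model keeps the per-class learners independent, which is the defining property of the strategy.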

Regression example:

    python regression_diabetes.py

¹ Gouk, H., Pfahringer, B., and Frank, E. Stochastic gradient trees. In Proceedings of The Eleventh Asian Conference on Machine Learning, volume 101 of Proceedings of Machine Learning Research, pp. 1094–1109. PMLR, 2019.

Owner

John Koumentis

