Decision Tree Regression algorithm implemented on Python from scratch.

Overview

Decision_Tree_Regression

I implemented the decision tree regression algorithm on Python. Unlike regular linear regression, this algorithm is used when the dataset is a curved line. The algorithm uses decision trees to generate multiple regression lines recursively. The training dataset is split into two parts in each iteration and a regression line is fit. The split is made at the best possible point to minimize the Mean Squared Error (MSE).

The number of regression lines is key. Overfitting occurs if the number is too high and underfitting occurs if the number is too low. There are two hyperparameters we use in this algorithm, maximum depth of the decision trees and the minimum number of samples in a single split. These parameters should be tested and optimized for each dataset.

Creating Datasets

Instead of using datasets downloaded from the internet, I decided to create my own datasets for this project. I generated 4 datasets to test my algorithm: Noisy Sinusoidal Signal, Noisy Second Degree Polynomial, Noisy Linear Line and Noisy Upside Down Triangle Signal. The program generates these datasets when its run and saves the datasets to recreate the results. To generate new datasets, you simply need to delete the first dataset, dataset0.csv file. You can also use your own datasets by uploading them to the same directory as the Python project.

Plotting Results

You can see the results of the sinusoidal signal and the upside down triangle for various hyperparameters. Colored points represent the splits in the training dataset, black lines represent the linear regression line for the corresponding split and the larger gray points represent the test dataset.

Figure_1

Figure_1

Figure_1

Figure_1

Figure_1

Figure_1



Figure_1

Figure_1

Figure_1

Figure_1

Figure_1

Figure_1


It is observed that for these datasets the best value for maximum depth is 4.

BentoML is a flexible, high-performance framework for serving, managing, and deploying machine learning models.

Model Serving Made Easy BentoML is a flexible, high-performance framework for serving, managing, and deploying machine learning models. Supports multi

BentoML 4.4k Jan 04, 2023
Nixtla is an open-source time series forecasting library.

Nixtla Nixtla is an open-source time series forecasting library. We are helping data scientists and developers to have access to open source state-of-

Nixtla 401 Jan 08, 2023
Firebase + Cloudrun + Machine learning

A simple end to end consumer lending decision engine powered by Google Cloud Platform (firebase hosting and cloudrun)

Emmanuel Ogunwede 8 Aug 16, 2022
Flask app to predict daily radiation from the time series of Solcast from Islamabad, Pakistan

Solar-radiation-ISB-MLOps - Flask app to predict daily radiation from the time series of Solcast from Islamabad, Pakistan.

Abid Ali Awan 1 Dec 31, 2021
PLUR is a collection of source code datasets suitable for graph-based machine learning.

PLUR (Programming-Language Understanding and Repair) is a collection of source code datasets suitable for graph-based machine learning. We provide scripts for downloading, processing, and loading the

Google Research 76 Nov 25, 2022
mlpack: a scalable C++ machine learning library --

a fast, flexible machine learning library Home | Documentation | Doxygen | Community | Help | IRC Chat Download: current stable version (3.4.2) mlpack

mlpack 4.2k Jan 01, 2023
Book Item Based Collaborative Filtering

Book-Item-Based-Collaborative-Filtering Collaborative filtering methods are used

Şebnem 3 Jan 06, 2022
A toolbox to iNNvestigate neural networks' predictions!

iNNvestigate neural networks! Table of contents Introduction Installation Usage and Examples More documentation Contributing Releases Introduction In

Maximilian Alber 1.1k Jan 05, 2023
Repository for DCA0305, an undergraduate course about Machine Learning Workflows and Pipelines

Federal University of Rio Grande do Norte Technology Center Department of Computer Engineering and Automation Machine Learning Based Systems Design Re

Ivanovitch Silva 81 Oct 18, 2022
BigDL: Distributed Deep Learning Framework for Apache Spark

BigDL: Distributed Deep Learning on Apache Spark What is BigDL? BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can w

4.1k Jan 09, 2023
Relevance Vector Machine implementation using the scikit-learn API.

scikit-rvm scikit-rvm is a Python module implementing the Relevance Vector Machine (RVM) machine learning technique using the scikit-learn API. Quicks

James Ritchie 204 Nov 18, 2022
A complete guide to start and improve in machine learning (ML)

A complete guide to start and improve in machine learning (ML), artificial intelligence (AI) in 2021 without ANY background in the field and stay up-to-date with the latest news and state-of-the-art

Louis-François Bouchard 3.3k Jan 04, 2023
Toolkit for building machine learning models that generalize to unseen domains and are robust to privacy and other attacks.

Toolkit for Building Robust ML models that generalize to unseen domains (RobustDG) Divyat Mahajan, Shruti Tople, Amit Sharma Privacy & Causal Learning

Microsoft 149 Jan 06, 2023
This project used bitcoin, S&P500, and gold to construct an investment portfolio that aimed to minimize risk by minimizing variance.

minvar_invest_portfolio This project used bitcoin, S&P500, and gold to construct an investment portfolio that aimed to minimize risk by minimizing var

1 Jan 06, 2022
Distributed scikit-learn meta-estimators in PySpark

sk-dist: Distributed scikit-learn meta-estimators in PySpark What is it? sk-dist is a Python package for machine learning built on top of scikit-learn

Ibotta 282 Dec 09, 2022
It is a forest of random projection trees

rpforest rpforest is a Python library for approximate nearest neighbours search: finding points in a high-dimensional space that are close to a given

Lyst 211 Dec 29, 2022
The easy way to combine mlflow, hydra and optuna into one machine learning pipeline.

mlflow_hydra_optuna_the_easy_way The easy way to combine mlflow, hydra and optuna into one machine learning pipeline. Objective TODO Usage 1. build do

shibuiwilliam 9 Sep 09, 2022
Predict the output which should give a fair idea about the chances of admission for a student for a particular university

Predict the output which should give a fair idea about the chances of admission for a student for a particular university.

ArvindSandhu 1 Jan 11, 2022
Code Repository for Machine Learning with PyTorch and Scikit-Learn

Code Repository for Machine Learning with PyTorch and Scikit-Learn

Sebastian Raschka 1.4k Jan 03, 2023