AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications.

Overview

AutoTabular

Paper Conference Conference Conference

AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models tabular data.

autotabular

What's good in it?

  • It is using the RAPIDS as back-end support, gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs.
  • It Supports many anomaly detection models: ,
  • It using meta learning to accelerate model selection and parameter tuning.
  • It is using many Deep Learning models for tabular data: Wide&Deep, DCN(Deep & Cross Network), FM, DeepFM, PNN ...
  • It is using many machine learning algorithms: Baseline, Linear, Random Forest, Extra Trees, LightGBM, Xgboost, CatBoost, and Nearest Neighbors.
  • It can compute Ensemble based on greedy algorithm from Caruana paper.
  • It can stack models to build level 2 ensemble (available in Compete mode or after setting stack_models parameter).
  • It can do features preprocessing, like: missing values imputation and converting categoricals. What is more, it can also handle target values preprocessing.
  • It can do advanced features engineering, like: Golden Features, Features Selection, Text and Time Transformations.
  • It can tune hyper-parameters with not-so-random-search algorithm (random-search over defined set of values) and hill climbing to fine-tune final models.

Example

First, install dependencies

# clone project
git clone https://apulis-gitlab.apulis.cn/apulis/AutoTabular/autotabular.git

# install project
cd autotabular
pip install -e .
pip install -r requirements.txt

Next, navigate to any file and run it.

# module folder
cd example

# run module (example: mnist as your main contribution)
python demo.py

Citation

If you use AutoTabular in a scientific publication, please cite the following paper:

Robin, et al. "AutoTabular: Robust and Accurate AutoML for Structured Data." arXiv preprint arXiv:2003.06505 (2021).

BibTeX entry:

@article{agtabular,
  title={AutoTabular: Robust and Accurate AutoML for Structured Data},
  author={JianZheng, WenQi},
  journal={arXiv preprint arXiv:2003.06505},
  year={2021}
}

License

This library is licensed under the Apache 2.0 License.

Contributing to AutoTabular

We are actively accepting code contributions to the AutoTabular project. If you are interested in contributing to AutoTabular, please contact me.

Owner
Robin
Machine learning Statistics
Robin
PySurvival is an open source python package for Survival Analysis modeling

PySurvival What is Pysurvival ? PySurvival is an open source python package for Survival Analysis modeling - the modeling concept used to analyze or p

Square 265 Dec 27, 2022
easyNeuron is a simple way to create powerful machine learning models, analyze data and research cutting-edge AI.

easyNeuron is a simple way to create powerful machine learning models, analyze data and research cutting-edge AI.

Neuron AI 5 Jun 18, 2022
A simple and lightweight genetic algorithm for optimization of any machine learning model

geneticml This package contains a simple and lightweight genetic algorithm for optimization of any machine learning model. Installation Use pip to ins

Allan Barcelos 8 Aug 10, 2022
Gaussian Process Optimization using GPy

End of maintenance for GPyOpt Dear GPyOpt community! We would like to acknowledge the obvious. The core team of GPyOpt has moved on, and over the past

Sheffield Machine Learning Software 847 Dec 19, 2022
A collection of Machine Learning Models To Web Api which are built on open source technologies/frameworks like Django, Flask.

Author Ibrahim Koné From-Machine-Learning-Models-To-WebAPI A collection of Machine Learning Models To Web Api which are built on open source technolog

Ibrahim Koné 2 May 24, 2022
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

Seldon Core: Blazing Fast, Industry-Ready ML An open source platform to deploy your machine learning models on Kubernetes at massive scale. Overview S

Seldon 3.5k Jan 01, 2023
Titanic Traveller Survivability Prediction

The aim of the mini project is predict whether or not a passenger survived based on attributes such as their age, sex, passenger class, where they embarked and more.

John Phillip 0 Jan 20, 2022
A game theoretic approach to explain the output of any machine learning model.

SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allo

Scott Lundberg 18.2k Jan 02, 2023
Deploy AutoML as a service using Flask

AutoML Service Deploy automated machine learning (AutoML) as a service using Flask, for both pipeline training and pipeline serving. The framework imp

Chris Rawles 221 Nov 04, 2022
A Lightweight Hyperparameter Optimization Tool 🚀

The mle-hyperopt package provides a simple and intuitive API for hyperparameter optimization of your Machine Learning Experiment (MLE) pipeline.

Robert Lange 137 Dec 02, 2022
Python package for stacking (machine learning technique)

vecstack Python package for stacking (stacked generalization) featuring lightweight functional API and fully compatible scikit-learn API Convenient wa

Igor Ivanov 671 Dec 25, 2022
AutoX是一个高效的自动化机器学习工具,它主要针对于表格类型的数据挖掘竞赛。 它的特点包括: 效果出色、简单易用、通用、自动化、灵活。

English | 简体中文 AutoX是什么? AutoX一个高效的自动化机器学习工具,它主要针对于表格类型的数据挖掘竞赛。 它的特点包括: 效果出色: AutoX在多个kaggle数据集上,效果显著优于其他解决方案(见效果对比)。 简单易用: AutoX的接口和sklearn类似,方便上手使用。

4Paradigm 431 Dec 28, 2022
Transform ML models into a native code with zero dependencies

m2cgen (Model 2 Code Generator) - is a lightweight library which provides an easy way to transpile trained statistical models into a native code

Bayes' Witnesses 2.3k Jan 03, 2023
Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.

Regularized Greedy Forest Regularized Greedy Forest (RGF) is a tree ensemble machine learning method described in this paper. RGF can deliver better r

RGF-team 363 Dec 14, 2022
LibTraffic is a unified, flexible and comprehensive traffic prediction library based on PyTorch

LibTraffic is a unified, flexible and comprehensive traffic prediction library, which provides researchers with a credibly experimental tool and a convenient development framework. Our library is imp

432 Jan 05, 2023
Land Cover Classification Random Forest

You can perform Land Cover Classification on Satellite Images using Random Forest and visualize the result using Earthpy package. Make sure to install the required packages and such as

Dr. Sander Ali Khowaja 1 Jan 21, 2022
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

Benedek Rozemberczki 1.8k Jan 03, 2023
Class-imbalanced / Long-tailed ensemble learning in Python. Modular, flexible, and extensible

IMBENS: Class-imbalanced Ensemble Learning in Python Language: English | Chinese/中文 Links: Documentation | Gallery | PyPI | Changelog | Source | Downl

Zhining Liu 176 Jan 04, 2023
Penguins species predictor app is used to classify penguins species created using python's scikit-learn, fastapi, numpy and joblib packages.

Penguins Classification App Penguins species predictor app is used to classify penguins species using their island, sex, bill length (mm), bill depth

Siva Prakash 3 Apr 05, 2022
Adversarial Framework for (non-) Parametric Image Stylisation Mosaics

Fully Adversarial Mosaics (FAMOS) Pytorch implementation of the paper "Copy the Old or Paint Anew? An Adversarial Framework for (non-) Parametric Imag

Zalando Research 120 Dec 24, 2022