AutoTabular automates machine learning tasks, enabling you to easily achieve strong predictive performance in your applications.

Last update: Dec 27, 2022

Overview

AutoTabular

AutoTabular automates machine learning tasks, enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models on tabular data.

What's good in it?

It uses RAPIDS as its back end, which lets you execute end-to-end data science and analytics pipelines entirely on GPUs.
It supports many anomaly detection models.
It uses meta-learning to accelerate model selection and parameter tuning.
It includes many deep learning models for tabular data: Wide & Deep, DCN (Deep & Cross Network), FM, DeepFM, PNN, ...
It includes many machine learning algorithms: Baseline, Linear, Random Forest, Extra Trees, LightGBM, XGBoost, CatBoost, and Nearest Neighbors.
It can compute an ensemble using the greedy algorithm from the Caruana paper.
It can stack models to build a level 2 ensemble (available in Compete mode or after setting the stack_models parameter).
It can preprocess features, for example by imputing missing values and converting categoricals. It can also preprocess target values.
It can do advanced feature engineering, such as Golden Features, Feature Selection, and Text and Time Transformations.
It can tune hyperparameters with a not-so-random-search algorithm (random search over a defined set of values) and use hill climbing to fine-tune the final models.
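
To make the greedy ensemble idea more concrete, here is a minimal sketch of forward selection with replacement, in the spirit of the Caruana approach mentioned above. It is not AutoTabular's actual implementation: the function name greedy_ensemble, the use of mean squared error as the metric, and the toy validation data are all assumptions made for illustration.

```python
import numpy as np


def greedy_ensemble(predictions, y_true, n_rounds=10):
    """Greedy forward selection with replacement (illustrative sketch).

    predictions: dict mapping a model name to its 1-D validation predictions.
    Returns the selected names (repeats act as weights) and the ensemble MSE.
    """
    selected = []
    running_sum = np.zeros_like(y_true, dtype=float)
    best_score = np.inf
    for _ in range(n_rounds):
        round_name, round_score = None, np.inf
        for name, preds in predictions.items():
            # Score the ensemble average if this model were added once more.
            candidate = (running_sum + preds) / (len(selected) + 1)
            score = np.mean((candidate - y_true) ** 2)
            if score < round_score:
                round_name, round_score = name, score
        if round_score >= best_score:
            break  # no candidate improves the validation error
        selected.append(round_name)
        running_sum += predictions[round_name]
        best_score = round_score
    return selected, best_score


# Hypothetical validation data: two models with opposite bias, one poor model.
y_val = np.array([1.0, 2.0, 3.0, 4.0])
val_preds = {
    "model_a": y_val + 1.0,
    "model_b": y_val - 1.0,
    "model_c": y_val + 3.0,
}
chosen, mse = greedy_ensemble(val_preds, y_val)
print(chosen, mse)
```

In this toy case the two models with opposite bias cancel each other out, so the selection stops after picking both. Because a model can be selected more than once, the number of times it appears in the result acts as its weight in the averaged ensemble.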

Example

First, install the dependencies:

# clone project
git clone https://apulis-gitlab.apulis.cn/apulis/AutoTabular/autotabular.git

# install project
cd autotabular
pip install -e .
pip install -r requirements.txt

Next, go to the example folder and run the demo script:

# example folder
cd example

# run the demo script
python demo.py

Citation

If you use AutoTabular in a scientific publication, please cite the following paper:

Robin, et al. "AutoTabular: Robust and Accurate AutoML for Structured Data." arXiv preprint arXiv:2003.06505 (2021).

BibTeX entry:

@article{agtabular,
  title={AutoTabular: Robust and Accurate AutoML for Structured Data},
  author={JianZheng, WenQi},
  journal={arXiv preprint arXiv:2003.06505},
  year={2021}
}

License

This library is licensed under the Apache 2.0 License.

Contributing to AutoTabular

We are actively accepting code contributions to the AutoTabular project. If you are interested in contributing to AutoTabular, please contact me.

Owner

Robin
