AutoTabular automates machine learning tasks, enabling you to easily achieve strong predictive performance in your applications.

Last update: Dec 27, 2022

Overview

AutoTabular

AutoTabular automates machine learning tasks, enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models on tabular data.

What's good in it?

It uses RAPIDS as back-end support, which lets you execute end-to-end data science and analytics pipelines entirely on GPUs.
It supports many anomaly detection models.
It uses meta-learning to accelerate model selection and parameter tuning.
It uses many deep learning models for tabular data: Wide&Deep, DCN (Deep & Cross Network), FM, DeepFM, PNN ...
It uses many machine learning algorithms: Baseline, Linear, Random Forest, Extra Trees, LightGBM, XGBoost, CatBoost, and Nearest Neighbors.
It can compute an ensemble based on the greedy algorithm from the Caruana paper.
It can stack models to build a level 2 ensemble (available in Compete mode or after setting the stack_models parameter).
It can do feature preprocessing, such as missing value imputation and conversion of categoricals. It can also preprocess target values.
It can do advanced feature engineering, such as Golden Features, Feature Selection, and Text and Time Transformations.
It can tune hyperparameters with a not-so-random-search algorithm (random search over a defined set of values) and use hill climbing to fine-tune final models.
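
To make the greedy ensemble step in the list above more concrete, here is a minimal sketch of Caruana-style forward selection with replacement. It is not AutoTabular's own code: the function name greedy_ensemble_selection, the n_rounds parameter, the use of mean squared error as the validation metric, and the toy predictions are all assumptions made for illustration.

```python
import numpy as np


def greedy_ensemble_selection(predictions, y_true, n_rounds=10):
    """Greedy forward selection with replacement (hypothetical helper).

    predictions: dict mapping a model name to its validation predictions.
    y_true: validation targets.
    Returns a dict mapping each selected model name to its ensemble weight.
    """
    names = list(predictions)
    preds = np.stack([np.asarray(predictions[n], dtype=float) for n in names])
    y_true = np.asarray(y_true, dtype=float)

    counts = np.zeros(len(names), dtype=int)
    running_sum = np.zeros_like(y_true)

    for step in range(1, n_rounds + 1):
        # Tentatively add each model once more and score the averaged prediction.
        candidates = (running_sum + preds) / step
        errors = np.mean((candidates - y_true) ** 2, axis=1)
        # Keep the model whose addition gives the lowest validation error.
        best = int(np.argmin(errors))
        counts[best] += 1
        running_sum += preds[best]

    # A model's weight is the fraction of rounds in which it was picked.
    return {n: int(c) / n_rounds for n, c in zip(names, counts) if c > 0}


# Toy data (assumed): two models with opposite biases and one poor model.
y = np.array([1.0, 2.0, 3.0, 4.0])
toy_predictions = {
    "model_a": y + 0.5,
    "model_b": y - 0.5,
    "model_c": y + 2.0,
}
weights = greedy_ensemble_selection(toy_predictions, y, n_rounds=10)
print(weights)
```

Because a model can be selected more than once, the selection counts act as ensemble weights. In this toy case the two models with opposite biases cancel each other out, and the poor model is never chosen.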

Example

First, install dependencies

# clone project
git clone https://apulis-gitlab.apulis.cn/apulis/AutoTabular/autotabular.git

# install project
cd autotabular
pip install -e .
pip install -r requirements.txt

Next, navigate to any file and run it.

# module folder
cd example

# run the demo script
python demo.py

Citation

If you use AutoTabular in a scientific publication, please cite the following paper:

Robin, et al. "AutoTabular: Robust and Accurate AutoML for Structured Data." arXiv preprint arXiv:2003.06505 (2021).

BibTeX entry:

@article{agtabular,
  title={AutoTabular: Robust and Accurate AutoML for Structured Data},
  author={JianZheng, WenQi},
  journal={arXiv preprint arXiv:2003.06505},
  year={2021}
}

License

This library is licensed under the Apache 2.0 License.

Contributing to AutoTabular

We are actively accepting code contributions to the AutoTabular project. If you are interested in contributing to AutoTabular, please contact me.


Owner

Robin
