Simple but powerful Automated Machine Learning library for tabular data. It uses efficient in-memory SAP HANA algorithms to automate routine Data Science tasks.

About The Project
Getting Started
- Prerequisites
- Installation
Usage
Roadmap
Contributing
License
Contact

About the project

Disclaimer

This library is an open-source research project and is not part of any official SAP products.

What's this?

This is a simple but accurate Automated Machine Learning library. Built on SAP HANA's powerful in-memory algorithms, it delivers high accuracy across multiple machine learning tasks. The library also applies numerous data preprocessing functions to automate routine data cleaning. In short, hana_automl goes through all AutoML steps and makes Data Science work easier.

What is SAP HANA?

From www.sap.com: SAP HANA is a high-performance in-memory database that speeds data-driven, real-time decisions and actions.

Web app

https://share.streamlit.io/dan0nchik/sap-hana-automl/main/web.py

Documentation

https://sap-hana-automl.readthedocs.io/en/latest/index.html

Benchmarks

https://github.com/dan0nchik/SAP-HANA-AutoML/blob/main/comparison_openml.ipynb

ML tasks:

Binary classification
Regression
Multiclass classification
Forecasting

Steps automated:

👇 By the end of summer 2021, the blue part will be fully automated by our library.

Clients

GUI (Streamlit app)
Python library
CLI (coming soon)

Streamlit client

Built With

Getting Started

To get the package up and running, follow these simple steps.

Prerequisites

Make sure you have the following:

✅ Set up SAP HANA (skip this step if you already have an instance with PAL enabled). There are two ways to do that.
In HANA Cloud:
- Create a free trial account
- Set up an instance
- Enable PAL (Predictive Analysis Library). This is vital because the library relies on PAL algorithms.
In a virtual machine:
- Rent a virtual machine in Azure, AWS, Google Cloud, etc.
- Install a HANA instance there or on your PC (if you have more than 32 GB of RAM).
- Enable PAL (Predictive Analysis Library). This is vital because the library relies on PAL algorithms.
✅ Installed software

Python > 3.6
Skip this step if python --version reports a version above 3.6
Cython
```
pip3 install Cython
```
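
The software prerequisites above can be checked from Python before installing. This is a minimal sketch: the helper name `check_prerequisites` is hypothetical, and it reads "Python > 3.6" literally as a (major, minor) version strictly above (3, 6).

```python
import importlib.util
import sys

# Assumption: "Python > 3.6" is read as (major, minor) strictly above (3, 6).
MIN_EXCLUSIVE = (3, 6)


def check_prerequisites(version_info=sys.version_info):
    """Return a dict telling which prerequisites look satisfied."""
    return {
        # Compare only (major, minor) against the exclusive lower bound.
        "python": tuple(version_info[:2]) > MIN_EXCLUSIVE,
        # find_spec returns None when the Cython package cannot be imported.
        "cython": importlib.util.find_spec("Cython") is not None,
    }
```

Calling `check_prerequisites()` returns a dict such as `{"python": True, "cython": False}`, where a `False` entry points at the step that still needs attention.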

Installation

There are two ways to install the library:

Stable: from PyPI
```
pip3 install hana_automl
```
Latest: from the repository
```
pip3 install https://github.com/dan0nchik/SAP-HANA-AutoML/archive/dev.zip
```
Note: the latest version may contain bugs, so be careful!

After installation

Check that PAL (Predictive Analysis Library) is installed and that the required roles are granted.

Read the docs section about that.

If you don't want to read the docs, run this code:

from hana_automl.utils.scripts import setup_user
from hana_ml.dataframe import ConnectionContext

cc = ConnectionContext(address='address', user='user', password='password', port=39015)

# replace with credentials of user that will be created or granted a role to run PAL.
setup_user(connection_context=cc, username='user', password="password")
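
To confirm that PAL is actually visible to your user, one option is to count the PAL functions registered in the database. The sketch below is an illustration, not part of hana_automl: the helper names are hypothetical, and it assumes the `SYS.AFL_FUNCTIONS` system view with an `AREA_NAME` column is readable by the connected user. It works on any DB-API cursor.

```python
# Assumption: PAL functions are listed in SYS.AFL_FUNCTIONS under area 'AFLPAL'.
PAL_QUERY = "SELECT COUNT(*) FROM SYS.AFL_FUNCTIONS WHERE AREA_NAME = 'AFLPAL'"


def pal_function_count(cursor):
    """Run the PAL lookup on a DB-API cursor and return how many functions it finds."""
    cursor.execute(PAL_QUERY)
    row = cursor.fetchone()
    return int(row[0]) if row else 0


def pal_is_installed(cursor):
    """True when at least one PAL function is registered."""
    return pal_function_count(cursor) > 0
```

With a `ConnectionContext` named `cc`, this could be called as `pal_is_installed(cc.connection.cursor())`, assuming `cc.connection` exposes the underlying DB-API connection.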

Usage

From code

Our library in a few lines of code

Connect to database.

from hana_ml.dataframe import ConnectionContext

cc = ConnectionContext(address='address',
                     user='username',
                     password='password',
                     port=1234)
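
Rather than hardcoding credentials, the connection arguments can be read from environment variables. This is a sketch under assumptions: the variable names (`HANA_ADDRESS`, `HANA_PORT`, `HANA_USER`, `HANA_PASSWORD`) and both helper functions are hypothetical, and the default port 39015 is simply the value used elsewhere in this README.

```python
import os


def connection_kwargs(env=None):
    """Build ConnectionContext keyword arguments from environment variables."""
    env = os.environ if env is None else env
    return {
        "address": env["HANA_ADDRESS"],
        "port": int(env.get("HANA_PORT", "39015")),  # assumed default port
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }


def connect(env=None):
    """Open a connection; hana_ml is imported lazily so the helper above works without it."""
    from hana_ml.dataframe import ConnectionContext

    return ConnectionContext(**connection_kwargs(env))
```

With the variables exported in your shell, `cc = connect()` gives the same kind of object as the snippet above.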

Create AutoML model and fit it.

from hana_automl.automl import AutoML

model = AutoML(cc)
model.fit(
  file_path='path to training dataset', # it may be HANA table/view, or pandas DataFrame
  steps=10, # number of iterations
  target='target', # column to predict
  time_limit=120 # time limit in seconds
)
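
To give an intuition for how an iteration count (`steps`) and a time budget (`time_limit`) can work together, here is a generic random-search loop. It is an illustration only and not how hana_automl is implemented: the function `budgeted_search`, the `learning_rate` hyperparameter, and the rule that whichever limit is reached first stops the search are all assumptions.

```python
import random
import time


def budgeted_search(evaluate, steps, time_limit, seed=0):
    """Try up to `steps` random candidates, stopping early once `time_limit` seconds pass."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit
    best_score, best_params, trials = float("-inf"), None, 0
    for _ in range(steps):
        if time.monotonic() >= deadline:
            break  # time budget exhausted before all steps were used
        params = {"learning_rate": rng.uniform(0.01, 0.5)}  # hypothetical hyperparameter
        score = evaluate(params)
        trials += 1
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score, trials
```

For example, `budgeted_search(lambda p: -abs(p["learning_rate"] - 0.1), steps=10, time_limit=120)` evaluates at most 10 candidates and returns the best one found.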

Predict.

model.predict(
  file_path='path to test dataset',
  id_column='ID',
  verbose=1
)
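
Since `predict` is given an `id_column`, it can help to sanity-check the test file first. The sketch below is a hypothetical helper, not part of the library; it assumes the dataset is CSV text and that ID values are expected to be unique.

```python
import csv
import io
from collections import Counter


def check_id_column(csv_text, id_column="ID"):
    """Return the row count if `id_column` exists and holds unique values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if id_column not in (reader.fieldnames or []):
        raise ValueError(f"column {id_column!r} not found")
    counts = Counter(row[id_column] for row in reader)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate ids: {duplicates}")
    return sum(counts.values())
```

A file read with `open(path).read()` can be passed straight in; a `ValueError` signals that the ID column is missing or contains repeated values.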

For more examples, please refer to the Documentation

How to run Streamlit client

Clone repository: git clone https://github.com/dan0nchik/SAP-HANA-AutoML.git
Install dependencies: pip3 install -r requirements.txt
Run GUI: streamlit run ./web.py

Roadmap

See the open issues for a list of proposed features (and known issues). Feel free to report any bugs :)

Contributing

Any contributions you make are greatly appreciated 👏 !

Fork the Project
Create your Feature Branch (git checkout -b feature/NewFeature)

Install dependencies

pip3 install Cython

pip3 install -r requirements.txt

Create a credentials.py file in the tests directory. Your files should look like this:

SAP-HANA-AutoML
│   README.md
│   all other files
│   .....
│
└───tests
    │   test files...
    │   credentials.py

Copy and paste this piece of code there and replace the placeholder values with your credentials:

host = "host"
user = "username"
password = "password"
port = 39015 # or any port you need
schema = "your schema"

Don't worry, this file is in .gitignore, so your credentials won't be seen by anyone.

Make some changes
Write tests that cover your code in tests directory
Run tests (under SAP-HANA-AutoML directory)
```
pytest
```
Commit your changes (git commit -m 'Add some amazing features')
Push to the branch (git push origin feature/NewFeature)
Open a Pull Request
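
Before running the tests, the credentials.py file described in the steps above can be checked for missing values. This is a minimal sketch: the helper `missing_credentials` is hypothetical, and the field list simply mirrors the five variables shown in the template.

```python
from types import SimpleNamespace

# Field names taken from the credentials.py template above.
REQUIRED_FIELDS = ("host", "user", "password", "port", "schema")


def missing_credentials(module):
    """Return the names of required fields that are absent or empty."""
    return [
        name
        for name in REQUIRED_FIELDS
        if getattr(module, name, None) in (None, "")
    ]


# Stand-in for `import credentials`; any object with these attributes works.
example = SimpleNamespace(host="host", user="username", password="password",
                          port=39015, schema="your schema")
```

Inside the tests directory, `missing_credentials(credentials)` after `import credentials` returns an empty list when every field is filled in.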

License

Distributed under the MIT License. See LICENSE for more information.
Don't really understand the license? Check out the MIT license summary.

Contact

Authors: @While-true-codeanything, @DbusAI, @dan0nchik

Project Link: https://github.com/dan0nchik/SAP-HANA-AutoML

Owner

Daniel Khromov
