Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.

Last update: May 15, 2022

Overview

Amplo - AutoML (for Machine Data)

Welcome to the Automated Machine Learning package Amplo. Amplo's AutoML is designed specifically for machine data and works very well with tabular time series data (especially unbalanced classification!).

Though this is a standalone Python package, Amplo's AutoML is also available on Amplo's Smart Maintenance Platform. With a graphical user interface and various data connectors, it is the ideal place for service engineers to get started on Predictive Maintenance.

Amplo's AutoML Pipeline covers the entire Machine Learning development cycle, including exploratory data analysis, data cleaning, feature extraction, feature selection, model selection, hyperparameter optimization, stacking, version control, production-ready models and documentation. It comes with additional tools such as interval analysers, drift detectors, data quality checks, etc.

1. Downloading Amplo

The easiest way is to install our Python package through PyPI:

pip install Amplo

2. Usage

Usage is very simple with Amplo's AutoML Pipeline.

from Amplo import Pipeline
from sklearn.datasets import make_classification
from sklearn.datasets import make_regression

# Classification: fit on synthetic data and predict class probabilities
x, y = make_classification()
pipeline = Pipeline()
pipeline.fit(x, y)
yp = pipeline.predict_proba(x)

# Regression: fit on synthetic data and predict continuous targets
x, y = make_regression()
pipeline = Pipeline()
pipeline.fit(x, y)
yp = pipeline.predict(x)

3. Amplo AutoML Features

Interval Analyser

from Amplo.AutoML import IntervalAnalyser

Interval Analyser for log file classification. When log files have to be classified and there is not enough data for time series methods (such as LSTMs, ROCKET, WEASEL or BOSS), one needs to fall back on classical machine learning models, which cope better with few samples. This raises the question of which samples to classify. Simply classifying every sample and accumulating the results may greatly hurt classification performance. Therefore, we introduce this interval analyser. Using an approximate K-Nearest Neighbors algorithm, it estimates the strength of correlation for every sample inside a log. This allows for better interval selection for classical machine learning models.
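The idea of scoring every sample by its neighbourhood can be illustrated with a small sketch. This is not Amplo's implementation: it uses scikit-learn's exact NearestNeighbors rather than an approximate index, and the function name neighbour_agreement and the scoring rule (fraction of neighbours sharing the sample's label) are assumptions made for illustration only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def neighbour_agreement(x, labels, k=5):
    """Fraction of each sample's k nearest neighbours that share its label."""
    # Ask for k + 1 neighbours because each sample is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(x)
    _, idx = nn.kneighbors(x)
    neighbour_labels = labels[idx[:, 1:]]  # drop column 0 (the sample itself)
    return (neighbour_labels == labels[:, None]).mean(axis=1)


# Two well-separated synthetic "classes" of samples.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(10, 1, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)

agreement = neighbour_agreement(x, labels)
print(agreement.mean())
```

Samples with a score close to 1 sit among samples of their own class; samples with a low score look like other classes and would be weaker candidates for an interval.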

To use this interval analyser, make sure that your logs are located in a folder of their class, with one parent folder with all classes, e.g.:

+-- Parent Folder
|   +-- Class_1
|       +-- Log_1.*
|       +-- Log_2.*
|   +-- Class_2
|       +-- Log_3.*

Exploratory Data Analysis

from Amplo.AutoML import DataExplorer

Automated Exploratory Data Analysis. Covers binary classification and regression. It generates:

Missing Values Plot
Line Plots of all features
Box plots of all features
Co-linearity Plot
SHAP Values
Random Forest Feature Importance
Predictive Power Score

Additional plots for Regression:

Seasonality Plots
Differentiated Variance Plot
Auto Correlation Function Plot
Partial Auto Correlation Function Plot
Cross Correlation Function Plot
Scatter Plots
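As an illustration of what the Auto Correlation Function plot displays, here is a minimal NumPy sketch of the standard (biased) sample autocorrelation. It is not taken from the DataExplorer; the function name autocorrelation is an assumption.

```python
import numpy as np


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    series = np.asarray(series, dtype=float)
    centred = series - series.mean()
    denom = np.dot(centred, centred)
    return np.array([
        np.dot(centred[: len(centred) - lag], centred[lag:]) / denom
        for lag in range(max_lag + 1)
    ])


# A sine wave with a period of 20 samples: strong positive correlation at lag 20,
# strong negative correlation at lag 10 (half a period).
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20)
acf = autocorrelation(signal, max_lag=20)
print(np.round(acf[[0, 10, 20]], 2))
```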

Data Processing

from Amplo.AutoML import DataProcesser

Automated Data Cleaning:

Infers & converts data types (integers, floats, categorical, datetime)
Reformats column names
Removes duplicate columns and rows
Handles missing values by:
- Removing columns
- Removing rows
- Interpolating
- Filling with zeros
Removes outliers using:
- Clipping
- Z-score
- Quantiles
Removes constant columns
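A few of these steps can be sketched in plain pandas to show what they do to a table. This is an illustration under assumptions, not the DataProcesser's code: the function name clean, the quantile bounds and the choice of interpolation plus quantile clipping are picked for the example, and all columns are assumed to be numeric.

```python
import numpy as np
import pandas as pd


def clean(df, lower_q=0.01, upper_q=0.99):
    """Illustrative cleaning pass on a numeric DataFrame."""
    df = df.drop_duplicates()                    # remove duplicate rows
    df = df.interpolate(limit_direction="both")  # fill missing values linearly
    df = df.loc[:, ~df.T.duplicated()]           # remove duplicate columns
    # Clip outliers to per-column quantiles.
    df = df.clip(df.quantile(lower_q), df.quantile(upper_q), axis=1)
    df = df.loc[:, df.nunique() > 1]             # remove constant columns
    return df.reset_index(drop=True)


raw = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 4.0],
    "b": [1.0, 2.0, np.nan, 4.0, 4.0],      # duplicate of "a"
    "c": [5.0, 5.0, 5.0, 5.0, 5.0],         # constant
    "d": [0.1, 0.2, 0.3, 100.0, 100.0],     # contains an outlier
})
cleaned = clean(raw)
print(cleaned)
```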

Data Sampler

from Amplo.AutoML import DataSampler

This pipeline is designed to handle unbalanced classification problems. Aside from weighted loss functions, undersampling the majority class or oversampling the minority class helps. Various algorithms are analysed:

SMOTE
Borderline SMOTE
Random Over Sampler
Tomek Links
One Sided Selection
Random Under Sampler
Edited Nearest Neighbours
SMOTE Tomek
SMOTE Edited Nearest Neighbours
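The simplest entry in this list, random over sampling, can be sketched with NumPy alone. The function random_over_sample below is a hypothetical stand-in written for this example, not the DataSampler's implementation.

```python
import numpy as np


def random_over_sample(x, y, seed=0):
    """Duplicate minority-class rows at random until all classes are equally frequent."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    indices = []
    for label, count in zip(classes, counts):
        members = np.flatnonzero(y == label)
        # Draw the missing number of rows with replacement from this class.
        extra = rng.choice(members, size=target - count, replace=True)
        indices.append(np.concatenate([members, extra]))
    indices = np.concatenate(indices)
    return x[indices], y[indices]


x = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)  # 8 majority samples, 2 minority samples
x_res, y_res = random_over_sample(x, y)
print(np.bincount(y_res))  # [8 8]
```

Resampling should only be applied to training data; validation and test splits keep their original class balance.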

Feature Processing

from Amplo.AutoML import FeatureProcesser

Automatically extracts and selects features. Removes Co-Linear Features. Included Feature Extraction algorithms:

Multiplicative Features
Dividing Features
Additive Features
Subtractive Features
Trigonometric Features
K-Means Features
Lagged Features
Differencing Features
Inverse Features
Datetime Features
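To make a few of these feature types concrete, the sketch below builds multiplicative, lagged and differencing features with pandas. The function name extract_features and the column naming scheme are assumptions for illustration; the FeatureProcesser's own naming and selection logic may differ.

```python
import pandas as pd


def extract_features(df, lag=1):
    """Add pairwise products, lagged copies and first differences of every column."""
    out = df.copy()
    cols = list(df.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            out[f"{a}__x__{b}"] = df[a] * df[b]    # multiplicative feature
    for col in cols:
        out[f"{col}__lag_{lag}"] = df[col].shift(lag)  # lagged feature
        out[f"{col}__diff"] = df[col].diff()           # differencing feature
    return out


df = pd.DataFrame({"temp": [1.0, 2.0, 4.0], "pressure": [10.0, 20.0, 30.0]})
features = extract_features(df)
print(features.columns.tolist())
```

Lagged and differenced columns start with missing values, which have to be dropped or filled before modelling.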

Included Feature Selection algorithms:

Random Forest Feature Importance (Threshold and Increment)
Predictive Power Score
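One plausible reading of a threshold on Random Forest feature importance is to keep the most important features until their cumulative importance reaches a chosen fraction. The sketch below implements that reading with scikit-learn; the function name select_by_importance and the 0.85 default are assumptions, and Amplo's actual rule may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def select_by_importance(x, y, threshold=0.85, seed=0):
    """Return indices of the top features whose cumulative importance reaches threshold."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed).fit(x, y)
    importance = forest.feature_importances_
    order = np.argsort(importance)[::-1]          # most important first
    cumulative = np.cumsum(importance[order])
    n_keep = int(np.searchsorted(cumulative, threshold)) + 1
    n_keep = min(n_keep, len(order))
    return np.sort(order[:n_keep])


x, y = make_classification(n_samples=300, n_features=10, n_informative=3, random_state=0)
selected = select_by_importance(x, y)
print(selected)
```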

Sequencing

from Amplo.AutoML import Sequencer

For time series regression problems, it is often useful to include multiple previous samples instead of just the latest. This class sequences the data, based on which time steps you want included in the in- and output. This is also very useful when working with tensors, as a tensor can be returned which directly fits into a Recurrent Neural Network.
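The sequencing idea can be sketched as a sliding window in NumPy. The function sequence and its parameters back and forward are hypothetical names chosen for this example and do not reflect the Sequencer's signature.

```python
import numpy as np


def sequence(x, y, back=3, forward=1):
    """Stack `back` past samples as input and the value `forward` steps ahead as output."""
    n = len(x) - back - forward + 1
    inputs = np.stack([x[i:i + back] for i in range(n)])
    outputs = np.array([y[i + back + forward - 1] for i in range(n)])
    return inputs, outputs


x = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 features
y = np.arange(10, dtype=float)
seq_x, seq_y = sequence(x, y, back=3, forward=1)
print(seq_x.shape, seq_y.shape)  # (7, 3, 2) (7,)
```

The input has the shape (samples, time steps, features), which is the layout most recurrent network layers expect.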

Modelling

from Amplo.AutoML import Modeller

Runs various regression or classification models. Includes:

Scikit's Linear Model
Scikit's Random Forest
Scikit's Bagging
Scikit's GradientBoosting
Scikit's HistGradientBoosting
DMLC's XGBoost
Yandex's CatBoost
Microsoft's LightGBM
Stacking Models
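Comparing several model families on the same data boils down to cross-validating each candidate with one metric. The sketch below does this with three scikit-learn classifiers only; it illustrates the principle and is not how the Modeller is implemented. The candidate set, fold count and metric are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

x, y = make_classification(n_samples=300, random_state=0)

candidates = {
    "linear": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean 3-fold cross-validated accuracy per candidate.
scores = {
    name: cross_val_score(model, x, y, cv=3, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```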

Grid Search

from Amplo.GridSearch import *

Contains three hyperparameter optimizers with extended predefined model parameters:

Grid Search
Halving Random Search
Optuna's Tree-structured Parzen Estimator
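For readers unfamiliar with exhaustive grid search, the sketch below shows the technique with scikit-learn's GridSearchCV. It does not use Amplo.GridSearch, whose classes and predefined parameter grids are not shown here; the parameter grid is an arbitrary example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

x, y = make_classification(n_samples=200, random_state=0)

# Every combination of these values is cross-validated (2 x 2 = 4 candidates).
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(x, y)
print(search.best_params_, round(search.best_score_, 3))
```

Grid search scales with the product of all parameter options, which is why halving and Bayesian strategies become attractive for larger search spaces.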

Automatic Documentation

from Amplo.AutoML import Documenter

Contains a documenter for classification (binary and multiclass problems), as well as for regression. Creates a PDF report for a Pipeline, including metrics, data processing steps, and everything else needed to recreate the result.


Releases(v0.10.2)

v0.10.2(Jun 2, 2022)

v0.10.1(May 26, 2022)

v0.9.0(May 2, 2022)

v0.8.27(Apr 6, 2022)

v0.8.26(Apr 1, 2022)

v0.8.25(Apr 1, 2022)

v0.8.24(Mar 31, 2022)

v0.8.23(Mar 30, 2022)

v0.8.22(Mar 24, 2022)

v0.8.21(Mar 23, 2022)

v0.8.20(Mar 22, 2022)

v0.8.19(Mar 4, 2022)

v0.8.18(Mar 3, 2022)

v0.8.17(Mar 3, 2022)

v0.8.16(Mar 2, 2022)

v0.8.15(Feb 2, 2022)

v0.8.14(Jan 27, 2022)

v0.8.13(Jan 25, 2022)

v0.8.12(Jan 3, 2022)

v0.8.11(Dec 24, 2021)

v0.8.10(Dec 23, 2021)

v0.8.9(Dec 23, 2021)

v0.8.8(Dec 23, 2021)

v0.8.7(Dec 23, 2021)

v0.8.6(Dec 23, 2021)

v0.8.5(Dec 21, 2021)

v0.8.4(Dec 21, 2021)

v0.8.3(Dec 21, 2021)

v0.8.2(Dec 21, 2021)

v0.8.1(Dec 21, 2021)

Owner

Amplo
