MLPrimitives

An Open Source Project from the Data to AI Lab, at MIT

Pipelines and primitives for machine learning and data science.

Overview

This repository contains primitive annotations to be used by the MLBlocks library, as well as the necessary Python code to make some of them fully compatible with the MLBlocks API requirements.

There is also a collection of custom primitives contributed directly to this library, which either combine third party tools or implement new functionalities from scratch.

Why did we create this library?

  • Too many libraries in a fast growing field
  • Huge societal need to build machine learning apps
  • Domain expertise is scattered across several places (knowledge of math)
  • No documented information about hyperparameters, behavior...

Installation

Requirements

MLPrimitives has been developed and tested on Python 3.6, 3.7 and 3.8.

Also, although it is not strictly required, using a virtualenv is highly recommended in order to avoid interfering with other software installed on the system where MLPrimitives is run.
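For example, a minimal setup using the standard library venv module could look like this (the environment name is arbitrary):

python3 -m venv mlprimitives-env
source mlprimitives-env/bin/activate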

Install with pip

The easiest and recommended way to install MLPrimitives is using pip:

pip install mlprimitives

This will pull and install the latest stable release from PyPI.

If you want to install from source or contribute to the project please read the Contributing Guide.

Quickstart

This section is a short series of tutorials to help you get started with MLPrimitives.

In the following steps you will learn how to load and run a primitive on some data.

Later on you will learn how to evaluate and improve the performance of a primitive by tuning its hyperparameters.

Running a Primitive

In this first tutorial, we will be executing a single primitive for data transformation.

1. Load a Primitive

The first step in order to run a primitive is to load it.

This will be done using the mlprimitives.load_primitive function, which will load the indicated primitive as an MLBlock object from MLBlocks.

In this case, we will load the mlprimitives.custom.feature_extraction.CategoricalEncoder primitive.

from mlprimitives import load_primitive

primitive = load_primitive('mlprimitives.custom.feature_extraction.CategoricalEncoder')

2. Load some data

The CategoricalEncoder is a transformation primitive which applies one-hot encoding to all the categorical columns of a pandas.DataFrame.

So, in order to be able to run our primitive, we will first load some data that contains categorical columns.

This can be done with the mlprimitives.datasets.load_census function:

from mlprimitives.datasets import load_census

dataset = load_census()

This dataset object has an attribute data which contains a table with several categorical columns.

We can have a look at this table by executing dataset.data.head(), which will return a table like this:
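dataset.data.head()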

                             0                    1                   2
age                         39                   50                  38
workclass            State-gov     Self-emp-not-inc             Private
fnlwgt                   77516                83311              215646
education            Bachelors            Bachelors             HS-grad
education-num               13                   13                   9
marital-status   Never-married   Married-civ-spouse            Divorced
occupation        Adm-clerical      Exec-managerial   Handlers-cleaners
relationship     Not-in-family              Husband       Not-in-family
race                     White                White               White
sex                       Male                 Male                Male
capital-gain              2174                    0                   0
capital-loss                 0                    0                   0
hours-per-week              40                   13                  40
native-country   United-States        United-States       United-States

3. Fit the primitive

In order to run our primitive, we first need to fit it.

This is the process where the primitive analyzes the data to detect which columns are categorical.

This is done by calling its fit method and passing dataset.data as X.

primitive.fit(X=dataset.data)

4. Produce results

Once the primitive is fit, we can process the data by calling the produce method of the primitive instance, again passing the data as X.

transformed = primitive.produce(X=dataset.data)

After this is done, we can see how the transformed data contains the newly generated one-hot vectors:

                                                0      1       2       3       4
age                                            39     50      38      53      28
fnlwgt                                      77516  83311  215646  234721  338409
education-num                                  13     13       9       7      13
capital-gain                                 2174      0       0       0       0
capital-loss                                    0      0       0       0       0
hours-per-week                                 40     13      40      40      40
workclass= Private                              0      0       1       1       1
workclass= Self-emp-not-inc                     0      1       0       0       0
workclass= Local-gov                            0      0       0       0       0
workclass= ?                                    0      0       0       0       0
workclass= State-gov                            1      0       0       0       0
workclass= Self-emp-inc                         0      0       0       0       0
...                                             ...    ...     ...     ...     ...

Tuning a Primitive

In this short tutorial we will show you how to evaluate the performance of a primitive and improve it by modifying its hyperparameters.

To do so, we will load a primitive that can learn from the transformed data that we just generated and later on make predictions based on new data.

1. Load another primitive

First of all, we will load the xgboost.XGBClassifier primitive that we will use afterwards.

primitive = load_primitive('xgboost.XGBClassifier')

2. Split the dataset

Before we can evaluate the primitive's performance, we need to split the data into two parts: train, which the primitive will learn from, and test, which will be used to make the predictions that will later be evaluated.

In order to do this, we will take the first 75% of rows from the transformed data that we obtained above and call it X_train, and then set the remaining 25% of rows as X_test.

train_size = int(len(transformed) * 0.75)
X_train = transformed.iloc[:train_size]
X_test = transformed.iloc[train_size:]

Similarly, we need to obtain the y_train and y_test variables containing the corresponding output values.

y_train = dataset.target[:train_size]
y_test = dataset.target[train_size:]

3. Fit the new primitive

Once we have split the data, we can fit the primitive by passing X_train and y_train to its fit method.

primitive.fit(X=X_train, y=y_train)

4. Make predictions

Once the primitive has been fitted, we can produce predictions using the X_test data as input.

predictions = primitive.produce(X=X_test)

5. Evaluate the performance

We can now evaluate how good the predictions from our primitive are by using the score method of the dataset object on the expected output and the actual output from the primitive:

dataset.score(y_test, predictions)

This will output a float value between 0 and 1 indicating how good the predictions are, where 0 is the worst possible score and 1 the best.

In this case we will obtain a score around 0.866.

6. Set new hyperparameter values

In order to improve the performance of our primitive we will try to modify a couple of its hyperparameters.

First we will see which hyperparameter values the primitive has by calling its get_hyperparameters method.

primitive.get_hyperparameters()

which will return a dictionary like this:

{
    "n_jobs": -1,
    "n_estimators": 100,
    "max_depth": 3,
    "learning_rate": 0.1,
    "gamma": 0,
    "min_child_weight": 1
}

Next, we will see what the valid values for each of those hyperparameters are by calling its get_tunable_hyperparameters method:

primitive.get_tunable_hyperparameters()

For example, we will see that the max_depth hyperparameter has the following specification:

{
    "type": "int",
    "default": 3,
    "range": [
        3,
        10
    ]
}

Next, we will choose a valid value, for example 7, and set it on the primitive using the set_hyperparameters method:

primitive.set_hyperparameters({'max_depth': 7})
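If we now call get_hyperparameters again, the returned dictionary should reflect the change, with the rest of the values keeping their defaults:

primitive.get_hyperparameters()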

7. Re-evaluate the performance

Once the new hyperparameter value has been set, we repeat the fit/produce/score cycle to evaluate the performance with this new hyperparameter value:

primitive.fit(X=X_train, y=y_train)
predictions = primitive.produce(X=X_test)
dataset.score(y_test, predictions)

After re-fitting and re-scoring we obtain a new score, in this case around 0.724

What's Next?

Do you want to learn more about the project, how to contribute to it, or browse the API Reference? Please check the corresponding sections of the documentation!

Comments
  • mlprimitives/candidates/timseries.py lacks ability to return sequences of desired length

    I want to push all the final touches and changes that were made after MLPrimitives release v0.1.4 in order to make the LSTM pipeline primitives work as expected, since there are bugs that prevent them from being used now.

    I will push the changes to the primitives that are currently in the working demo as tested on private servers and repos.

    wontfix 
    opened by itinawi 7
  • Add DSP primitives

    I'd like to add primitives for Digital Signal Processing (DSP). These primitives will be used to detect anomalies in the telemetry data received from different satellites as part of project Orion, but they can be used to detect anomalies in any system.

    Filename: dsp.py
    Classname: freqAnalysis
    Methods:

    • fitFreqMax
    • fitFreqStdDev
    • produceFreqCompare
    • produceBPF

    There are some more methods used internally in the class: windowDesign, nextPowerOf2

    I also wrote the first JSON file: dsp.stdDevFreqAnalysis.json. This file points to the fitFreqStdDev method for data fitting and to the produceFreqCompare method for making predictions. Later I'll write more JSON files pointing to the other methods in the freqAnalysis class.

    I also wrote two more classes for testing and scoring:

    Filename: dsp_utils.py
    Classname: SIGEN
    Methods:

    • sigen
    • anomaliesGenerator
    • noiseGenerator
    • signalGenerator
    • show

    Classname: EVAL
    Methods:

    • score
    • evaluate

    new primitives 
    opened by rjdiez 7
  • Typo in the primitive keras.Sequential.LSTMTimeSeriesRegressor

    • MLPrimitives version: 0.1.10
    • Python version: N/A
    • Operating System: N/A

    Description

    There is a typo in line 74 of MLPrimitives/mlprimitives/jsons/keras.Sequential.LSTMTimeSeriesRegressor.json: it says bastch_size and should say batch_size. Otherwise, the batch size for training cannot be tuned.

    What I Did

    Simply correct the typo and it works :)
    
    bug approved 
    opened by DanielCalvoCerezo 4
  • Potential dependency conflicts between mlprimitives and numpy

    Hi, as shown in the full dependency graph of mlprimitives below, mlprimitives requires numpy>=1.15.2,<1.17 and statsmodels>=0.9.0,<1 (statsmodels 0.11.1 will be installed, i.e. the newest version satisfying that constraint), and the direct dependency statsmodels 0.11.1 transitively introduces numpy>=1.14.

    There are therefore multiple version constraints set for numpy in this project. According to pip's "first found wins" installation strategy, numpy 1.16.6 (i.e. the newest version satisfying the constraint >=1.15.2,<1.17) is the version that actually gets installed.

    Although numpy 1.16.6 satisfies both constraints for now, it sits very close to the upper bound of the range that mlprimitives itself specifies, while statsmodels only declares a lower bound.

    Once statsmodels upgrades, its newest version will be installed, so if the upgraded statsmodels requires a numpy version above the range >=1.15.2,<1.17, the installation will run into a dependency conflict (build failure).

    According to the release history of statsmodels, it habitually raises its numpy constraint in recent releases. For instance, statsmodels 0.11.0rc1 raised the constraint from >=1.11 to >=1.14, and the next version raised it from >=1.14 to >=1.15.

    As such, please take this as a friendly warning about a potential dependency conflict in mlprimitives.
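    One quick way to surface this kind of conflict after an install or upgrade is pip's built-in checker, which reports installed packages with unsatisfied requirements:

    pip check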

    Dependency tree

    mlprimitives  - 0.2.4
    | +- docutils(install version:0.15.2 version range:>=0.10,<0.16)
    | +- featuretools(install version:0.11.0 version range:<0.12,>=0.6.1)
    | | +- backports.tempfile(install version:1.0 version range:>=1.0)
    | | +- click(install version:7.1.1 version range:>=7.0.0)
    | | +- cloudpickle(install version:1.3.0 version range:>=0.4.0)
    | | +- dask(install version:2.14.0 version range:>=1.1.0)
    | | +- dask(install version:1.2.2 version range:<2.0.0)
    | | +- distributed(install version:2.14.0 version range:>=1.24.2)
    | | | +- click (install version:7.1.1 version range:>=6.6)
    | | | +- cloudpickle (install version:1.3.0 version range:>=0.2.2)
    | | | +- dask (install version:2.14.0 version range:>=2.9.0)
    | | | +- msgpack (install version:1.0.0 version range:>=0.6.0)
    | | | +- psutil (install version:5.7.0 version range:>=5.0)
    | | | +- pyyaml(install version:5.3.1 version range:*)
    | | | +- setuptools(install version:46.1.3 version range:*)
    | | +- distributed(install version:1.28.1 version range:<2.0.0)
    | | | +- click(install version:7.1.1 version range:>=6.6)
    | | | +- cloudpickle(install version:1.3.0 version range:>=0.2.2)
    | | | +- dask(install version:2.14.0 version range:>=0.18.0)
    | | | +- futures(install version:3.3.0 version range:*)
    | | | +- msgpack(install version:1.0.0 version range:*)
    | | | +- psutil(install version:5.7.0 version range:>=5.0)
    | | | +- pyyaml(install version:5.3.1 version range:*)
    | | | +- singledispatch(install version:3.4.0.3 version range:*)
    | | | | +- six(install version:1.14.0 version range:*)
    | | | +- six(install version:1.14.0 version range:*)
    | | +- funcsigs(install version:1.0.2 version range:>=1.0.2)
    | | +- future(install version:0.18.2 version range:>=0.16.0)
    | | +- numpy(install version:1.16.6 version range:>=1.13.3)
    | | +- pandas(install version:0.24.2 version range:>=0.23.0)
    | | +- pathlib(install version:1.0.1 version range:>=1.0.1)
    | | +- psutil(install version:5.7.0 version range:>=5.4.8)
    | | +- pyyaml(install version:5.3.1 version range:>=3.12)
    | | +- s3fs(install version:0.4.2 version range:>=0.2.2)
    | | | +- botocore(install version:1.15.39 version range:>=1.12.91)
    | | | | +- docutils(install version:0.15.2 version range:>=0.10,<0.16)
    | | | | +- jmespath(install version:0.10.0 version range:>=0.7.1,<1.0.0)
    | | | | +- python-dateutil(install version:2.8.1 version range:>=2.1,<3.0.0)
    | | | | +- urllib3(install version:1.25.9 version range:>=1.20,<1.26)
    | | | +- fsspec(install version:0.7.2 version range:>=0.6.0)
    | | +- scikit-learn(install version:0.20.4 version range:>=0.20.0)
    | | +- scikit-learn(install version:0.20.4 version range:<0.21,>=0.20.0)
    | | +- smart-open(install version:1.11.1 version range:>=1.8.4)
    | | | +- boto(install version:2.49.0 version range:*)
    | | | +- boto3(install version:1.12.39 version range:*)
    | | | | +- botocore(install version:1.15.49 version range:>=1.15.39,<1.16.0)
    | | | | +- jmespath(install version:0.10.0 version range:>=0.7.1,<1.0.0)
    | | | | +- s3transfer(install version:0.3.3 version range:>=0.3.0,<0.4.0)
    | | | +- requests(install version:2.23.0 version range:*)
    | | | | +- certifi(install version:2020.4.5.1 version range:>=2017.4.17)
    | | | | +- chardet(install version:3.0.4 version range:>=3.0.2,<4)
    | | | | +- idna(install version:2.9 version range:>=2.5,<3)
    | | | | +- urllib3(install version:1.25.9 version range:>=1.21.1,<1.26)
    | | +- tqdm(install version:4.45.0 version range:>=4.32.0)
    | +- iso639(install version:0.1.4 version range:<0.2,>=0.1.4)
    | +- keras(install version:2.3.1 version range:<3,>=2.1.6)
    | | +- h5py(install version:2.10.0 version range:*)
    | | +- keras_applications(install version: version range:>=1.0.6)
    | | +- keras_preprocessing(install version: version range:>=1.0.5)
    | | +- numpy(install version:1.16.6 version range:>=1.9.1)
    | | +- pyyaml(install version:5.3.1 version range:*)
    | | +- scipy(install version:1.4.1 version range:>=0.14)
    | | +- six(install version:1.14.0 version range:>=1.9.0)
    | +- langdetect(install version:1.0.8 version range:<2,>=1.0.7)
    | | +- six(install version:1.14.0 version range:*)
    | +- lightfm(install version:1.15 version range:>=1.15,<2)
    | | +- 1-15(install version: version range:*)
    | | +- numpy(install version:1.16.6 version range:*)
    | | +- requests(install version:2.23.0 version range:*)
    | | | +- certifi(install version:2020.4.5.1 version range:>=2017.4.17)
    | | | +- chardet(install version:3.0.4 version range:>=3.0.2,<4)
    | | | +- idna(install version:2.9 version range:>=2.5,<3)
    | | | +- urllib3(install version:1.25.9 version range:>=1.21.1,<1.26)
    | | +- scipy(install version:1.4.1 version range:>=0.17.0)
    | +- mlblocks(install version:0.3.4 version range:>=0.3.4,<0.4)
    | +- networkx(install version:2.4 version range:>=2.0,<3)
    | | +- decorator(install version:4.4.2 version range:>=4.3.0)
    | +- nltk(install version:3.5 version range:>=3.3,<4)
    | +- numpy(install version:1.16.6 version range:>=1.15.2,<1.17)
    | +- opencv-python(install version:4.2.0.34 version range:<5,>=3.4.0.12)
    | +- pandas(install version:0.24.2 version range:>=0.23.4,<0.25)
    | +- python-louvain(install version:0.13 version range:>=0.10,<0.14)
    | | +- networkx(install version:2.4 version range:*)
    | | | +- decorator(install version:4.4.2 version range:>=4.3.0)
    | +- scikit-image(install version:0.14.5 version range:<0.15,>=0.13.1)
    | +- scikit-learn(install version:0.20.4 version range:>=0.20.0,<0.21)
    | +- scipy(install version:1.4.1 version range:<2,>=1.1.0)
    | +- setuptools(install version:46.1.3 version range:>=41.0.0)
    | +- statsmodels(install version:0.11.1 version range:<1,>=0.9.0)
    | | +- numpy(install version:1.16.6 version range:>=1.14)
    | | +- pandas(install version:0.24.2 version range:>=0.21)
    | | +- patsy(install version:0.5.1 version range:>=0.5)
    | | | +- numpy(install version:1.16.6 version range:>=1.4)
    | | | +- six(install version:1.14.0 version range:*)
    | | +- scipy(install version:1.4.1 version range:>=1.0)
    | +- tensorflow(install version:1.15.2 version range:<2,>=1.11.0)
    | +- xgboost(install version:0.90 version range:<1,>=0.72.1)
    

    Thanks for your help. Best, Neolith

    opened by NeolithEra 3
  • Add LSTM-CycleGAN primitive for time series anomaly detection

    resolve #200

    Add prototype of LSTM-CycleGAN and corresponding error-calculation primitives.

    This GAN architecture makes it possible to encode and decode a time series signal. For the error calculation, a combination of the reconstruction error and the score from the critic network is used.
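    For intuition, such a combination is often expressed as a weighted blend of the two terms. The sketch below is an illustration only; the weighting scheme is an assumption, not the PR's actual code:

    import numpy as np

    def combined_anomaly_score(x, x_hat, critic_score, alpha=0.5):
        # Blend pointwise reconstruction error with the critic score;
        # alpha weighs the two terms (hypothetical default).
        reconstruction_error = np.abs(np.asarray(x) - np.asarray(x_hat))
        return alpha * reconstruction_error + (1 - alpha) * np.asarray(critic_score)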

    Since the model is still under development, the primitives are located in the candidates folder.

    opened by AlexanderGeiger 3
  • Add new primitive: Arima model

    Description

    ARIMA models are often used to describe time series data. Therefore we should add an Arima primitive for time series forecasting. We can use the statsmodels library.

    The primitive takes an array as input, on which an Arima model is fitted. The forecast method returns another array of predictions.
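    As an illustration only, a minimal adapter on top of statsmodels could look like the sketch below (the function name and defaults are hypothetical; the actual implementation lives in the branch mentioned next):

    from statsmodels.tsa.arima.model import ARIMA

    def arima_forecast(X, order=(1, 0, 0), steps=1):
        # Fit an ARIMA model on the input array and forecast the next `steps` values.
        model = ARIMA(X, order=order).fit()
        return model.forecast(steps=steps)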

    What I Did

    I started implementing this primitive for testing purposes in the arima branch on my fork, which you can check out. Concretely, I included an adapter for statsmodels and a Primitive JSON file.

    Any feedback on the primitive itself and the implementation would be highly appreciated.

    new primitives approved 
    opened by AlexanderGeiger 3
  • Add Primitives for Error calculation, smoothing, and thresholding

    I want to add primitives for dynamic error calculation, smoothing, and thresholding.

    I will be adding the code for these functions to MLPrimitives as its own Python file, called MLPrimitives/mlprimitives/dynamic_error_thresholding.py.

    The implementation follows Section 3.2 of https://arxiv.org/pdf/1802.04431.pdf

    dynamic_error_thresholding.py software architecture:

    It will be a Python class called ErrorThresholder which is initialized with:

    • y_true (np array): array of test targets corresponding to the true values to be predicted at the end of each sequence
    • y_hat (np array): predicted test values for each timestep in y_test
    • smoothed (bool): if False, return unsmoothed errors (used for assessing the quality of predictions)

    Some parameters have default values:

    • BATCH_SIZE = 70: number of values to evaluate in each batch
    • WINDOW_SIZE = 30: number of trailing batches to use in error calculation
    • SMOOTHING_PERC = 0.05: determines the window size used in EWMA smoothing (percentage of total values for the channel)

    Methods:

    • get_errors(inputs: y_true and y_hat from self, smoothed (boolean)): returns a list of errors. It calculates the difference between predicted telemetry values and actual values, then smooths the residuals using EWMA to encourage identification of sustained errors/anomalies.

    • process_errors(inputs: y_true and y_hat, smoothed_errors): returns sequence_anomalies (a list of anomalies detected from the errors) and anomaly_scores (a score for each anomaly sequence).

    • other helper methods for these two functions, as needed (a rough sketch of this interface follows below)
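    As a sketch only, assuming numpy and pandas (with pandas' ewm standing in for the EWMA smoothing); the real implementation may differ:

    import numpy as np
    import pandas as pd

    class ErrorThresholder:

        BATCH_SIZE = 70        # number of values to evaluate in each batch
        WINDOW_SIZE = 30       # number of trailing batches to use in error calculation
        SMOOTHING_PERC = 0.05  # EWMA window size as a fraction of the total number of values

        def __init__(self, y_true, y_hat, smoothed=True):
            self.y_true = np.asarray(y_true)
            self.y_hat = np.asarray(y_hat)
            self.smoothed = smoothed

        def get_errors(self):
            # Residuals between true and predicted values.
            errors = np.abs(self.y_true - self.y_hat)
            if not self.smoothed:
                return errors

            # Smooth with EWMA to favor sustained anomalies over single spikes.
            span = max(1, int(len(errors) * self.SMOOTHING_PERC))
            return pd.Series(errors).ewm(span=span).mean().values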

    new primitives 
    opened by itinawi 3
  • Add sklearn.neighbors primitives

    I would like to add some primitives from sklearn.neighbors

    Primitives to add:

    • sklearn.neighbors.KNeighborsClassifier
    • sklearn.neighbors.KNeighborsClassifier_proba
    • sklearn.neighbors.KNeighborsRegressor
    new primitives 
    opened by wsnalice 3
  • Issue 180 improve find anomalies primitive

    Addresses the changes mentioned in #180 to improve the custom.timeseries_anomalies.find_anomalies primitive

    • Add optional threshold for unusually low errors
    • Add possibility to use overlapping thresholds
    • Once an anomalous region is found, increase the region by a predefined size to make the pruning afterwards more stable
    • Fix the calculation of anomaly score to work with consecutive and overlapping sequences
    opened by AlexanderGeiger 2
  • Add anomaly threshold calculation for batches of errors

    Description

    The calculation of the threshold ε in the timeseries_anomalies.py primitive should happen in batches, i.e. the threshold should be calculated for each batch of errors separately. Currently we are using the whole array of errors to calculate one single threshold.
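    For illustration, the batched computation could look like the sketch below; the mean-plus-k-sigma formula is a generic stand-in, not necessarily the exact threshold rule the primitive uses:

    import numpy as np

    def batched_thresholds(errors, batch_size=70, k=4):
        # Compute one threshold per batch of errors instead of a single global one.
        thresholds = []
        for start in range(0, len(errors), batch_size):
            batch = np.asarray(errors[start:start + batch_size])
            thresholds.append(batch.mean() + k * batch.std())
        return thresholds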

    opened by AlexanderGeiger 2
  • Predict output probabilities using predict_proba

    • MLPrimitives version: 0.1.5
    • Python version: 3.7.1
    • Operating System: Amazon Linux (4.14.88-88.76.amzn2.x86_64)

    Description

    I have a use case where I need classification models to output probabilities instead of the predicted class.

    For example: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.predict_proba

    To work around this, I had to create a custom block for each model and insert a new keyword for predict as shown below.

    from sklearn.ensemble import RandomForestClassifier

    class RandomForestBlock(object):

        def __init__(self, **kwargs):
            # Store the hyperparameters; the model is built at fit time.
            self.kwargs = kwargs
            self.model = None

        def fit(self, X, y):
            self.model = RandomForestClassifier(**self.kwargs)
            self.model.fit(X, y)

        def predict(self, X, prob=False):
            # With prob=True, return the probability of the positive class
            # instead of the predicted labels.
            if prob:
                return self.model.predict_proba(X)[:, 1]

            return self.model.predict(X)

    Is there a better way to do this without writing custom code? This will probably be a pretty common use case.

    question 
    opened by sarin1991 2
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.
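    For reference, the usual shape of such a check looks like the sketch below (a generic illustration, not the exact patch submitted):

    import os

    def safe_extractall(tar, path='.'):
        # `tar` is an open tarfile.TarFile; refuse to extract members whose
        # resolved path would escape the target directory.
        base = os.path.realpath(path)
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(path, member.name))
            if os.path.commonpath([base, target]) != base:
                raise ValueError('Attempted path traversal in tar file: ' + member.name)
        tar.extractall(path)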

    If you have further questions you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 1
  • Upgrade ``featuretools``

    • MLPrimitives version: 0.3.2
    • Python version: 3.8
    • Operating System: macOS

    Description

    Featuretools has released new stable versions, reaching v1.0.0; we would like to update MLPrimitives to bring it into the supported range.

    opened by sarahmish 0
  • Loss/Validation Plot Callback

    • MLPrimitives version: 0.2.5
    • Python version: 3.6.9
    • Operating System: Ubuntu 18.04.6

    Description

    Is it possible to plot training and validation losses during the .fit() call? I saw there is this pull request: https://github.com/MLBazaar/MLPrimitives/pull/163, but I did not see any examples of using a callback to plot losses during training.
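    For what it's worth, a generic Keras callback that records losses per epoch for later plotting could look like this sketch (independent of MLPrimitives; the pull request above concerns wiring such callbacks into the primitives):

    from tensorflow import keras

    class LossHistory(keras.callbacks.Callback):
        # Collects training and validation loss at the end of every epoch.

        def on_train_begin(self, logs=None):
            self.losses = []
            self.val_losses = []

        def on_epoch_end(self, epoch, logs=None):
            logs = logs or {}
            self.losses.append(logs.get('loss'))
            self.val_losses.append(logs.get('val_loss'))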

    Any other recommendation on how to visualize if models are converging during training would be most appreciated.

    Thank you

    Chris

    opened by cjtaylo-csu 0
  • Allow build layer to recognize layers imported from tensorflow keras

    • MLPrimitives version: 0.3.3.dev0
    • Python version: 3.7.0
    • Operating System: macOS

    Description

    Allow mlprimitives.adapters.keras.build_layer to also recognize layers imported from tensorflow.keras.

    What I Did

    Current Version:

    if issubclass(layer_class, keras.layers.wrappers.Wrapper):
    

    Suggested Changes:

    if issubclass(layer_class, tf.keras.layers.Wrapper) or issubclass(layer_class, keras.layers.wrappers.Wrapper):
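    Since issubclass also accepts a tuple of classes, the same check could be written more compactly:

    if issubclass(layer_class, (tf.keras.layers.Wrapper, keras.layers.wrappers.Wrapper)):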
    
    opened by lcwong0928 0
  • tensorflow `get_config` error

    • MLPrimitives version: 0.3.0
    • Python version: 3.6

    Description

    The current version of MLPrimitives will automatically install tensorflow 2.3.4.

    This version will encounter the following issue:

    /usr/local/lib/python3.6/site-packages/keras/backend.py in <module>
         34 from tensorflow.core.protobuf import config_pb2
         35 from tensorflow.python.eager import context
    ---> 36 from tensorflow.python.eager.context import get_config
         37 from tensorflow.python.framework import config
         38 from keras import backend_config
    
    ImportError: cannot import name 'get_config'
    

    This is caused by the following piece of code in mlprimitives/adapters/keras.py:

    
    import logging
    import tempfile
    
    import keras        # this is the line causing error
    import numpy as np
    

    Solution

    Simply replace

    import keras
    

    with

    from tensorflow import keras
    
    opened by dyuliu 0
Releases
  • v0.3.2(Nov 9, 2021)

  • v0.3.1(Oct 7, 2021)

  • v0.3.0(Jan 9, 2021)

    New Primitives

    • Add primitive sklearn.naive_bayes.GaussianNB - Issue #242 by @sarahmish
    • Add primitive sklearn.linear_model.SGDClassifier - Issue #241 by @sarahmish

    Primitive Improvements

    • Add offset to rolling_window_sequence primitive - Issue #251 by @skyeeiskowitz
    • Rename the time_index column to time - Issue #252 by @pvk-developer
    • Update featuretools dependency - Issue #250 by @pvk-developer

  • v0.2.5(Jul 29, 2020)

    Primitive Improvements

    • Accept timedelta window_size in cutoff_window_sequences - Issue #239 by @joanvaquer

    Bug Fixes

    • ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via pip install tensorflow - Issue #237 by @joanvaquer

    New Primitives

    • Add pandas.DataFrame.set_index primitive - Issue #222 by @JDTheRipperPC
  • v0.2.4(Jan 30, 2020)

    New Primitives

    • Add RangeScaler and RangeUnscaler primitives - Issue #232 by @csala

    Primitive Improvements

    • Extract input_shape from X in keras.Sequential - Issue #223 by @csala

    Bug Fixes

    • mlprimitives.custom.text.TextCleaner fails if text is empty - Issue #228 by @csala
    • Error when loading the reviews dataset - Issue #230 by @csala
    • Curate dependencies: specify an explicit prompt-toolkit version range - Issue #224 by @csala
  • v0.2.3(Nov 14, 2019)

    New Primitives

    • Add primitive to make window_sequences based on cutoff times - Issue #217 by @csala
    • Create a keras LSTM based TimeSeriesClassifier primitive - Issue #218 by @csala
    • Add pandas DataFrame primitives - Issue #214 by @csala
    • Add featuretools.EntitySet.normalize_entity primitive - Issue #209 by @csala

    Primitive Improvements

    • Make featuretools.EntitySet.entity_from_dataframe entityset arg optional - Issue #208 by @csala
    • Add text regression dataset - Issue #206 by @csala

    Bug Fixes

    • pandas.DataFrame.resample crash when grouping by integer columns - Issue #211 by @csala

  • v0.2.2(Oct 8, 2019)

    New Primitives

    • Add primitives for GAN based time-series anomaly detection - Issue #200 by @AlexanderGeiger
    • Add numpy.reshape and numpy.ravel primitives - Issue #197 by @AlexanderGeiger
    • Add feature selection primitive based on Lasso - Issue #194 by @csala

    Primitive Improvements

    • feature_extraction.CategoricalEncoder support dtype category - Issue #196 by @csala
  • v0.2.1(Sep 9, 2019)

    New Primitives

    • Timeseries Intervals to Mask Primitive - Issue #186 by @AlexanderGeiger
    • Add new primitive: Arima model - Issue #168 by @AlexanderGeiger

    Primitive Improvements

    • Curate PCA primitive hyperparameters - Issue #190 by @AlexanderGeiger
    • Add option to drop rolling window sequences - Issue #186 by @AlexanderGeiger

    Bug Fixes

    • scikit-image==0.14.3 crashes when installed on Mac - Issue #188 by @csala
  • v0.2.0(Jul 11, 2019)

    New Features

    • Publish the pipelines as an entry_point Issue #175 by @csala

    Primitive Improvements

    • Improve pandas.DataFrame.resample primitive Issue #177 by @csala
    • Improve feature_extractor primitives Issue #183 by @csala
    • Improve find_anomalies primitive Issue #180 by @AlexanderGeiger

    Bug Fixes

    • Typo in the primitive keras.Sequential.LSTMTimeSeriesRegressor Issue #176 by @DanielCalvoCerezo
  • v0.1.10(May 23, 2019)

    New Features

    • Add function to run primitives without a pipeline Issue #43 by @csala

    New Pipelines

    • Add pipelines for all the MLBlocks examples Issue #162 by @csala

    Primitive Improvements

    • Add Early Stopping to keras.Sequential.LSTMTimeSeriesRegressor primitive Issue #156 by @csala
    • Make FeatureExtractor primitives accept Numpy arrays Issue #165 by @csala
    • Add window size and pruning to the timeseries_anomalies.find_anomalies primitive Issue #160 by @csala
  • v0.1.9(Apr 25, 2019)

    New Features

    • Add a single table binary classification dataset Issue #141 by @csala

    New Primitives

    • Add Multilayer Perceptron (MLP) primitive for binary classification Issue #140 by @Hector-hedb12
    • Add primitive for Sequence classification with LSTM Issue #150 by @Hector-hedb12
    • Add VGG-like convnet primitive Issue #149 by @Hector-hedb12
    • Add Multilayer Perceptron (MLP) primitive for multi-class softmax classification Issue #139 by @Hector-hedb12
    • Add primitive to count feature matrix columns Issue #146 by @csala

    Primitive Improvements

    • Add additional fit and predict arguments to keras.Sequential Issue #161 by @csala
    • Add support for keras.Sequential Callbacks Issue #159 by @csala
    • Add fixed hyperparam to control keras.Sequential verbosity Issue #143 by @csala
  • v0.1.8(Apr 25, 2019)

    New Primitives

    • mlprimitives.custom.timeseries_preprocessing.time_segments_average - Issue #137

    New Features

    • Add target_index output in timeseries_preprocessing.rolling_window_sequences - Issue #136
  • v0.1.7(Mar 16, 2019)

    General Improvements

    • Validate JSON format in make lint - Issue #133
    • Add demo datasets - Issue #131
    • Improve featuretools.dfs primitive - Issue #127

    New Primitives

    • pandas.DataFrame.resample - Issue #123
    • pandas.DataFrame.unstack - Issue #124
    • featuretools.EntitySet.add_relationship - Issue #126
    • featuretools.EntitySet.entity_from_dataframe - Issue #126

    Bug Fixes

    • Bug in timeseries_anomalies.py - Issue #119
  • v0.1.6(Feb 28, 2019)

    General Improvements

    • Add Contributing Documentation
    • Remove upper bound in pandas version given new release of featuretools v0.6.1
    • Improve LSTMTimeSeriesRegressor hyperparameters

    New Primitives

    • mlprimitives.candidates.dsp.SpectralMask
    • mlprimitives.custom.timeseries_anomalies.find_anomalies
    • mlprimitives.custom.timeseries_anomalies.regression_errors
    • mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences
    • mlprimitives.custom.timeseries_preprocessing.time_segments_average
    • sklearn.linear_model.ElasticNet
    • sklearn.linear_model.Lars
    • sklearn.linear_model.Lasso
    • sklearn.linear_model.MultiTaskLasso
    • sklearn.linear_model.Ridge
  • v0.1.5(Feb 12, 2019)

    New Primitives

    • sklearn.impute.SimpleImputer
    • sklearn.preprocessing.MinMaxScaler
    • sklearn.preprocessing.MaxAbsScaler
    • sklearn.preprocessing.RobustScaler
    • sklearn.linear_model.LinearRegression

    General Improvements

    • Separate curated from candidate primitives
    • Setup entry_points in setup.py to improve compatibility with MLBlocks
    • Add a test-pipelines command to test all the existing pipelines
    • Clean sklearn example pipelines
    • Change the author entry to a contributors list
    • Change the name of mlblocks_primitives folder
    • Fix installation instructions

    Bug Fixes

    • Fix LSTMTimeSeriesRegressor primitive. Issue #90
    • Fix timeseries primitives. Issue #91
    • Negative index anomalies in timeseries_errors. Issue #89
    • Keep pandas version below 0.24.0. Issue #87
  • v0.1.4(Jan 4, 2019)

    New Primitives

    • mlprimitives.timeseries primitives for timeseries data preprocessing
    • mlprimitives.timeseries_errors primitives for timeseries anomaly detection
    • keras.Sequential.LSTMTimeSeriesRegressor
    • sklearn.neighbors.KNeighbors Classifier and Regressor
    • several sklearn.decomposition primitives
    • several sklearn.ensemble primitives

    Bug Fixes

    • Fix typo in mlprimitives.text.TextCleaner primitive
    • Fix bug in index handling in featuretools.dfs primitive
    • Fix bug in SingleLayerCNNImageClassifier annotation
    • Remove old validation tags from JSON annotations
  • v0.1.3(Oct 22, 2018)

  • v0.1.2(Oct 10, 2018)

    New Features

    • Add pipeline specification language and Evaluation utilities.
    • Add pipelines for graph, text and tabular problems.
    • New primitives ClassEncoder and ClassDecoder
    • New primitives UniqueCounter and VocabularyCounter

    Bug Fixes

    • Fix TrivialPredictor bug when working with numpy arrays
    • Change XGB default learning rate and number of estimators
  • v0.1.1(Sep 21, 2018)

    New Features

    • Add more keras.applications primitives.
    • Add a Text Cleanup primitive.

    Bug Fixes

    • Add keywords to keras.preprocessing primitives.
    • Fix the image_transform method.
    • Add epoch as a fixed hyperparameter for keras.Sequential primitives.
Owner: MLBazaar (The Machine Learning Bazaar)