tsfeast

A collection of Scikit-Learn compatible time series transformers and tools.

Installation

Create a virtual environment and install:

From PyPI

pip install tsfeast

From this repo

pip install git+https://github.com/chris-santiago/tsfeast.git

Use

Preliminaries

This example shows both the use of individual transformers and the TimeSeriesFeatures convenience class that wraps multiple transformers. Both methods are compatible with Scikit-Learn Pipeline objects.

import warnings
warnings.filterwarnings("ignore")  # ignore pandas concat warnings from statsmodels

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression, Lasso, PoissonRegressor
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error, mean_absolute_error
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from statsmodels.tsa.arima_process import arma_generate_sample
from steps.forward import ForwardSelector

from tsfeast.transformers import DateTimeFeatures, InteractionFeatures, LagFeatures
from tsfeast.tsfeatures import TimeSeriesFeatures
from tsfeast.funcs import get_datetime_features
from tsfeast.utils import plot_diag
from tsfeast.models import ARMARegressor

def make_dummy_data(n=200):
    """Simulate monthly sales with trend, seasonal, autoregressive and marketing components."""
    n_lags = 2  # marketing affects sales with a two-month lag
    coefs = {'ar': [1, -0.85], 'ma': [1, 0], 'trend': 3.2, 'bdays_in_month': 231, 'marketing': 0.0026}
    rng = np.random.default_rng(seed=42)
    
    sales = pd.DataFrame({
        'date': pd.date_range(end='2020-08-31', periods=n, freq='M'),
        'sales_base': rng.poisson(200, n),
        'sales_ar': arma_generate_sample(ar=coefs['ar'], ma=coefs['ma'], nsample=n, scale=100),
        'sales_trend': [x * coefs['trend'] + rng.poisson(300) for x in range(1, n+1)],
    })
    
    sales = sales.join(get_datetime_features(sales['date'])[['bdays_in_month', 'quarter']])
    sales['sales_per_day'] = sales['bdays_in_month'] * coefs['bdays_in_month'] + rng.poisson(100, n)
    
    sales['mkt_base'] = rng.normal(1e6, 1e4, n)
    sales['mkt_trend'] = np.array([x * 5e3 for x in range(1, n+1)]) + rng.poisson(100)
    sales['mkt_season'] = np.where(sales['quarter'] == 3, sales['mkt_base'] * .35, 0)
    sales['mkt_total'] = sales.loc[:, 'mkt_base': 'mkt_season'].sum(1) + rng.poisson(100, n)
    sales['sales_mkting'] = sales['mkt_total'].shift(n_lags) * coefs['marketing']
    
    final = pd.DataFrame({
        'y': sales[['sales_base', 'sales_ar', 'sales_trend', 'sales_per_day', 'sales_mkting']].sum(1).astype(int),
        'date': sales['date'],
        'marketing': sales['mkt_total'],
        'x2': rng.random(n),
        'x3': rng.normal(loc=320, scale=4, size=n)
    })
    return sales.iloc[2:, :], final.iloc[2:, :]  # drop the first rows lost to the marketing lag

def get_results(estimator, x_train, x_test, y_train, y_test):
    """Return training and testing MAE/MAPE for a fitted estimator."""
    return pd.DataFrame(
        {
            'training': [
                mean_absolute_error(y_train, estimator.predict(x_train)), 
                mean_absolute_percentage_error(y_train, estimator.predict(x_train))
            ],
            'testing':  [
                mean_absolute_error(y_test, estimator.predict(x_test)), 
                mean_absolute_percentage_error(y_test, estimator.predict(x_test))
            ],
        },
        index = ['MAE', 'MAPE']
    )

Example Data

The dummy dataset in this example includes trend, seasonal, autoregressive and other factor components. Below, we visualize the individual components (comps) and the assembled features (data) of the dummy dataset.

comps, data = make_dummy_data()

Sales Components

comps.head()
date sales_base sales_ar sales_trend bdays_in_month quarter sales_per_day mkt_base mkt_trend mkt_season mkt_total sales_mkting
2 2004-03-31 211 153.620257 285.6 23 1 5402 1.012456e+06 15128.0 0.000000 1.027692e+06 2584.285914
3 2004-04-30 181 18.958345 300.8 22 2 5180 1.009596e+06 20128.0 0.000000 1.029835e+06 2661.116408
4 2004-05-31 195 54.420246 312.0 20 2 4726 9.848525e+05 25128.0 0.000000 1.010071e+06 2672.000109
5 2004-06-30 206 31.100042 326.2 22 2 5195 1.008291e+06 30128.0 0.000000 1.038529e+06 2677.570754
6 2004-07-31 198 34.283905 317.4 21 3 4952 1.004049e+06 35128.0 351416.992807 1.390691e+06 2626.185776
for col in comps.columns:
    print(f'Column: {col}')
    plt.figure(figsize=(10, 5))
    plt.plot(comps[col])
    plt.show()
[Output: a line plot for each component column: date, sales_base, sales_ar, sales_trend, bdays_in_month, quarter, sales_per_day, mkt_base, mkt_trend, mkt_season, mkt_total, sales_mkting]

Dummy Dataset

data.head()
y date marketing x2 x3
2 8636 2004-03-31 1.027692e+06 0.716752 316.389974
3 8341 2004-04-30 1.029835e+06 0.466509 318.780107
4 7959 2004-05-31 1.010071e+06 0.361299 324.917503
5 8435 2004-06-30 1.038529e+06 0.852623 316.776026
6 8127 2004-07-31 1.390691e+06 0.571951 314.425310
for col in data.columns:
    print(f'Column: {col}')
    plt.figure(figsize=(10, 5))
    plt.plot(data[col])
    plt.show()
[Output: a line plot for each column: y, date, marketing, x2, x3]

X = data.iloc[:, 1:]  # features: date, marketing, x2, x3
y = data.iloc[:, 0]   # target
x_train, x_test = X.iloc[:-40, :], X.iloc[-40:, :]  # hold out the last 40 months
y_train, y_test = y.iloc[:-40], y.iloc[-40:]

Individual Transformers

tsfeast provides individual time series transformers that can be used on their own or within Scikit-Learn Pipeline objects (a brief sketch follows the table):

Transformer Parameters Description
OriginalFeatures None Passes original features through the pipeline.
Scaler None Wraps Scikit-Learn StandardScaler to maintain DataFrame columns.
DateTimeFeatures date_col: str, dt_format: str, freq: str Generates datetime features from a given date column.
LagFeatures n_lags: int, fillna: bool Generates lag features.
RollingFeatures window_lengths: List[int], fillna: bool Generates rolling features (mean, std, min, max) for each specified window length.
EwmaFeatures window_lengths: List[int], fillna: bool Generates exponentially-weighted moving averages for each specified window length.
ChangeFeatures period_lengths: List[int], fillna: bool Generates percent changes for all features for each specified period length.
DifferenceFeatures n_diffs: int, fillna: bool Generates n differences for all features.
PolyFeatures degree: int Generates polynomial features.
InteractionFeatures None Wraps Scikit-Learn PolynomialFeatures to generate interaction features and maintain DataFrame columns.
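
A couple of these can be used on their own; a minimal sketch assuming the parameters listed above (output column names are illustrative):

from tsfeast.transformers import RollingFeatures, ChangeFeatures

# Rolling mean/std/min/max over 3- and 6-month windows.
rolling_feats = RollingFeatures(window_lengths=[3, 6]).fit_transform(X.iloc[:, 1:])  # skip date column

# 1- and 12-period percent changes for each feature.
change_feats = ChangeFeatures(period_lengths=[1, 12]).fit_transform(X.iloc[:, 1:])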

Notes on Pipeline Use

Scikit-Learn Pipeline objects are designed for independent observations; that behavior is appropriate and intended for cross-sectional data, but not necessarily for the temporal dependencies inherent in time series.

Scikit-Learn pipelines call only the .transform() method during .predict(), which is appropriate to prevent data leakage in predictions. However, most of the transformers in this package take a set of features and generate new features; there is no inherent method to transform some time series features given a fitted estimator.

For time series lags, changes, etc., we have access to past data for feature generation without risk of data leakage; certain features (e.g. lags) require this access to avoid NaNs or zeros. This behavior is appropriate for time series transformations only.
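
These transformers can still be used with cross-validation, provided the splits respect temporal order. A minimal sketch using Scikit-Learn's TimeSeriesSplit (this pipeline is illustrative only, not part of the examples below):

from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Each training fold strictly precedes its test fold, so no future
# observations leak into model fitting.
cv_pipe = Pipeline([
    ('features', TimeSeriesFeatures(datetime='date', trend='t', lags=2)),
    ('ols', LinearRegression()),
])
scores = cross_val_score(cv_pipe, X, y, cv=TimeSeriesSplit(n_splits=5),
                         scoring='neg_mean_absolute_error')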

Generate DateTime Features

dt = DateTimeFeatures(date_col='date')
dt.fit_transform(X, y)
year quarter month days_in_month bdays_in_month leap_year
2 2004 1 3 31 23 1
3 2004 2 4 30 22 1
4 2004 2 5 31 20 1
5 2004 2 6 30 22 1
6 2004 3 7 31 21 1
... ... ... ... ... ... ...
195 2020 2 4 30 22 1
196 2020 2 5 31 20 1
197 2020 2 6 30 22 1
198 2020 3 7 31 22 1
199 2020 3 8 31 21 1

198 rows × 6 columns

Generate Lag Features

feat = LagFeatures(n_lags=4)
feat.fit_transform(X.iloc[:, 1:], y)  # skipping date column
marketing_lag_1 x2_lag_1 x3_lag_1 marketing_lag_2 x2_lag_2 x3_lag_2 marketing_lag_3 x2_lag_3 x3_lag_3 marketing_lag_4 x2_lag_4 x3_lag_4
2 0.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000
3 1.027692e+06 0.716752 316.389974 0.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000
4 1.029835e+06 0.466509 318.780107 1.027692e+06 0.716752 316.389974 0.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000
5 1.010071e+06 0.361299 324.917503 1.029835e+06 0.466509 318.780107 1.027692e+06 0.716752 316.389974 0.000000e+00 0.000000 0.000000
6 1.038529e+06 0.852623 316.776026 1.010071e+06 0.361299 324.917503 1.029835e+06 0.466509 318.780107 1.027692e+06 0.716752 316.389974
... ... ... ... ... ... ... ... ... ... ... ... ...
195 1.971301e+06 0.420222 313.911203 1.968782e+06 0.648398 327.288221 1.973312e+06 0.860346 319.932653 1.967943e+06 0.216269 317.692606
196 1.981624e+06 0.188104 324.110324 1.971301e+06 0.420222 313.911203 1.968782e+06 0.648398 327.288221 1.973312e+06 0.860346 319.932653
197 1.977056e+06 0.339024 315.926738 1.981624e+06 0.188104 324.110324 1.971301e+06 0.420222 313.911203 1.968782e+06 0.648398 327.288221
198 1.978757e+06 0.703778 320.409889 1.977056e+06 0.339024 315.926738 1.981624e+06 0.188104 324.110324 1.971301e+06 0.420222 313.911203
199 2.332540e+06 0.204360 319.029524 1.978757e+06 0.703778 320.409889 1.977056e+06 0.339024 315.926738 1.981624e+06 0.188104 324.110324

198 rows × 12 columns

TimeSeriesFeatures Class

tsfeast also includes a TimeSeriesFeatures class that generates multiple time series features in one transformer. The only required parameter is the column of datetimes; the optional parameters control what additional transformers are included.

Parameter Type Description
datetime str Column that holds datetime information.
trend str Trend to include; options are {'n': no trend, 'c': constant only, 't': linear trend, 'ct': constant and linear trend, 'ctt': constant, linear and quadratic trend}; defaults to no trend.
lags int Number of lags to include (optional).
rolling List[int] Rolling window lengths to include (optional).
ewma List[int] EWMA window lengths to include (optional).
pct_chg List[int] Periods to use for percent change features (optional).
diffs int Number of differences to include (optional).
polynomial int Polynomial degree to include (optional).
interactions bool Whether to include interactions of original features; default True.
fillna bool Whether to fill NaN values with zero; default True.

feat = TimeSeriesFeatures(
    datetime='date',
    trend='t',
    lags=4,
    interactions=False,
    polynomial=3
)
features = feat.fit_transform(X, y)
features.head()
trend original__marketing original__x2 original__x3 datetime__year datetime__quarter datetime__month datetime__days_in_month datetime__bdays_in_month datetime__leap_year ... features__lags__x3_lag_3 features__lags__marketing_lag_4 features__lags__x2_lag_4 features__lags__x3_lag_4 features__polynomial__marketing^2 features__polynomial__x2^2 features__polynomial__x3^2 features__polynomial__marketing^3 features__polynomial__x2^3 features__polynomial__x3^3
0 1.0 1.027692e+06 0.716752 316.389974 2004.0 1.0 3.0 31.0 23.0 1.0 ... 0.000000 0.000000e+00 0.000000 0.000000 1.056152e+12 0.513733 100102.615631 1.085399e+18 0.368219 3.167146e+07
1 2.0 1.029835e+06 0.466509 318.780107 2004.0 2.0 4.0 30.0 22.0 1.0 ... 0.000000 0.000000e+00 0.000000 0.000000 1.060560e+12 0.217631 101620.756699 1.092202e+18 0.101527 3.239468e+07
2 3.0 1.010071e+06 0.361299 324.917503 2004.0 2.0 5.0 31.0 20.0 1.0 ... 0.000000 0.000000e+00 0.000000 0.000000 1.020244e+12 0.130537 105571.383672 1.030520e+18 0.047163 3.430199e+07
3 4.0 1.038529e+06 0.852623 316.776026 2004.0 2.0 6.0 30.0 22.0 1.0 ... 316.389974 0.000000e+00 0.000000 0.000000 1.078543e+12 0.726966 100347.050373 1.120098e+18 0.619827 3.178754e+07
4 5.0 1.390691e+06 0.571951 314.425310 2004.0 3.0 7.0 31.0 21.0 1.0 ... 318.780107 1.027692e+06 0.716752 316.389974 1.934020e+12 0.327128 98863.275608 2.689624e+18 0.187101 3.108512e+07

5 rows × 28 columns

list(features.columns)
['trend',
 'original__marketing',
 'original__x2',
 'original__x3',
 'datetime__year',
 'datetime__quarter',
 'datetime__month',
 'datetime__days_in_month',
 'datetime__bdays_in_month',
 'datetime__leap_year',
 'features__lags__marketing_lag_1',
 'features__lags__x2_lag_1',
 'features__lags__x3_lag_1',
 'features__lags__marketing_lag_2',
 'features__lags__x2_lag_2',
 'features__lags__x3_lag_2',
 'features__lags__marketing_lag_3',
 'features__lags__x2_lag_3',
 'features__lags__x3_lag_3',
 'features__lags__marketing_lag_4',
 'features__lags__x2_lag_4',
 'features__lags__x3_lag_4',
 'features__polynomial__marketing^2',
 'features__polynomial__x2^2',
 'features__polynomial__x3^2',
 'features__polynomial__marketing^3',
 'features__polynomial__x2^3',
 'features__polynomial__x3^3']

Pipeline Example

The TimeSeriesFeatures class can be used as a feature generation step within a Scikit-Learn Pipeline. Given the temporal nature of the data and models, this may not be appropriate for all use cases, though the class remains fully compatible with Pipeline objects.

We'll instantiate a TimeSeriesFeatures object with a linear trend, four lags and no interactions. Our pipeline will include feature generation, feature scaling and feature selection steps, before modeling with ordinary least squares.

Note: the ForwardSelector class is available in the step-select package (https://pypi.org/project/step-select/).
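
If it isn't already in your environment, it installs the same way:

pip install step-select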

The pipeline creates a total of 22 features before selecting only four for the final model. Note that 3 of the 4 selected features correspond to features from the "true model" that generated the dummy dataset ('trend', 'datetime__bdays_in_month' and 'marketing_lag_2').

Regression diagnostic plots show evidence of slightly non-normal residuals and one autoregressive term (again, as specified in the "true model"). We'll address the autoregressive term in the next example.

feat = TimeSeriesFeatures(
    datetime='date',
    trend='t',
    lags=4,
    interactions=False
)

pl = Pipeline([
    ('feature_extraction', feat),
    ('scaler', StandardScaler()),
    ('feature_selection', ForwardSelector(metric='bic')),
    ('regression', LinearRegression())
])

pl.fit(x_train, y_train)
Pipeline(steps=[('feature_extraction',
                 TimeSeriesFeatures(datetime='date', interactions=False, lags=4,
                                    trend='t')),
                ('scaler', StandardScaler()),
                ('feature_selection', ForwardSelector(metric='bic')),
                ('regression', LinearRegression())])
pl.named_steps.feature_extraction.output_features_
trend original__marketing original__x2 original__x3 datetime__year datetime__quarter datetime__month datetime__days_in_month datetime__bdays_in_month datetime__leap_year ... features__lags__x3_lag_1 features__lags__marketing_lag_2 features__lags__x2_lag_2 features__lags__x3_lag_2 features__lags__marketing_lag_3 features__lags__x2_lag_3 features__lags__x3_lag_3 features__lags__marketing_lag_4 features__lags__x2_lag_4 features__lags__x3_lag_4
0 1.0 1.027692e+06 0.716752 316.389974 2004.0 1.0 3.0 31.0 23.0 1.0 ... 0.000000 0.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000
1 2.0 1.029835e+06 0.466509 318.780107 2004.0 2.0 4.0 30.0 22.0 1.0 ... 316.389974 0.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000
2 3.0 1.010071e+06 0.361299 324.917503 2004.0 2.0 5.0 31.0 20.0 1.0 ... 318.780107 1.027692e+06 0.716752 316.389974 0.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000
3 4.0 1.038529e+06 0.852623 316.776026 2004.0 2.0 6.0 30.0 22.0 1.0 ... 324.917503 1.029835e+06 0.466509 318.780107 1.027692e+06 0.716752 316.389974 0.000000e+00 0.000000 0.000000
4 5.0 1.390691e+06 0.571951 314.425310 2004.0 3.0 7.0 31.0 21.0 1.0 ... 316.776026 1.010071e+06 0.361299 324.917503 1.029835e+06 0.466509 318.780107 1.027692e+06 0.716752 316.389974
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
153 154.0 1.752743e+06 0.060631 322.823879 2016.0 4.0 12.0 31.0 21.0 1.0 ... 312.156618 1.750890e+06 0.537173 319.820019 2.110972e+06 0.368344 324.492379 2.127929e+06 0.320161 322.674221
154 155.0 1.782890e+06 0.368878 313.360448 2017.0 1.0 1.0 31.0 20.0 0.0 ... 322.823879 1.762560e+06 0.296868 312.156618 1.750890e+06 0.537173 319.820019 2.110972e+06 0.368344 324.492379
155 156.0 1.788336e+06 0.254549 321.235197 2017.0 1.0 2.0 28.0 19.0 0.0 ... 313.360448 1.752743e+06 0.060631 322.823879 1.762560e+06 0.296868 312.156618 1.750890e+06 0.537173 319.820019
156 157.0 1.790967e+06 0.385921 316.450145 2017.0 1.0 3.0 31.0 23.0 0.0 ... 321.235197 1.782890e+06 0.368878 313.360448 1.752743e+06 0.060631 322.823879 1.762560e+06 0.296868 312.156618
157 158.0 1.811012e+06 0.196960 315.360643 2017.0 2.0 4.0 30.0 20.0 0.0 ... 316.450145 1.788336e+06 0.254549 321.235197 1.782890e+06 0.368878 313.360448 1.752743e+06 0.060631 322.823879

158 rows × 22 columns

new_features = pl.named_steps.feature_extraction.feature_names_
mask = pl.named_steps.feature_selection.get_support()
new_features[mask]
Index(['trend', 'datetime__bdays_in_month', 'features__lags__marketing_lag_2',
       'features__lags__x3_lag_2'],
      dtype='object')
get_results(pl, x_train, x_test, y_train, y_test)
training testing
MAE 373.819325 201.999695
MAPE 0.040046 0.017827
resid = (y_train - pl.predict(x_train))
plot_diag(resid.iloc[2:])  # throw out first two residuals b/c of lags

[png: residual diagnostic plots for the OLS pipeline]

ARMA Regressor

tsfeast includes a models module that provides an ARMARegressor class, which extends Scikit-Learn regressors with support for ARMA or ARIMA residuals. It accepts an arbitrary Scikit-Learn regressor and a tuple indicating the (p, d, q) order for the residuals model.

Attribute Description
estimator The Scikit-Learn regressor.
order The (p, d, q) order of the ARMA model.
intercept_ The fitted estimator's intercept.
coef_ The fitted estimator's coefficients.
arma_ The fitted ARMA model.
fitted_values_ The combined estimator and ARMA fitted values.
resid_ The combined estimator and ARMA residual values.

Note: the predict method should not be used to get fitted values for the training set; instead, access them via the fitted_values_ attribute. The predict method calls the underlying ARMA model's forecast method, which generates predictions starting from the last time step in the training data, so its output would not align temporally with the training set.
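
A minimal sketch of the distinction, using hypothetical pre-transformed feature matrices train_feats and test_feats:

# In-sample: read fitted_values_, which aligns with the training index.
mod = ARMARegressor(estimator=LinearRegression(), order=(1, 0, 0))
mod.fit(train_feats, y_train)
in_sample = mod.fitted_values_

# Out-of-sample: predict() forecasts forward from the end of the training data.
forecast = mod.predict(test_feats)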

The pipeline follows the same steps as the previous example; the only change is the regression model, in this case the ARMARegressor. Metrics on the test set improve slightly, and we no longer see evidence of an autoregressive term in the residuals.

feat = TimeSeriesFeatures(
    datetime='date',
    trend='t',
    lags=4,
    interactions=False
)

mod = ARMARegressor(
    estimator=PoissonRegressor(),
    order=(1,0,0)
)

pl = Pipeline([
    ('feature_extraction', feat),
    ('scaler', StandardScaler()),
    ('feature_selection', ForwardSelector(metric='bic')),
    ('regression', mod)
])

pl.fit(x_train, y_train)
Pipeline(steps=[('feature_extraction',
                 TimeSeriesFeatures(datetime='date', interactions=False, lags=4,
                                    trend='t')),
                ('scaler', StandardScaler()),
                ('feature_selection', ForwardSelector(metric='bic')),
                ('regression', ARMARegressor(estimator=PoissonRegressor()))])
new_features = pl.named_steps.feature_extraction.feature_names_
mask = pl.named_steps.feature_selection.get_support()
new_features[mask]
Index(['trend', 'datetime__bdays_in_month', 'features__lags__marketing_lag_2',
       'features__lags__x3_lag_2'],
      dtype='object')
get_results(pl, x_train, x_test, y_train, y_test)
training testing
MAE 409.572082 143.269046
MAPE 0.043573 0.012745
plot_diag(pl.named_steps.regression.resid_)

[png: residual diagnostic plots for the ARMARegressor pipeline]
