A library to generate synthetic time series data by easy-to-use factors and generator

Last update: Dec 20, 2022

Overview

timeseries-generator

This repository consists of a python packages that generates synthetic time series dataset in a generic way (under /timeseries_generator) and demo notebooks on how to generate synthetic timeseries data (under /examples). The goal here is to have non-sensitive data available to demo solutions and test the effectiveness of those solutions and/or algorithms. In order to test your algorithm, you want to have time series available containing different kinds of trends. The python package should help create different kinds of time series while still being maintainable.

`timeseries_generator` package

For this package, it is assumed that a time series is composed of a base value multiplied by many factors.

ts = base_value * factor1 * factor2 * ... * factorN + Noiser

These factors can be anything, random noise, linear trends, to seasonality. The factors can affect different features. For example, some features in your time series may have a seasonal component, while others do not.

Different factors are represented in different classes, which inherit from the BaseFactor class. Factor classes are input for the Generator class, which creates a dataframe containing the features, base value, all the different factors working on the base value and and the final factor and value.

Core concept

Generator: a python class to generate the time series. A generator contains a list of factors and noiser. By overlaying the factors and noiser, generator can produce a customized time series
Factor: a python class to generate the trend, seasonality, holiday factors, etc. Factors take effect by multiplying on the base value of the generator.
Noised: a python class to generate time series noise data. Noiser take effect by summing on top of "factorized" time series. This formula describes the concepts we talk above

Built-in Factors

LinearTrend: give a linear trend based on the input slope and intercept
CountryYearlyTrend: give a yearly-based market cap factor based on the GDP per - capita.
EUEcoTrendComponents: give a monthly changed factor based on EU industry product public data
HolidayTrendComponents: simulate the holiday sale peak. It adapts the holiday days - differently in different country
BlackFridaySaleComponents: simulate the BlackFriday sale event
WeekendTrendComponents: more sales at weekends than on weekdays
FeatureRandFactorComponents: set up different sale amount for different stores and different product
ProductSeasonTrendComponents: simulate season-sensitive product sales. In this example code, we have 3 different types of product:
- winter jacket: inverse-proportional to the temperature, more sales in winter
- basketball top: proportional to the temperature, more sales in summer
- Yoga Mat: temperature insensitive

Installation

pip install timeseries-generator

Usage

from timeseries_generator import LinearTrend, Generator, WhiteNoise, RandomFeatureFactor
import pandas as pd

# setting up a linear tren
lt = LinearTrend(coef=2.0, offset=1., col_name="my_linear_trend")
g = Generator(factors={lt}, features=None, date_range=pd.date_range(start="01-01-2020", end="01-20-2020"))
g.generate()
g.plot()

# update by adding some white noise to the generator
wn = WhiteNoise(stdev_factor=0.05)
g.update_factor(wn)
g.generate()
g.plot()

Example Notebooks

We currently have 2 example notebooks available:

generate_stationary_process: Good for introducing the basics of the timeseries_generator. Shows how to apply simple linear trends and how to introduce features and labels, as well as random noise.
use_external_factors: Goes more into detail and shows how to use the external_factors submodule. Shows how to create seasonal trends.

Web based prototyping UI

We also use Streamlit to build a web-based UI to demonstrate how to use this package to generate synthesis time series data in an interactive web UI.

streamlit run examples/streamlit/app.py

License

This package is released under the Apache License, Version 2.0

A machine learning toolkit dedicated to time-series data

tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti

2.3k Jan 5, 2023

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar

15.4k Jan 7, 2023

A machine learning toolkit dedicated to time-series data

tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti

2.3k Dec 29, 2022

Visualize classified time series data with interactive Sankey plots in Google Earth Engine

sankee Visualize changes in classified time series data with interactive Sankey plots in Google Earth Engine Contents Description Installation Using P

76 Dec 15, 2022

PyPOTS - A Python Toolbox for Data Mining on Partially-Observed Time Series

A python toolbox/library for data mining on partially-observed time series, supporting tasks of forecasting/imputation/classification/clustering on incomplete multivariate time series with missing values.

179 Dec 31, 2022

A collection of Scikit-Learn compatible time series transformers and tools.

tsfeast A collection of Scikit-Learn compatible time series transformers and tools. Installation Create a virtual environment and install: From PyPi p

0 Mar 30, 2022

Automatic extraction of relevant features from time series:

tsfresh This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis

7k Jan 6, 2023

A unified framework for machine learning with time series

Welcome to sktime A unified framework for machine learning with time series We provide specialized time series algorithms and scikit-learn compatible

6k Jan 6, 2023

Probabilistic time series modeling in Python

GluonTS - Probabilistic Time Series Modeling in Python GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (

3.3k Jan 3, 2023

Comments

Time series data augmentation

There is a code example that gives to increase the amount of series data by adding slightly modified copies of already existing time series data or newly created synthetic series data from existing data?

opened by YAYAYru 0

KeyError: 'country'

From the following code,

from timeseries_generator import HolidayFactor, LinearTrend, Generator

lt = LinearTrend(coef=2.0, offset=1., col_name="my_linear_trend")

g: Generator = Generator(factors={lt}, features=None, date_range=pd.date_range(start="01-01-2020", end="01-01-2021"))

holiday_factor = HolidayFactor(
    country_feature_name="country",
)
g.add_factor(holiday_factor)
g.generate()

I get the error. I am not sure this is expected behavior.

File /usr/local/Caskroom/miniconda/base/envs/tf/lib/python3.9/site-packages/pandas/core/frame.py:10083, in DataFrame.merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
...
-> 1849     raise KeyError(key)
   1851 # Check for duplicates
   1852 if values.ndim > 1:

KeyError: 'country'

opened by twobitunicorn 0

[Feature request] Customizable feature combinations
Hi team, Thanks for the useful library! I wonder if you'd be open to this idea:

I would like to be able to:

Set up categorizing features (let's say, for illustration, CATEGORY=[footwear, t-shirts, socks], SIZE=[S, M, L, US-Mens-8, US-Womens-6) and define Factors on them

Generate time-series with more restricted feature combinations than the outer product (again for illustration, "t-shirt sizes for t-shirts, shoe sizes for footwear")

Today, it seems like Generator.generate() hard-codes the assumption that time-series should be generated for the product of all provided feature values.

It'd be helpful if, instead, we could have the option of customizing this join to limit down generated combinations?

Some options I can think of:

Leave the library as-is: Users generate full outer product and limit down what they want in post-processing

This seems possible already, but very RAM-intensive if your desired combinations are sparse?

Accept an optional dataframe of factor combinations as parameter to the generate() method

Gives full flexibility over which combinations are kept / ignored, without assuming any particular rigid hierarchies between features

...But might need to do a bit of validation to protect against user errors? May not be super easy to use without some documented examples / functions to generate the dataframe

Some more complex API for feature configuration that accommodates specifying valid/invalid feature combinations

Might be nicer for usability, but difficult to make general: E.g. a straightforward hierarchy could be represented as a nested dict, but in practice many applications have multiple intersecting views of product category information e.g. brand, type, target segment, etc.
opened by athewsey 1
Generate hourly data

First of all, thank you for making this repository public! I enjoy its ease of use and the built-in factors.

Problem description

I'm currently trying to generate revenue data for a bar/restaurant on an hourly basis. As far as I can see, the timeseries-generator only supports generating one data point per day, not per hour.

I tried to generate hourly data like g = Generator(factors={lt}, features=None, date_range=pd.date_range(start='15/9/2021', end='30/9/2021', freq='h')) which didn't work.

Potential solution

Add the possibility to generate hourly data too. If this is a promising idea in your opinion, I'm willing to contribute to the implementation.

Thank you in advance!

opened by nileger 1

Releases(v0.1.0)

v0.1.0(Jul 20, 2021)
first release of time series generators, including:

base factor

linear trend factor

sinusoidal factor

white noise factor

random factor

holiday factor

weekday factor

country GDP factor

EU industry index factor

Examples

notebooks which includes some simple examples

streamlit dashboard

Source code(tar.gz)
Source code(zip)

Owner

Nike Inc.

GitHub Repository

Multiple Linear Regression using the LinearRegression class from sklearn.linear_model library

Multiple-Linear-Regression-master - A python program to implement Multiple Linear Regression using the LinearRegression class from sklearn.linear model library

1 Feb 06, 2022

Skforecast is a python library that eases using scikit-learn regressors as multi-step forecasters

Skforecast is a python library that eases using scikit-learn regressors as multi-step forecasters. It also works with any regressor compatible with the scikit-learn API (pipelines, CatBoost, LightGBM

297 Jan 09, 2023

Python implementation of the rulefit algorithm

RuleFit Implementation of a rule based prediction algorithm based on the rulefit algorithm from Friedman and Popescu (PDF) The algorithm can be used f

326 Jan 02, 2023

Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc)

Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc). Structured a custom ensemble model and a neural network. Found a outperformed

1 Feb 06, 2022

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

sklearn-porter Transpile trained scikit-learn estimators to C, Java, JavaScript and others. It's recommended for limited embedded systems and critical

1.2k Jan 05, 2023

A quick reference guide to the most commonly used patterns and functions in PySpark SQL

Using PySpark we can process data from Hadoop HDFS, AWS S3, and many file systems. PySpark also is used to process real-time data using Streaming and

53 Dec 21, 2022

QML: A Python Toolkit for Quantum Machine Learning

QML is a Python2/3-compatible toolkit for representation learning of properties of molecules and solids.

176 Dec 09, 2022

An implementation of Relaxed Linear Adversarial Concept Erasure (RLACE)

Background This repository contains an implementation of Relaxed Linear Adversarial Concept Erasure (RLACE). Given a dataset X of dense representation

4 Apr 13, 2022

My capstone project for Udacity's Machine Learning Nanodegree

MLND-Capstone My capstone project for Udacity's Machine Learning Nanodegree Lane Detection with Deep Learning In this project, I use a deep learning-b

407 Dec 12, 2022

Learn Machine Learning Algorithms by doing projects in Python and R Programming Language

Learn Machine Learning Algorithms by doing projects in Python and R Programming Language. This repo covers all aspect of Machine Learning Algorithms.

6 Oct 20, 2022

Predict the output which should give a fair idea about the chances of admission for a student for a particular university

Predict the output which should give a fair idea about the chances of admission for a student for a particular university.

1 Jan 11, 2022

database for artificial intelligence/machine learning data

AIDB v0.0.1 database for artificial intelligence/machine learning data Overview aidb is a database designed for large dataset for machine learning pro

1 Oct 24, 2021

OptaPy is an AI constraint solver for Python to optimize planning and scheduling problems.

OptaPy is an AI constraint solver for Python to optimize the Vehicle Routing Problem, Employee Rostering, Maintenance Scheduling, Task Assignment, School Timetabling, Cloud Optimization, Conference S

208 Dec 27, 2022

A chain of stores, 10 different stores and 50 different requests a 3-month demand forecast for its product.

Demand-Forecasting Business Problem A chain of stores, 10 different stores and 50 different requests a 3-month demand forecast for its product.

3 Mar 06, 2022

This machine-learning algorithm takes in data from the last 60 days and tries to predict tomorrow's price of any crypto you ask it.

Crypto-Currency-Predictor This machine-learning algorithm takes in data from the last 60 days and tries to predict tomorrow's price of any crypto you

6 Dec 04, 2022

The Ultimate FREE Machine Learning Study Plan

2.5k Jan 05, 2023

Programming assignments and quizzes from all courses within the Machine Learning Engineering for Production (MLOps) specialization offered by deeplearning.ai

Machine Learning Engineering for Production (MLOps) Specialization on Coursera (offered by deeplearning.ai) Programming assignments from all courses i

173 Jan 05, 2023

A library to generate synthetic time series data by easy-to-use factors and generator

Related tags

Overview

timeseries-generator

timeseries_generator package

Core concept

Built-in Factors

Installation

Usage

Example Notebooks

Web based prototyping UI

License

You might also like...

A machine learning toolkit dedicated to time-series data

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

A machine learning toolkit dedicated to time-series data

Visualize classified time series data with interactive Sankey plots in Google Earth Engine

PyPOTS - A Python Toolbox for Data Mining on Partially-Observed Time Series

A collection of Scikit-Learn compatible time series transformers and tools.

Automatic extraction of relevant features from time series:

A unified framework for machine learning with time series

Probabilistic time series modeling in Python

Comments

Time series data augmentation

KeyError: 'country'

[Feature request] Customizable feature combinations

Generate hourly data

Problem description

Potential solution

Releases(v0.1.0)

v0.1.0(Jul 20, 2021)

Owner

Nike Inc.

Multiple Linear Regression using the LinearRegression class from sklearn.linear_model library

Skforecast is a python library that eases using scikit-learn regressors as multi-step forecasters

Python implementation of the rulefit algorithm

Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc)

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

A quick reference guide to the most commonly used patterns and functions in PySpark SQL

QML: A Python Toolkit for Quantum Machine Learning

An implementation of Relaxed Linear Adversarial Concept Erasure (RLACE)

My capstone project for Udacity's Machine Learning Nanodegree

Learn Machine Learning Algorithms by doing projects in Python and R Programming Language

Predict the output which should give a fair idea about the chances of admission for a student for a particular university

database for artificial intelligence/machine learning data

OptaPy is an AI constraint solver for Python to optimize planning and scheduling problems.

A chain of stores, 10 different stores and 50 different requests a 3-month demand forecast for its product.

This machine-learning algorithm takes in data from the last 60 days and tries to predict tomorrow's price of any crypto you ask it.

The Ultimate FREE Machine Learning Study Plan

Programming assignments and quizzes from all courses within the Machine Learning Engineering for Production (MLOps) specialization offered by deeplearning.ai

Scikit-learn compatible wrapper of the Random Bits Forest program written by (Wang et al., 2016)

LibTraffic is a unified, flexible and comprehensive traffic prediction library based on PyTorch

Random Forest Classification for Neural Subtypes

`timeseries_generator` package