Greykite: A flexible, intuitive and fast forecasting library

Last update: Jan 15, 2022

Overview

Why Greykite?

The Greykite library provides flexible, intuitive and fast forecasts through its flagship algorithm, Silverkite.

The Silverkite algorithm works well on most time series, and is especially adept at those with changepoints in trend or seasonality, event/holiday effects, and temporal dependencies. Its forecasts are interpretable and therefore useful for trusted decision-making and insights.

The Greykite library provides a framework that makes it easy to develop a good forecast model, with exploratory data analysis, outlier/anomaly preprocessing, feature extraction and engineering, grid search, evaluation, benchmarking, and plotting. Other open source algorithms can be supported through Greykite’s interface to take advantage of this framework, as listed below.

For a demo, please see our quickstart.

Distinguishing Features

Flexible design
- Provides time series regressors to capture trend, seasonality, holidays, changepoints, and autoregression, and lets you add your own.
- Fits the forecast using a machine learning model of your choice.
Intuitive interface
- Provides powerful plotting tools to explore seasonality, interactions, changepoints, etc.
- Provides model templates (default parameters) that work well based on data characteristics and forecast requirements (e.g. daily long-term forecast).
- Produces interpretable output, with model summary to examine individual regressors, and component plots to visually inspect the combined effect of related regressors.
Fast training and scoring
- Facilitates interactive prototyping, grid search, and benchmarking. Grid search is useful for model selection and semi-automatic forecasting of multiple metrics.
Extensible framework
- Exposes multiple forecast algorithms in the same interface, making it easy to try algorithms from different libraries and compare results.
- The same pipeline provides preprocessing, cross-validation, backtest, forecast, and evaluation with any algorithm.

Algorithms currently supported within Greykite’s modeling framework:

- Silverkite (Greykite’s flagship algorithm)
- Facebook Prophet

Notable Components

Greykite offers components that could be used within other forecasting libraries or even outside the forecasting context.

- ModelSummary() - R-like summaries of scikit-learn and statsmodels regression models.
- ChangepointDetector() - changepoint detection based on adaptive lasso, with visualization.
- SimpleSilverkiteForecast() - Silverkite algorithm with forecast_simple and predict methods.
- SilverkiteForecast() - low-level interface to Silverkite algorithm with forecast and predict methods.

Usage Examples

You can obtain forecasts with only a few lines of code:

from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum

# df = ...  # your input timeseries!
metadata = MetadataParam(
    time_col="ts",     # time column in `df`
    value_col="y"      # value in `df`
)
forecaster = Forecaster()  # creates forecasts and stores the result
forecaster.run_forecast_config(
     df=df,
     config=ForecastConfig(
         # uses the SILVERKITE model template parameters
         model_template=ModelTemplateEnum.SILVERKITE.name,
         forecast_horizon=365,  # forecasts 365 steps ahead
         coverage=0.95,         # 95% prediction intervals
         metadata_param=metadata
     )
 )
# Access the result
forecaster.forecast_result
# ...
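
The snippet above leaves `df` as a placeholder. As a minimal sketch, assuming the only requirement is a table with a time column named "ts" and a value column named "y" (matching the `MetadataParam` shown), a synthetic input could be built as follows. The data is invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily series for illustration only (not real data):
# a linear trend plus a weekly cycle plus noise.
rng = np.random.default_rng(0)
n_days = 730
steps = np.arange(n_days)

ts = pd.date_range("2020-01-01", periods=n_days, freq="D")
trend = 0.05 * steps
weekly = 2.0 * np.sin(2 * np.pi * steps / 7)
noise = rng.normal(0.0, 0.5, n_days)

# Column names match time_col="ts" and value_col="y" in the example above.
df = pd.DataFrame({"ts": ts, "y": 10.0 + trend + weekly + noise})
print(df.head())
```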

For a demo, please see our quickstart.

Setup and Installation

Greykite is available on PyPI and can be installed with pip:

pip install greykite

For more installation tips, see installation.

Documentation

Please find our full documentation here.

Learn More

Citation

Please cite Greykite in your publications if it helps your research:

@misc{reza2021greykite-github,
  author = {Reza Hosseini and
            Albert Chen and
            Kaixu Yang and
            Sayan Patra and
            Rachit Arora},
  title  = {Greykite: a flexible, intuitive and fast forecasting library},
  url    = {https://github.com/linkedin/greykite},
  year   = {2021}
}

License

Issues

Why pin runtime dependencies so tightly?

Hi,

Looking at the setup.py file, it looks like the following are all required runtime dependencies, all of which need to be pinned very precisely:

requirements = [
    "Cython==0.29.23",
    "cvxpy==1.1.12",
    "fbprophet==0.5",
    "holidays==0.9.10",  # 0.10.2,
    "ipykernel==4.8.2",
    "ipython==7.1.1",
    "ipywidgets==7.2.1",
    "jupyter==1.0.0",
    "jupyter-client==6.1.5",
    "jupyter-console==6.",  # used version 6 to avoid conflict with ipython version
    "jupyter-core==4.7.1",
    "matplotlib==3.4.1",
    "nbformat==5.1.3",
    "notebook==5.4.1",
    "numpy==1.20.2",
    "osqp==0.6.1",
    "overrides==2.8.0",
    "pandas==1.1.3",
    "patsy==0.5.1",
    "Pillow==8.0.1",
    "plotly==3.10.0",
    "pystan==2.18.0.0",
    "pyzmq==22.0.3",
    "scipy==1.5.4",
    "seaborn==0.9.0",
    "six==1.15.0",
    "scikit-learn==0.24.1",
    "Sphinx==3.2.1",
    "sphinx-gallery==0.6.1",
    "sphinx-rtd-theme==0.4.2",
    "statsmodels==0.12.2",
    "testfixtures==6.14.2",
    "tornado==5.1.1",
    "tqdm==4.52.0"
]

My question is - why pin them so tightly, and are all of them really necessary? E.g. do I really need sphinx-gallery? Such tight pins make it very difficult to integrate into any existing project. Why not just require a lower bound for many/most of these?
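
To make the suggestion concrete, here is a hypothetical sketch (not the project's actual setup.py, and not verified against Greykite) that turns a few of the exact pins quoted above into lower bounds and moves documentation tools into an optional extra. The `DOCS_ONLY` grouping and the `loosen` helper are assumptions made for illustration.

```python
# A few of the exact pins quoted in the question, used here only as sample input.
pinned = [
    "numpy==1.20.2",
    "pandas==1.1.3",
    "scikit-learn==0.24.1",
    "Sphinx==3.2.1",
    "sphinx-gallery==0.6.1",
]

# Assumption: these packages are needed to build docs, not to run forecasts.
DOCS_ONLY = {"Sphinx", "sphinx-gallery", "sphinx-rtd-theme"}


def loosen(requirement: str) -> str:
    """Turn an exact pin such as 'pkg==1.2' into a lower bound 'pkg>=1.2'."""
    name, _, version = requirement.partition("==")
    return f"{name}>={version}" if version else name


def package_name(requirement: str) -> str:
    """Return the package name part of a 'name==version' string."""
    return requirement.partition("==")[0]


install_requires = [loosen(r) for r in pinned if package_name(r) not in DOCS_ONLY]
extras_require = {"docs": [loosen(r) for r in pinned if package_name(r) in DOCS_ONLY]}

print(install_requires)
print(extras_require)
```

Whether a given lower bound actually works would still have to be tested by the maintainers.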

opened by MarcoGorelli 15

Greykite suitable for pure linear increasing series?

Hello

I'm working on some house price time series using Greykite, but for some reason the forecast I get is just a median price between the upper and lower bounds (ARIMA). Is this a known issue with Greykite when we have a purely linear increasing series?

Thank you, Aktham Momani

opened by akthammomani 6

Seasonality changepoint detection does not seem to work with cross-validation for Silverkite

Hi,

First of all, thank you for open-sourcing this library. It's really complete and well thought out (as is the Silverkite algorithm itself).

However, I think I have spotted a potential bug:

It seems that the option seasonality_changepoints_dict in ModelComponentsParam breaks some functionality within pandas when running Silverkite with cross-validation.

Here's a complete example (using Greykite 0.2.0):

import pandas as pd
import numpy as np

# Load airline passengers dataset (with monthly data):
air_passengers = pd.read_csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv")
air_passengers["Month"] = pd.to_datetime(air_passengers["Month"])
air_passengers = air_passengers.set_index("Month").asfreq("MS").reset_index()

# Prepare Greykite configs:
from greykite.framework.templates.autogen.forecast_config import (ComputationParam, 
                                                                  EvaluationMetricParam, 
                                                                  EvaluationPeriodParam,
                                                                  ForecastConfig, 
                                                                  MetadataParam, 
                                                                  ModelComponentsParam)

# Metadata:
metadata_params = MetadataParam(date_format=None,  # infer
                                freq="MS",
                                time_col="Month",
                                train_end_date=None,
                                value_col="Passengers")

# Eval metric:
evaluation_metric_params = EvaluationMetricParam(agg_func=np.sum,   # Sum all forecasts...
                                                 agg_periods=12,    # ...Over 12 months
                                                 cv_report_metrics=["MeanSquaredError", "MeanAbsoluteError", "MeanAbsolutePercentError"],
                                                 cv_selection_metric="MeanAbsolutePercentError",
                                                 null_model_params=None,
                                                 relative_error_tolerance=None)

# Eval procedure (CV & backtest):
evaluation_period_params = EvaluationPeriodParam(cv_expanding_window=False,
                                                 cv_horizon=0,   # No CV for now. CHANGE THIS
                                                 cv_max_splits=5,
                                                 cv_min_train_periods=24,
                                                 cv_periods_between_splits=6,
                                                 cv_periods_between_train_test=0,
                                                 cv_use_most_recent_splits=False,
                                                 periods_between_train_test=0,
                                                 test_horizon=12)

# Config for seasonality changepoints
seasonality_components_df = pd.DataFrame({"name": ["conti_year"],
                                          "period": [1.0],
                                          "order": [5],
                                          "seas_names": ["yearly"]})

# Model components (quite long):
model_components_params = ModelComponentsParam(autoregression={"autoreg_dict": "auto"},
                                               
                                               changepoints={"changepoints_dict":  [{"method":"auto",
                                                                                     "potential_changepoint_n": 50,
                                                                                     "no_changepoint_proportion_from_end": 0.2,
                                                                                     "regularization_strength": 0.01}],
                                                             
                                                             # Seasonality changepoints
                                                             "seasonality_changepoints_dict": [{"regularization_strength": 0.6,
                                                                                                "no_changepoint_proportion_from_end": 0.8,
                                                                                                "seasonality_components_df": seasonality_components_df,
                                                                                                "potential_changepoint_n": 50,
                                                                                                "resample_freq":"MS"},
                                                                                               ]
                                                            },
                                               
                                               custom={"fit_algorithm_dict": [{"fit_algorithm": "linear"},
                                                                              ],
                                                       "feature_sets_enabled": "auto",
                                                       "min_admissible_value": 0.0},
                                               
                                               events={"holiday_lookup_countries": None,
                                                       "holidays_to_model_separately": None,
                                                       },
                                               
                                               growth={"growth_term":["linear"]},
                                               
                                               hyperparameter_override={"input__response__outlier__z_cutoff": [100.0],
                                                                        "input__response__null__impute_algorithm": ["ts_interpolate"]},
                                               
                                               regressors=None,
                                               
                                               lagged_regressors=None,
                                               
                                               seasonality={"yearly_seasonality": [5],
                                                            "quarterly_seasonality": ["auto"],
                                                            "monthly_seasonality": False,
                                                            "weekly_seasonality": False,
                                                            "daily_seasonality": False},
                                               
                                               uncertainty=None)

# Computation
computation_params = ComputationParam(n_jobs=1,
                                      verbose=3)


# Define forecaster:
from greykite.framework.templates.forecaster import Forecaster

# defines forecast configuration
config=ForecastConfig(model_template="SILVERKITE",
                      forecast_horizon=12,
                      coverage=0.8,
                      metadata_param=metadata_params,
                      evaluation_metric_param=evaluation_metric_params,
                      evaluation_period_param=evaluation_period_params,
                      model_components_param=model_components_params,
                      computation_param=computation_params,
                     )

# Run:
# creates forecast
forecaster = Forecaster()
result = forecaster.run_forecast_config(df=air_passengers, 
                                        config=config 
                                        )

If we run the piece of code above, everything works as expected. However, if we activate cross-validation (increasing cv_horizon to 5 for instance), Greykite crashes. This happens unless we remove seasonality changepoints (through removing seasonality_changepoints_dict).

The crash traceback looks as follows:

5 fits failed out of a total of 5.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
5 fits failed with the following error:
Traceback (most recent call last):
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\model_selection\_validation.py", line 681, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\pipeline.py", line 394, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\sklearn\estimator\simple_silverkite_estimator.py", line 239, in fit
    self.model_dict = self.silverkite.forecast_simple(
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\forecast\silverkite\forecast_simple_silverkite.py", line 708, in forecast_simple
    trained_model = super().forecast(**parameters)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\forecast\silverkite\forecast_silverkite.py", line 719, in forecast
    seasonality_changepoint_result = get_seasonality_changepoints(
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\changepoint\adalasso\changepoint_detector.py", line 1177, in get_seasonality_changepoints
    result = cd.find_seasonality_changepoints(**seasonality_changepoint_detection_args)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\common\python_utils.py", line 787, in fn_ignore
    return fn(*args, **kwargs)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\changepoint\adalasso\changepoint_detector.py", line 736, in find_seasonality_changepoints
    seasonality_df = build_seasonality_feature_df_with_changes(
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\changepoint\adalasso\changepoints_utils.py", line 237, in build_seasonality_feature_df_with_changes
    fs_truncated_df.loc[(features_df["datetime"] < date).values, cols] = 0
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexing.py", line 719, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexing.py", line 646, in _get_setitem_indexer
    self._ensure_listlike_indexer(key)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexing.py", line 709, in _ensure_listlike_indexer
    self.obj._mgr = self.obj._mgr.reindex_axis(
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\internals\base.py", line 89, in reindex_axis
    return self.reindex_indexer(
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\internals\managers.py", line 670, in reindex_indexer
    self.axes[axis]._validate_can_reindex(indexer)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexes\base.py", line 3785, in _validate_can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis


C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning:

One or more of the test scores are non-finite: [nan]

C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning:

One or more of the train scores are non-finite: [nan]
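
The final ValueError in the traceback is the error pandas raises when a reindex is attempted on an axis that contains duplicate labels. The sketch below is independent of Greykite and only reproduces that class of error in isolation. It does not diagnose where duplicates arise in the cross-validation path, and the exact wording of the message differs between pandas versions.

```python
import pandas as pd

# A Series whose index contains the label 0 twice.
s = pd.Series([1.0, 2.0, 3.0], index=[0, 0, 1])

try:
    # pandas cannot decide which of the two rows labelled 0 to keep.
    s.reindex([0, 1])
    message = None
except ValueError as err:
    message = str(err)

print(message)
```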

It would be great to be able to cross-validate when seasonality changepoint detection is activated, as it allows learning multiplicative seasonalities, for instance, in a similar fashion to Prophet or Orbit.

Thank you!

opened by julioasotodv 6

"cv_selection_metric" & "cv_report_metrics"

Hello all,

I am running Greykite without cross-validation (cv_max_splits = 0) because I am using the LassoCV() algorithm, which by itself uses 5-fold CV. The ForecastConfig() is as follows; in particular, evaluation_metric is all set to None because cv_max_splits = 0:

However, the output on the console suggests that at least 3 metrics are evaluated. My response contains zeros, so I do not want MAPE and MedAPE to be reported, and I do not want "Correlation" to be reported either. As a matter of fact, since the loss function in LassoCV() is MSE (L2-norm), I am not interested in anything other than MSE, really. If the loss function in LassoCV() could be changed to MAE (L1-norm), I would be interested in the MAE instead of MSE:

Do you have any suggestions, please?

Best regards, Dario
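
As background for why zeros in the response rule out MAPE, here is a small plain-Python sketch. The `mape` and `mse` helpers are written for this illustration and are not Greykite functions. Each MAPE term divides by the actual value, so a single zero makes the metric undefined, while MSE is unaffected.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    terms = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)


def mse(actual, forecast):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


actual = [0.0, 2.0, 4.0]    # contains a zero
forecast = [0.5, 2.5, 3.0]

print(mse(actual, forecast))

try:
    print(mape(actual, forecast))
except ZeroDivisionError:
    print("MAPE is undefined: an actual value is zero")
```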

opened by dromare 6

Setting of "cv_max_splits" when using "fit_algorithm": "lasso"

Hi all,

When setting fit_algorithm_params={"cv": 5} to use 5-fold CV with sklearn LassoCV() on the training set, how should the global parameter "cv_max_splits" be set up (either set it to zero, or to None, which is equivalent to 3, or equal to 5)?

Best regards, Dario

opened by dromare 4
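
For context on the question above: the `cv` argument of scikit-learn's LassoCV selects the regularization strength by k-fold CV inside a single fit, whereas a setting such as `cv_max_splits` appears to control an outer, time-ordered evaluation of the whole forecast. The sketch below illustrates generic rolling-origin splits only. The function name and its parameters are hypothetical, and this is not Greykite's implementation.

```python
def rolling_origin_splits(n_obs, horizon, max_splits, min_train):
    """Return (train_range, test_range) pairs in which training always precedes testing.

    Splits are built backwards from the end of the series and returned in
    chronological order. Training windows are expanding (they start at index 0).
    """
    splits = []
    test_end = n_obs
    while len(splits) < max_splits and test_end - horizon >= min_train:
        train_end = test_end - horizon
        splits.append((range(0, train_end), range(train_end, test_end)))
        test_end -= horizon
    return list(reversed(splits))


for train, test in rolling_origin_splits(n_obs=20, horizon=4, max_splits=3, min_train=8):
    print(f"train: 0..{train[-1]}  test: {test[0]}..{test[-1]}")
```

With zero outer splits there is simply no outer evaluation. The inner k-fold search in LassoCV would still run during each fit.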

Can't save model

After fitting the model, I would like to persist it for later use in my app. I tried to save the model (result.model), the forecaster, and forecaster.forecast_result, and none of them could be persisted using pickle or joblib.

That's the error I get. Any advice?

---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-77-0716155adc48> in <module>
----> 1 joblib.dump(result.model, model_path)

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in dump(value, filename, compress, protocol, cache_size)
    478     elif is_filename:
    479         with open(filename, 'wb') as f:
--> 480             NumpyPickler(f, protocol=protocol).dump(value)
    481     else:
    482         NumpyPickler(filename, protocol=protocol).dump(value)

/work/y435/crypto-forecast/lib/python3.7/pickle.py in dump(self, obj)
    435         if self.proto >= 4:
    436             self.framer.start_framing()
--> 437         self.save(obj)
    438         self.write(STOP)
    439         self.framer.end_framing()

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    547 
    548         # Save the reduce() output and finally memoize the object
--> 549         self.save_reduce(obj=obj, *rv)
    550 
    551     def persistent_id(self, obj):

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
    660 
    661         if state is not None:
--> 662             save(state)
    663             write(BUILD)
    664 

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_dict(self, obj)
    857 
    858         self.memoize(obj)
--> 859         self._batch_setitems(obj.items())
    860 
    861     dispatch[dict] = save_dict

/work/y435/crypto-forecast/lib/python3.7/pickle.py in _batch_setitems(self, items)
    883                 for k, v in tmp:
    884                     save(k)
--> 885                     save(v)
    886                 write(SETITEMS)
    887             elif n:

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_list(self, obj)
    817 
    818         self.memoize(obj)
--> 819         self._batch_appends(obj)
    820 
    821     dispatch[list] = save_list

/work/y435/crypto-forecast/lib/python3.7/pickle.py in _batch_appends(self, items)
    841                 write(MARK)
    842                 for x in tmp:
--> 843                     save(x)
    844                 write(APPENDS)
    845             elif n:

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_tuple(self, obj)
    772         if n <= 3 and self.proto >= 2:
    773             for element in obj:
--> 774                 save(element)
    775             # Subtle.  Same as in the big comment below.
    776             if id(obj) in memo:

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    547 
    548         # Save the reduce() output and finally memoize the object
--> 549         self.save_reduce(obj=obj, *rv)
    550 
    551     def persistent_id(self, obj):

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
    660 
    661         if state is not None:
--> 662             save(state)
    663             write(BUILD)
    664 

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_dict(self, obj)
    857 
    858         self.memoize(obj)
--> 859         self._batch_setitems(obj.items())
    860 
    861     dispatch[dict] = save_dict

/work/y435/crypto-forecast/lib/python3.7/pickle.py in _batch_setitems(self, items)
    883                 for k, v in tmp:
    884                     save(k)
--> 885                     save(v)
    886                 write(SETITEMS)
    887             elif n:

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_global(self, obj, name)
    958             raise PicklingError(
    959                 "Can't pickle %r: it's not found as %s.%s" %
--> 960                 (obj, module_name, name)) from None
    961         else:
    962             if obj2 is not obj:

PicklingError: Can't pickle <function add_finite_filter_to_scorer.<locals>.score_func_finite at 0x7f490e750d40>: it's not found as greykite.common.evaluation.add_finite_filter_to_scorer.<locals>.score_func_finite
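
The last line of the traceback points at a function defined inside another function (note the `<locals>` in its name). Python's pickle stores functions by qualified name, so it cannot serialize such local functions. The sketch below reproduces that behaviour with the standard library only. The helper names are hypothetical and unrelated to Greykite's code.

```python
import pickle


def make_scorer_local():
    """Return a function defined inside another function (a 'local' function)."""
    def score(y_true, y_pred):
        return sum(abs(a - b) for a, b in zip(y_true, y_pred))
    return score


def score_module_level(y_true, y_pred):
    """Same logic, but defined at module level so pickle can look it up by name."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred))


def can_pickle(obj):
    """Report whether pickle.dumps succeeds for obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# The local function cannot be pickled; the module-level one is expected to
# succeed when this file is run as a script.
print("local function picklable:", can_pickle(make_scorer_local()))
print("module-level function picklable:", can_pickle(score_module_level))
```

Any object that holds a reference to such a local function, directly or nested inside dicts and lists as in the traceback, fails to pickle for the same reason.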

opened by fpcarneiro 4

TimeSeries features

Hi all,

Great library and work! I was curious if there is a recommended way to get the time series features as a dataframe without running the model? I am looking to compare with other models.

Thanks, George

opened by ggerogiokas 4

Training the model on all data

Hello,

First of all, thanks for this library!

I want to train the model on all of my data, then create a future dataframe and let the model forecast those timesteps. This is to simulate a real-world situation where you actually want to predict the future, in which you don't have any data to validate on.

The last timestamp in my dataset is 2020-02-20 09:00:00. So I set the train_end_date to this timestamp in MetadataParam like this:

metadata = MetadataParam(
    time_col="ts",
    value_col="y",
    freq="H",
    train_end_date=datetime.datetime(2020, 2, 20, 9)
)

Then, in forecaster.forecast_config, I tried commenting out forecast horizon, which needs to be >= 1.

forecaster = Forecaster()  # Creates forecasts and stores the result
result = forecaster.run_forecast_config(
    df=df_model,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,  # model template
        # forecast_horizon=1,
        coverage=0.95,  # 95% prediction intervals
        metadata_param=metadata,
        model_components_param=model_components,
        evaluation_period_param=evaluation_period
    )
)

Running this, I get the error message: ValueError: fut_df must be a dataframe of non-zero size.

So the closest I have come to achieve what I want is to set train_end_date=datetime.datetime(2020, 2, 20, 8), an hour before the last timestamp in the dataset, and use forecast_horizon=1. However, I still want the model to train on this last hour, since I intend to run a short-term forecast.

So, the question I have is: how do I train the model on all of my data, without forecasting on it before I give the model a future dataframe?

opened by Sigvesor 4

Getting Various Warnings while running time series prediction

I'm trying to fit a Greykite model to my time series data.
I have attached the csv file for reference.
Even though the model works, it raises a bunch of warnings that I'd like to avoid.
Since some of my target values are zero, it tells me that MAPE is undefined.
Also, since I'm only forecasting one step into the future, it gives me an UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
I have attached a few images displaying the warnings.
Any help to get rid of these warnings would be appreciated!
This is the code I'm using to fit the data:

class GreyKiteModel(AnomalyModel):

    def __init__(self, *args, model_kwargs=None, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        # Use None as the default to avoid sharing one mutable dict across instances
        self.model_kwargs = model_kwargs if model_kwargs is not None else {}

    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        """Takes in pd.DataFrame with 2 columns, dt and y, and returns a
        pd.DataFrame with 4 columns: dt, y, yhat_lower, yhat_upper.

        :param df: Input Dataframe with dt, y columns
        :type df: pd.DataFrame
        :return: Output Dataframe with dt, y, yhat_lower, yhat_upper
        columns
        :rtype: pd.DataFrame
        """
        df = df.rename(columns={"dt": "ds", "y": "y"})
        metadata = MetadataParam(time_col="ds",  # ----> name of the time column
                                 value_col="y",  # ----> name of the value column
                                 freq="D"        # ----> "H" for hourly, "D" for daily, "W" for weekly, etc.
                                 )
        forecaster = Forecaster()  # Creates forecasts and stores the result
        result = forecaster.run_forecast_config(df=df,  # result is also stored as forecaster.forecast_result
                                                config=ForecastConfig(model_template=ModelTemplateEnum.SILVERKITE.name,
                                                                      forecast_horizon=1,  # forecasts 1 step
                                                                      coverage=0.95,
                                                                      metadata_param=metadata
                                                                      )
                                                )
        forecast_df = result.forecast.df
        forecast_df = forecast_df.drop(columns=['actual'])
        forecast_df.rename(columns={'ds': 'dt',
                                    'forecast': 'y',
                                    'forecast_lower': 'yhat_lower',
                                    'forecast_upper': 'yhat_upper'},
                           inplace=True)
        return forecast_df

df.csv

Screenshot from 2021-08-21 12-39-55

Screenshot from 2021-08-21 12-39-10
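
One general way to silence warnings like these around a single call is Python's standard `warnings` module. The sketch below uses a hypothetical `noisy_metric` function as a stand-in for the forecasting call. It is not a Greykite-specific setting, and whether hiding these warnings is appropriate depends on the use case. For scikit-learn's metric warning, the category to pass would presumably be `sklearn.exceptions.UndefinedMetricWarning`.

```python
import warnings


def noisy_metric(values):
    """Hypothetical stand-in for a call that warns but still returns a result."""
    warnings.warn("metric is not well-defined for this input", UserWarning)
    return sum(values) / len(values)


# record=True is used here only so the effect can be inspected afterwards;
# in normal use `with warnings.catch_warnings():` is enough.
with warnings.catch_warnings(record=True) as caught:
    # Silence only this warning category, and only inside the `with` block.
    warnings.simplefilter("ignore", category=UserWarning)
    result = noisy_metric([1.0, 2.0, 3.0])

print(result, len(caught))
```

Filters set inside the `with` block are undone when it exits, so warnings elsewhere in the program are unaffected.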

opened by Amatullah 4

Predictions taking too long

Hi Greykite Team!

I am trying to use Greykite to predict at scale, and I am not sure if I am doing something wrong, but even with the example code the predictions take a long time to calculate: sometimes 20, 30, or 40 seconds, and other times minutes. Any help will be greatly appreciated. Below is sample code I am running that takes about 17 seconds.

from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
import numpy as np
import pandas as pd

np.random.seed(1)

rows, cols = 365, 1
data = np.random.rand(rows, cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='MS')
data_frame = pd.DataFrame(data, columns=['y'], index=tidx)
data_frame = data_frame.reset_index()
data_frame.columns = ['ts', 'y']

metadata = MetadataParam(
    time_col="ts",     # time column in df
    value_col="y"      # value in df
)
forecaster = Forecaster()  # creates forecasts and stores the result
forecaster.run_forecast_config(
    df=data_frame,
    config=ForecastConfig(
        # uses the SILVERKITE model template parameters
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=365,  # forecasts 365 steps ahead
        coverage=0.95,         # 95% prediction intervals
        metadata_param=metadata
    )
)

forecaster.forecast_result

opened by lrosariov 4
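
For a slow-forecast report like the one above, a common first step is to measure how long each stage takes. The sketch below uses only the Python standard library; the `timed` helper and the placeholder workload are assumptions made for illustration and are not part of Greykite. In practice, the body of the `with` block would be the `forecaster.run_forecast_config(...)` call from the issue.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

with timed("fit_and_forecast", timings):
    # Placeholder workload (hypothetical); replace with the real forecast call.
    total = sum(i * i for i in range(100_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Timing several configurations this way (for example, different forecast horizons) can help narrow down which setting dominates the run time.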
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
Hi!

I was trying to implement the sample tutorial for my dataset. The basic statistics of the dataset look like: , which is exactly the required input format.

When I run the following code:

    result = forecaster.run_forecast_config(  # result is also stored as `forecaster.forecast_result`.
        df=train,
        config=ForecastConfig(
            model_template=ModelTemplateEnum.SILVERKITE.name,
            forecast_horizon=5,  # forecasts 5 steps ahead
            coverage=0.95,  # 95% prediction intervals
            metadata_param=metadata
        )
    )

The output error shows:

  File "/home/cxie/anaconda3/envs/greykite/lib/python3.8/site-packages/greykite/framework/templates/forecaster.py", line 326, in run_forecast_config
    self.forecast_result = forecast_pipeline(**pipeline_parameters)
  File "/home/cxie/anaconda3/envs/greykite/lib/python3.8/site-packages/greykite/framework/pipeline/pipeline.py", line 210, in pipeline_wrapper
    return pipeline_function(
  File "/home/cxie/anaconda3/envs/greykite/lib/python3.8/site-packages/greykite/framework/pipeline/pipeline.py", line 711, in forecast_pipeline
    grid_search.fit(train_df, train_y)
  File "/home/cxie/anaconda3/envs/greykite/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 926, in fit
    self.best_estimator_.fit(X, y, **fit_params)
  File "/home/cxie/anaconda3/envs/greykite/lib/python3.8/site-packages/sklearn/pipeline.py", line 394, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/home/cxie/anaconda3/envs/greykite/lib/python3.8/site-packages/greykite/sklearn/estimator/simple_silverkite_estimator.py", line 239, in fit
    self.model_dict = self.silverkite.forecast_simple(
  File "/home/cxie/anaconda3/envs/greykite/lib/python3.8/site-packages/greykite/algo/forecast/silverkite/forecast_simple_silverkite.py", line 708, in forecast_simple
    trained_model = super().forecast(**parameters)
  File "/home/cxie/anaconda3/envs/greykite/lib/python3.8/site-packages/greykite/algo/forecast/silverkite/forecast_silverkite.py", line 861, in forecast
    trained_model = fit_ml_model_with_evaluation(
  File "/home/cxie/anaconda3/envs/greykite/lib/python3.8/site-packages/greykite/algo/common/ml_models.py", line 785, in fit_ml_model_with_evaluation
    training_evaluation[R2_null_model_score] = r2_null_model_score(
  File "/home/cxie/anaconda3/envs/greykite/lib/python3.8/site-packages/greykite/common/evaluation.py", line 251, in r2_null_model_score
    y_true, y_pred, y_train, y_pred_null = valid_elements_for_evaluation(y_true, y_pred, y_train, y_pred_null)
  File "/home/cxie/anaconda3/envs/greykite/lib/python3.8/site-packages/greykite/common/evaluation.py", line 93, in valid_elements_for_evaluation
    return [reference_array[keep]] + [np.array(array)[keep] if (array is not None
  File "/home/cxie/anaconda3/envs/greykite/lib/python3.8/site-packages/greykite/common/evaluation.py", line 93, in <listcomp>
    return [reference_array[keep]] + [np.array(array)[keep] if (array is not None
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed

I tried to compare the differences with the sample dataframe:

    dl = DataLoader()
    df = dl.load_peyton_manning()

and the types look pretty much the same.

Any idea why this error shows up?
opened by victorxie996 1
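
The last frame of the traceback above evaluates `np.array(array)[keep]`. As a general NumPy fact, and not a confirmed diagnosis of this particular report, that expression raises an `IndexError` when the wrapped value is a scalar, because `np.array` of a scalar is 0-dimensional and cannot be indexed by a 1-D mask. A minimal sketch, where `index_with_mask` is a hypothetical helper written only for illustration:

```python
import numpy as np


def index_with_mask(values, keep):
    """Apply a boolean mask the same way as `np.array(array)[keep]`."""
    return np.array(values)[keep]


keep = np.array([True, False, True])

# A 1-D input is filtered as expected.
filtered = index_with_mask([1.0, 2.0, 3.0], keep)

# A scalar becomes a 0-dimensional array, so indexing it with a 1-D mask fails.
try:
    index_with_mask(5.0, keep)
    scalar_error = None
except IndexError as exc:
    scalar_error = exc

print(filtered)
print(type(scalar_error).__name__)
```

Checking whether any value passed into the evaluation step is a scalar rather than an array-like may therefore be a reasonable place to start.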
Extract components from forecast

Hi, I was wondering if it is possible to extract the different modeling components (e.g. trend, holidays, seasonalities) from the forecasted time series. It's possible to do this in the Prophet framework, see: https://github.com/facebook/prophet/issues/1920

The reason is that I would like to use a custom trend component calculated outside of Greykite.

opened by Jonathan-MW 1
add code to make dumps/loads compatible with Google Cloud Storage

gcp_utils.py: formatted similarly to pickle_utils.py; includes GCP functionality.
forecaster.py: added a variable to dump_obj and load_obj for the bucket name, which is needed for GCP.
requirements.txt: updated to include the Google libraries.

opened by kenzie-q 0
Inclusion of custom events, into the model itself with anomaly/holiday score for interpolation.

Hey, I am working on a use case that requires a lot of event-based forecasting.

Events may range from a technical update to the release of a new campaign, but the dates are generally known to the business beforehand. These events do not follow any calendar rules, but the data has seen such events in the past. Putting those holidays/events into the model itself, with custom future dates and strengths (since all campaigns or events of the same genre will have the same type of impact), makes prediction much smoother: no extra steps are needed after prediction to tune out the holidays, because everything is taken care of in the predict function.

I have already raised an issue in the Prophet community here, with a detailed discussion and the required code snippets. Please let me know if you are considering incorporating such changes. This would be really handy for event-driven forecasting.

opened by beginner1729 0
Prophet Logistic Growth
Thanks for this library. This really makes my workflow a lot easier!

I am trying to fit a 'logistic' growth model using Prophet, and I'm passing the dataframe containing 'cap' and 'floor' columns to the metadata param. But while fitting the model, Prophet throws an error saying that for a logistic model it expects 'cap' and 'floor' columns. How do I specify which columns in the dataframe should be used for 'cap' and 'floor'?

When I looked at the code for the ProphetEstimator fit function, only the 'time' column and the 'y' column get passed on to the Prophet code; no additional columns are passed on.

    def fit(self, X, y=None, time_col=TIME_COL, value_col=VALUE_COL, **fit_params):
        super().fit(X, y=y, time_col=time_col, value_col=value_col, **fit_params)
        if self.add_regressor_dict is None:
            fit_columns = [time_col, value_col]
        else:
            reg_cols = list(self.add_regressor_dict.keys())
            fit_columns = [time_col, value_col] + reg_cols

Is this a bug, or is there a way to pass the 'cap' and 'floor' columns that I'm missing? I couldn't find an example of how to do this in the documentation.

Thanks !!
opened by venk2k16 8
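
The column selection quoted in the issue above can be sketched in isolation. The function below is a hypothetical, standalone illustration: `select_fit_columns` and its `extra_cols` parameter are assumptions, not part of Greykite or Prophet. It only shows how extra columns such as 'cap' and 'floor' could be carried alongside the time, value, and regressor columns.

```python
def select_fit_columns(time_col, value_col, add_regressor_dict=None, extra_cols=()):
    """Return the list of columns to hand to the underlying model.

    Mirrors the selection logic quoted in the issue. `extra_cols` is a
    hypothetical addition for columns such as "cap" and "floor".
    """
    columns = [time_col, value_col]
    if add_regressor_dict is not None:
        columns += list(add_regressor_dict.keys())
    # Append the extra columns, skipping any that are already selected.
    columns += [col for col in extra_cols if col not in columns]
    return columns


base = select_fit_columns("ts", "y")
with_bounds = select_fit_columns(
    "ts", "y", {"temperature": None}, extra_cols=("cap", "floor")
)
print(base)         # ['ts', 'y']
print(with_bounds)  # ['ts', 'y', 'temperature', 'cap', 'floor']
```

Whether the library should expose such an option is a design question for the maintainers; the sketch only makes the behavior described in the issue concrete.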
TerminatedWorkerError while running benchmarking

Hi,

When I try to run benchmarking for Silverkite and Prophet (both with my data and with the example data in the notebook provided with Greykite), I get the following error:

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

When I set n_jobs=1, the kernel crashes, whereas for any other value of n_jobs the above error shows up.

OS: Windows 10
RAM: 16GB
Processor: i7

Any suggestions/pointers?

Regards

opened by sacmax 6

Releases(v0.1.1)

v0.1.1(Dec 22, 2021)

PyPI: https://pypi.org/project/greykite/0.1.1/
Authors: @Reza1317, @al-bert, @KaixuYang, @sayanpatra
Other contributors: @Saadorj, @rachitb1

Blog post for this release: https://engineering.linkedin.com/blog/2021/greykite--a-flexible--intuitive--and-fast-forecasting-library
Paper for this release: https://arxiv.org/abs/2105.01098
v0.2.0(Dec 22, 2021)

PyPI: https://pypi.org/project/greykite/0.2.0/
Release notes: https://github.com/linkedin/greykite/blob/master/HISTORY.rst
Contributors: @KaixuYang, @sayanpatra, @Reza1317, @Saadorj
v0.3.0(Dec 22, 2021)

PyPI: https://pypi.org/project/greykite/0.3.0/
Release notes: https://github.com/linkedin/greykite/blob/master/HISTORY.rst
Contributors: @KaixuYang, @njusu, @Reza1317, @al-bert, @sayanpatra, @Saadorj, @dromare, @martinmenchon

Owner

GitHub Repository


Owner

LinkedIn
