Probabilistic time series modeling in Python

Overview

GluonTS - Probabilistic Time Series Modeling in Python

GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (incubating).

GluonTS provides utilities for loading and iterating over time series datasets, state-of-the-art models ready to be trained, and building blocks to define your own models and quickly experiment with different solutions.

Installation

GluonTS requires Python 3.6 or newer, and the easiest way to install it is via pip:

pip install --upgrade mxnet~=1.7 gluonts

Dockerfiles

Dockerfiles compatible with Amazon SageMaker can be found in the examples/dockerfiles folder.

Quick start guide

This simple example illustrates how to train a model from GluonTS on some data, and then use it to make predictions. As a first step, we need to collect some data: in this example we will use the volume of tweets mentioning the AMZN ticker symbol.

import pandas as pd
url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0)

The first 100 data points look as follows:

import matplotlib.pyplot as plt
df[:100].plot(linewidth=2)
plt.grid(which='both')
plt.show()

[plot: the first 100 data points]

We can now prepare a training dataset for our model to train on. Datasets in GluonTS are essentially iterable collections of dictionaries: each dictionary represents a time series with possibly associated features. For this example, we only have one entry, specified by the "start" field, which is the timestamp of the first data point, and the "target" field containing the time series data. For training, we will use data up to midnight on April 5th, 2015.

from gluonts.dataset.common import ListDataset
training_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]}],
    freq = "5min"
)

A forecasting model in GluonTS is a predictor object. One way of obtaining predictors is by training a corresponding estimator. Instantiating an estimator requires specifying the frequency of the time series that it will handle, as well as the number of time steps to predict. In our example we're using 5-minute data, so freq="5min", and we will train a model to predict the next hour, so prediction_length=12. We also specify some minimal training options.

from gluonts.model.deepar import DeepAREstimator
from gluonts.mx.trainer import Trainer

estimator = DeepAREstimator(freq="5min", prediction_length=12, trainer=Trainer(epochs=10))
predictor = estimator.train(training_data=training_data)

During training, useful information about the progress will be displayed. To get a full overview of the available options, please refer to the documentation of DeepAREstimator (or other estimators) and Trainer.

We're now ready to make predictions: we will forecast the hour following midnight on April 15th, 2015.

test_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-15 00:00:00"]}],
    freq = "5min"
)

from gluonts.dataset.util import to_pandas

for test_entry, forecast in zip(test_data, predictor.predict(test_data)):
    to_pandas(test_entry)[-60:].plot(linewidth=2)
    forecast.plot(color='g', prediction_intervals=[50.0, 90.0])
plt.grid(which='both')

[plot: forecast with 50% and 90% prediction intervals]

Note that the forecast is displayed in terms of a probability distribution: the shaded areas represent the 50% and 90% prediction intervals, respectively, centered around the median (dark green line).
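
Each Forecast object also exposes these statistics programmatically. As a minimal sketch (sample-based forecasts, as returned by predict):

for forecast in predictor.predict(test_data):
    print("mean of sample paths:", forecast.mean)
    print("median:", forecast.quantile(0.5))
    # the 90% interval plotted above spans the 0.05 and 0.95 quantiles
    print("90% interval:", forecast.quantile(0.05), forecast.quantile(0.95))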

Further examples

The following are good entry-points to understand how to use many features of GluonTS:

The following modules illustrate how custom models can be implemented:

Contributing

If you wish to contribute to the project, please refer to our contribution guidelines.

Citing

If you use GluonTS in a scientific publication, we encourage you to add references to the following related papers:

@article{gluonts_jmlr,
  author  = {Alexander Alexandrov and Konstantinos Benidis and Michael Bohlke-Schneider
    and Valentin Flunkert and Jan Gasthaus and Tim Januschowski and Danielle C. Maddix
    and Syama Rangapuram and David Salinas and Jasper Schulz and Lorenzo Stella and
    Ali Caner Türkmen and Yuyang Wang},
  title   = {{GluonTS: Probabilistic and Neural Time Series Modeling in Python}},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {116},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v21/19-820.html}
}
@article{gluonts_arxiv,
  author  = {Alexandrov, A. and Benidis, K. and Bohlke-Schneider, M. and
    Flunkert, V. and Gasthaus, J. and Januschowski, T. and Maddix, D. C.
    and Rangapuram, S. and Salinas, D. and Schulz, J. and Stella, L. and
    Türkmen, A. C. and Wang, Y.},
  title   = {{GluonTS: Probabilistic Time Series Modeling in Python}},
  journal = {arXiv preprint arXiv:1906.05264},
  year    = {2019}
}

Video

Further Reading

Overview tutorials

Introductory material

Comments
  • When I use the #898 code, there are some problems

    Description

    In the past, when I used gluon-ts 0.5.0, num_workers > 1 could not be set. I saw the code @lostella released in #898 the other day and pulled it down for use. num_workers can now be set greater than 1, but there are some problems. Problem 1: when the epoch count reaches about 200, training ends prematurely. Problem 2: a CUDA initialization error occurs when training the next target.

    I made a demo to reproduce the problem, and some of the data will be uploaded. Thanks! @lostella

    data.zip

    To Reproduce

    #!/usr/bin/env python3
    # -*- coding:utf-8 -*-
    import os
    import mxnet as mx
    import numpy as np
    import pandas as pd
    from gluonts.dataset import common
    from gluonts.evaluation import Evaluator
    from gluonts.evaluation.backtest import make_evaluation_predictions
    from gluonts.model import deepar
    from gluonts.trainer import Trainer
    
    
    
    def model_train(df):
        param_list = {
                'epochs': [10000],
                'num_layers': [4],
                'learning_rate': [1e-2],
                'mini_batch_size': [32],
                'num_cells': [40],
                'cell_type': ['lstm'],
            }
        prediction_length = 12
        freq = '2H'
        re_day = 7
        train_time = df.iloc[-1 + (-24) * re_day].monitor_time
        end_time = df.iloc[-1].monitor_time
        test_time = pd.date_range(start=train_time, end=end_time, freq='H')[1:]
        df = df.set_index('monitor_time')
        model_i = 0
        a = []
        for i, _ in enumerate(test_time):
            a.append({"start": df.index[0], "target": df.Measured[:str(test_time[i])]})
        data = common.ListDataset([{"start": df.index[0],
                                        "target": df.Measured[:train_time]}],
                                      freq=freq)
    
        val_data = common.ListDataset(a, freq=freq)
        for epochs_i in param_list['epochs']:
            for batch_i in param_list['mini_batch_size']:
                for lr_i in param_list['learning_rate']:
                    for cells_i in param_list['num_cells']:
                        for layers_i in param_list['num_layers']:
                            for type_i in param_list['cell_type']:
                                estimator = deepar.DeepAREstimator(
                                    prediction_length=prediction_length,
                                    context_length=prediction_length,
                                    freq=freq,
                                    num_layers=layers_i,
                                    num_cells=cells_i,
                                    cell_type=type_i,
    
                                    trainer=Trainer(
                                        ctx=mx.gpu(),
                                        epochs=epochs_i,
                                        learning_rate=lr_i,
                                        hybridize=True,
                                        batch_size=batch_i,
                                    ),
                                )
    
                                predictor = estimator.train(training_data=data, num_workers=2, num_prefetch=96)
                                forecast_it, ts_it = make_evaluation_predictions(val_data, predictor=predictor,
                                                                                 num_samples=100)
                                forecasts = list(forecast_it)
                                tss = list(ts_it)
    
                                evaluator = Evaluator(quantiles=[0.5], seasonality=2016)
                                agg_metrics, item_metrics = evaluator(iter(tss), iter(forecasts), num_series=len(val_data))
    
                                if model_i == 0:
                                    df_metrics = pd.DataFrame(columns=list(agg_metrics))
    
                                values_metrics = []
                                for k in agg_metrics:
                                    values_metrics.append(agg_metrics[k])
    
                                df_metrics.loc[model_i, :] = values_metrics
                                model_i = model_i + 1
    
    
        best_model_ind = np.argmin(df_metrics['RMSE'].values)
        print('The best model index is {}, mae {}, rmse {}'.format(
            best_model_ind, df_metrics.loc[best_model_ind, 'abs_error'] / prediction_length,
            df_metrics.loc[best_model_ind, 'RMSE']))
        return df_metrics, best_model_ind
    
    def file_name_get(item, spe_file):
        for root, dirs, files in os.walk(spe_file):
            file = []
            for i in files:
                if item in i:
                    file.append(i)
            return file
    
    
    if __name__=='__main__':
        data_file = 'data'
        files = file_name_get('data', data_file)
        for file in files:
            df = pd.read_csv(os.path.join(data_file, file))  # os.walk yields bare filenames
            df_metrics, best_model_ind = model_train(df)
    
    

    Error message or code output

    100%|███| 50/50 [00:01<00:00, 31.63it/s, epoch=194/10000, avg_epoch_loss=-.0183]
    100%|███| 50/50 [00:01<00:00, 31.55it/s, epoch=195/10000, avg_epoch_loss=0.0884]
    WARNING:root:Serializing RepresentableBlockPredictor instances does not save the prediction network structure in a backwards-compatible manner. Be careful not to use this method in production.
    Running evaluation: 100%|████████████████████| 336/336 [00:02<00:00, 112.13it/s]
    
    Train process 0 ,Epochs 10000, Batch_size: 32, Learning_rate: 0.01, Num_cells: 40, Num_layers: 4, cell_type: lstm
    0%| | 0/50 [00:00<?, ?it/s][06:25:22] src/engine/threaded_engine_perdevice.cc:101: Ignore CUDA Error [06:25:22] /home/ubuntu/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess: CUDA: initialization error
    Stack trace:
    [bt] (0) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6b8b5b) [0x7ff3de97fb5b]
    [bt] (1) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37ab842) [0x7ff3e1a72842]
    [bt] (2) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37ceece) [0x7ff3e1a95ece]
    [bt] (3) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37c19d1) [0x7ff3e1a889d1]
    [bt] (4) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37b74a1) [0x7ff3e1a7e4a1]
    [bt] (5) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37b83f4) [0x7ff3e1a7f3f4]
    [bt] (6) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::Chunk::~Chunk()+0x3c2) [0x7ff3e1cada42]
    [bt] (7) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6bc30a) [0x7ff3de98330a]
    [bt] (8) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(MXNDArrayFree+0x54) [0x7ff3e19e89c4]
    

    Environment

    • Operating system: Ubuntu 18.04.2
    • Python version: Python 3.7.4
    • GluonTS version: the code released in #898
    • MXNet version: mxnet-cu101 1.6.0
    • CPU cores: 14
    • GPU information: 3b:00.0 3D controller: NVIDIA Corporation GV100 [Tesla V100 PCIe] (rev a1) b1:00.0 3D controller: NVIDIA Corporation GV100 [Tesla V100 PCIe] (rev a1)
    bug 
    opened by k-user 22
  • Remove mandatory `freq` attribute of `Predictor`.

    Issue #, if available:

    Description of changes:

    Follow-up changes to #1997

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

    BREAKING 
    opened by kashif 20
  • Implemented model iteration averaging to reduce model variance

    Issue #, if available:

    Description of changes:

    1. In model_iteration_averaging.py, implemented model averaging across iterations during training instead of epochs after training
    2. Implemented 3 different averaging triggers: NTA (NTA_V1 is the ICLR version: https://openreview.net/pdf?id=SyyGPP0TZ, NTA_V2 is the arxiv version: https://arxiv.org/pdf/1708.02182.pdf), and Alpha Suffix (https://arxiv.org/pdf/1109.5647.pdf)
    3. Integrated both epoch averaging and iteration averaging in Trainer (mx/trainer/_base.py)
    4. Wrote test in test/trainer/test_model_iteration_averaging.py

    The overall goal is to reduce the model variance. We test iteration averaging on DeepAR anomaly detection (examples/anomaly_detection.py, electricity data): we train the model with 20 different random seeds, and report the variance on the same batch of target sequences (take the variance at each timestamp, then average over the entire sequence and all samples). The results are as follows:

    | method          | n or alpha | var     | var/mean | std     | std/mean  | RMSE    |
    |-----------------|------------|---------|----------|---------|-----------|---------|
    | SelectNBestMean | 1          | 9552.24 | 0.508395 | 22.5279 | 0.0318269 | 414.924 |
    | SelectNBestMean | 5          | 8236.13 | 0.41966  | 19.9947 | 0.0253164 | 411.92  |
    | NTA_V1          | 5          | 5888.36 | 0.387781 | 16.7624 | 0.0253107 | 412.792 |
    | NTA_V2          | 5          | 6422.11 | 0.394004 | 17.7947 | 0.0237186 | 416.328 |
    | Alpha_Suffix    | 0.2        | 5877.92 | 0.384664 | 16.6868 | 0.030484  | 408.711 |
    | Alpha_Suffix    | 0.4        | 5814.86 | 0.378298 | 16.6081 | 0.0290987 | 409.952 |

    Although we haven't tuned the hyperparameters, we've already obtained smaller variance and better RMSE.
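
    As a rough illustration of the Alpha Suffix trigger above - keep an average of the parameter iterates over the last alpha fraction of training instead of only the final iterate - here is a minimal numpy sketch (function and variable names are ours, not the PR's API):

    import numpy as np

    def alpha_suffix_average(iterates, alpha=0.2):
        # average the last ceil(alpha * T) parameter iterates
        T = len(iterates)
        start = T - int(np.ceil(alpha * T))
        return np.mean(iterates[start:], axis=0)

    # toy usage: 100 iterates of a 3-parameter model
    averaged = alpha_suffix_average(np.random.randn(100, 3), alpha=0.4)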

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by xcgoner 20
  • Predictions are way too high when modeling an intermittent count series with DeepAR and NegBin distribution?

    I'm trying to model a simulated series of weekly seasonal intermittent sales, with values between 0 and 4. I generated 5 years of simulated data:

    [plot: 5 years of simulated weekly intermittent sales, values 0-4]

    I trained a DeepAR model with the output distribution set to Negative Binomial (all other settings left at their defaults) on 3 years of data, and generated predictions for the next two. I got the following results (plotting the [70.0, 80.0, 95.0] prediction intervals):

    [plot: forecast with 70/80/95% prediction intervals]

    Increasing the number of training epochs doesn't change anything: the loss falls to its lowest value around the 8th to 10th epoch and hovers more or less around there, whether I train for 10 or 100 epochs. I thought training on 3 years and testing on 2 might be too ambitious, so I tried a 4y/1y split instead, and the results got much worse - and downright strange - this time with values climbing into the 100s, even though the largest historical value the series ever reaches is 4 (I'm using the same input series, but it seems flat now because the scale is completely skewed by how large the predictions are):

    [plot: forecast with predicted values in the 100s, dwarfing the input series]

    I'm wondering if I am doing anything wrong? Are there any special settings for DeepAR when applied to intermittent series?

    For comparison, the DeepAREstimator worked pretty well out of the box for more traditional series (using the Student's t distribution), for example:

    [plot: good out-of-the-box forecast on a more traditional series]

    Details:

    Train data: [{'start': Timestamp('2014-01-05 00:00:00', freq='W-SUN'), 'target': array([1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 1., 1., 1., 1., 1., 2., 0., 0., 1., 2., 2., 1., 4., 1., 2., 1., 0., 0., 2., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0., 0., 1., 2., 1., 2., 0., 1., 1., 2., 3., 2., 2., 1., 1., 3., 4., 1., 1., 0., 0., 3., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1., 0., 2., 1., 1., 0., 1., 0., 1., 2., 2., 1., 2., 3., 3., 1., 2., 2., 0., 0., 2., 0., 3., 0., 1., 2., 0., 1., 1.], dtype=float32), 'source': SourceContext(source='list_data', row=1)}]

    Test data: {'start': Timestamp('2017-01-08 00:00:00', freq='W-SUN'), 'target': array([2., 1., 2., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 1., 2., 3., 1., 0., 3., 2., 1., 0., 0., 2., 2., 2., 1., 0., 2., 0., 2., 2., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 2., 0., 0., 4., 1., 2., 2., 1., 3., 1., 2., 1., 2., 1., 2., 3., 3., 1., 2., 0.], dtype=float32), 'source': SourceContext(source='list_data', row=1)}

    Estimator used:

    estimator = DeepAREstimator(
        freq="W",
        prediction_length=105,
        trainer=Trainer(epochs=10),
        distr_output=NegativeBinomialOutput(),
    )
    predictor = estimator.train(training_data=training_data)

    question 
    opened by SkanderHn 20
  • PyTorch implementation of DeepAR

    Work in progress, open for comments.

    This ports the PyTorch implementation of DeepAR from PyTorchTS (cc @kashif), with some changes:

    • The estimator class was slightly refactored, and in particular the way data loaders are set up is more in line with other estimators (but I want to try out a few things here, this is giving me some thoughts)
    • No specific "trainer" class was implemented, and instead the estimator relies on PyTorch Lightning for this.
    • The network is now down to a single class implementing both loss computation and sample-path prediction, following torch's .training convention
    • A thin extension to the network provides the interface used by Lightning

    A few surrounding, related changes are also included.

    Some open questions:

    1. should the dtype and device be specified at constructor time for the estimator? Or is it something we want to pass to the train method?
    2. the base estimator class is really PyTorch Lightning oriented: should it be called PyTorchLightningEstimator?
    3. we would now have gluonts.model containing existing models (mxnet based) and gluonts.torch.model containing this one; should the mxnet ones be moved to gluonts.mx.model for the sake of symmetry?

    TODOs (partial list, probably):

    • [x] cover also validation data in tests
    • [x] remove the input_size parameter from the estimator (this should probably be inferred from the other ones)
    • [x] re-include the option to pseudo-shuffle batches at training time
    • [x] improve tests (also serde and so on)
    • [x] open issue on the left-over features of the model, and make it release-blocking
    • [x] run experiments to check the model accuracy

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

    new feature 
    opened by lostella 19
  • potential bottleneck in training

    Description

    I profiled my training run, which was taking too long, and here is the part that I believe is taking the longest:

    Profile stats for: run_training_epoch
             309984 function calls (302234 primitive calls) in 82.921 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000   82.921   82.921 base.py:168(run)
        520/2    0.000    0.000   82.921   41.460 {built-in method builtins.next}
            2    0.000    0.000   82.792   41.396 fetching.py:271(_fetch_next_batch)
            4    0.000    0.000   82.792   20.698 apply_func.py:73(apply_to_collection)
            2    0.000    0.000   82.792   41.396 supporters.py:547(__next__)
            2    0.000    0.000   82.792   41.396 supporters.py:555(request_next_batch)
            2    0.000    0.000   82.792   41.396 itertools.py:174(__iter__)
            2    0.000    0.000   82.792   41.396 dataloader.py:639(__next__)
            2    0.001    0.000   82.792   41.396 dataloader.py:680(_next_data)
            2    0.001    0.001   82.791   41.395 fetch.py:24(fetch)
          512    0.000    0.000   82.777    0.162 util.py:140(__iter__)
      992/512    0.001    0.000   82.777    0.162 _base.py:102(__iter__)
     5312/512    0.018    0.000   82.776    0.162 _base.py:121(__call__)
          512    0.005    0.000   82.773    0.162 _base.py:174(__call__)
          479    0.001    0.000   81.289    0.170 itertools.py:68(__iter__)
          479    0.013    0.000   78.146    0.163 feature.py:354(map_transform)
          479    0.014    0.000   76.209    0.159 feature.py:367(<listcomp>)
         2395    0.026    0.000   73.799    0.031 extension.py:67(fget)
        14851    0.015    0.000   73.575    0.005 {built-in method builtins.getattr}
         2395    0.020    0.000   73.559    0.031 period.py:97(f)
         2395   73.505    0.031   73.505    0.031 {pandas._libs.tslibs.period.get_period_field_arr}
            1    0.000    0.000   42.221   42.221 training_epoch_loop.py:157(advance)
    ...
    

    To reproduce, kindly train the PyTorch DeepAR estimator with:

    ...
    trainer_kwargs=dict(..., profiler="advanced"),
    ...
    

    and train with num_workers=0
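
    For context, here is roughly what the above amounts to as a self-contained snippet (freq, prediction_length and the dataset are placeholders; profiler="advanced" is a PyTorch Lightning Trainer option forwarded through trainer_kwargs):

    from gluonts.torch.model.deepar import DeepAREstimator

    estimator = DeepAREstimator(
        freq="H",              # placeholder frequency
        prediction_length=24,  # placeholder horizon
        trainer_kwargs=dict(max_epochs=1, profiler="advanced"),
    )
    # predictor = estimator.train(training_data=train_dataset)  # your dataset here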

    bug 
    opened by kashif 18
  • Add `TimeLimitCallback` to `mx/trainer` callbacks.

    Issue #, if available:

    Description of changes: Add TimeLimitCallback so that users can set a time limit on the training process.
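
    A hypothetical usage sketch (the exact import path and argument name are assumptions based on the PR title, not confirmed here):

    from gluonts.mx.trainer import Trainer
    from gluonts.mx.trainer.callback import TimeLimitCallback  # assumed path

    # stop training once the time budget (assumed to be in seconds) is used up
    trainer = Trainer(epochs=100, callbacks=[TimeLimitCallback(time_limit=3600)])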

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

    BREAKING new feature 
    opened by yx1215 18
  • Predict for future date without target value

    I need to predict for future dates, with some dates missing between the end of the training data and the date I want to predict, so I won't have any target values for those dates. When I use NaN for the target series, my forecast is mostly 0.
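
    For reference, a minimal sketch of the setup being described (names, lengths and dates are illustrative; NaN entries in "target" are treated as missing values):

    import numpy as np
    from gluonts.dataset.common import ListDataset

    observed = np.random.rand(200)    # history up to the last known date
    gap = np.full(24, np.nan)         # the missing dates before the forecast start
    prediction_input = ListDataset(
        [{"start": "2021-01-01", "target": np.concatenate([observed, gap])}],
        freq="H",
    )
    # forecasts = list(predictor.predict(prediction_input))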

    question 
    opened by ManikandanThangavelu 18
  • Use `pd.Period` instead of `pd.Timestamp`.

    Issue #, if available:

    Description of changes:

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

    enhancement BREAKING other change 
    opened by jaheba 17
  • Multiprocessing data loader.

    Issue, if available: With a multiprocessing data loader we should overcome data loading bottlenecks. Will fix issue https://github.com/awslabs/gluon-ts/issues/682.

    Description of changes:

    • Datasets use the class attributes of MPWorkerInfo to get information about their multiprocessing environment.
    • Datasets are replicated among workers (only the object reference, though, not the physical dataset); this happens exactly once at the beginning of training
    • Datasets are not cached by default (caching not implemented so far)
    • Data loading can now be done in a multiprocessing fashion by specifying the number of workers; this works for the training and validation sets (inference currently has a bug, but that has the least performance impact of all the loaders)
    • Parallelisation for datasets: modulo-based, i.e. every num_worker-th time series is assigned to the corresponding worker (see the sketch after this list). However, this does not guarantee that batches are always sampled from equidistant locations during training, since some workers could be slower or faster than others
    • The data loaders return batches of batch_size transformed samples, loaded into the requested context; samples are transformed according to the provided transformation
    • There is no threading support (it wouldn't make sense, since we are also doing computation-heavy transformations) and no memory-pinning support (not necessary, since we load the batches into the right context right away)
    • Which exact batches and samples one gets is nondeterministic if num_workers > 0
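
    A minimal sketch of the modulo-based assignment described above (function and variable names are ours, not the PR's API):

    def shard(dataset, worker_id, num_workers):
        # each worker sees every num_workers-th time series, offset by its id
        for i, entry in enumerate(dataset):
            if i % num_workers == worker_id:
                yield entry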

    Future extensibility:

    • the main functions to modify will be the batching function and the stacking function, the transformation can already be replaced to any that produces a list of samples if applied to a dataset
    • any dataset that makes use of the MPWorkerInfo class can be effectively parallelized

    Missing functionality:

    • Shuffling (beyond a single batch)
    • Dataset caching
    • Correct documentation

    Current bugs:

    • No mp support for Windows due to a pickling error.
    • No mp support for InferenceDataLoader due to a pickling error.

    Possible improvements

    • Create named tuple for all the different data the worker processes use
    • Only pass subset of dataset to worker
    • Switch away from Pool to something that allows more fine-grained control, like manually creating Processes as seen in PyTorch's data loader, or make use of libraries like Ray (using actors)

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by AaronSpieler 17
  • Multivariate time series forecasting question

    My apologies for the ignorant questions in advance; while I'm not necessarily new to deep learning, I'm fairly new to time series forecasting, especially when using deep learning techniques for it.

    Since gluon-ts makes use of DL-based approaches, dealing with non-stationarity in training datasets is not necessary, unlike when using AR/MA- and VAR-based models, correct? This appears to be outlined here.

    Also, I am working with a multivariate time series dataset in which the target/dependent variable is related and/or dependent on other features/independent variables. So, while I’m only trying to predict one target variable, the relationship between this target variable and the other features is important; consequently, this leads to two questions.

    First, since the relationship between the target variable and other features is important, are the most applicable models deepvar and gpvar or will other models in gluon-ts work and I’m just thinking too much in terms of classical time series forecasting?

    Second, if I’m using deepvar or gpvar, I’m assuming that when making the dataset, the target should be a vector of vectors which include my target variable and the other features, right? However, if I’m thinking too much in terms of classical time series forecasting, target should be a vector of the target variable and I should store the other features as vectors of vectors in either dynamic_feat or cat, right?

    Again, I’m sorry for my ignorance. Thanks in advance for any assistance you provide.

    question 
    opened by CMobley7 17
  • TemporalFusionTransformer implementation in PyTorch

    Description of changes:

    Unlike the MXNet implementation of TFT, the PyTorch versions of the model/estimator are compatible with the dataset schema used by DeepAR. For example, if we construct the estimator as follows

    estimator = TemporalFusionTransformerEstimator(
       ...,
       dynamic_dims=[2, 5, 7],
       dynamic_cardinalities=[6, 5, 5, 4, 2],
       past_dynamic_dims=[3, 3],
       past_dynamic_cardinalities=[5, 2, 4],
    )
    

    it will expect to receive a dataset, where each time series has keys

    • "feat_dynamic_real": shape [..., sum(dynamic_dims)] (inside the network this feature will get partitioned into 3 chunks of dims 2, 5, and 7)
    • "feat_dynamic_cat": shape [..., len(dynamic_cardinalities)]
    • "past_feat_dynamic_real": shape [..., sum(past_dynamic_dims)]
    • "past_feat_dynamic_cat": shape [..., len(past_dynamic_cardinalities)]

    To do:

    • [ ] Move QuantileOutput and TFTInstanceSplitter to another folder?
    • [ ] Benchmarking
    • [ ] Add tests

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

    opened by shchur 2
  • Wrong splitting when generating rolling datasets on multivariate DataFrame

    Description

    When calling generate_rolling_dataset on a multivariate dataset (dataset = PandasDataset(dict(df_wide))), it generates time series of different lengths in rolled (rolled = generate_rolling_dataset(dataset=dataset, strategy=StepStrategy(prediction_length=1))). For simplicity, let the lengths be [5, 6, 7, 8, 9] and the target dim (i.e., the number of variates) be 3.

    If I use MultivariateGrouper on rolled, it splits rolled into several data entries (in multivariate_grouper). We expect the series within each data entry to share the same length, while series in different data entries have different lengths. That is, we should get five data entries (with lengths 5, 6, 7, 8, 9), each containing three time series (one per variate). We denote this as: (5,5,5), (6,6,6), ..., (9,9,9).

    However, the series in each data entry have different lengths, like (5,6,7), (5,7,8), (5,7,9), ..., meaning that the series are not assigned as we hope.

    To Reproduce

    df_wide:

                 series 1   series 2
      2000-01-01        0          1
      2000-01-02        0          1
      2000-01-03        0          1
      2000-01-05        1          0
      ...
      2000-01-09        1          0

    dataset = PandasDataset(dict(df_wide))
    rolled = generate_rolling_dataset(dataset=dataset, strategy = StepStrategy(prediction_length=2),start_time="2000-01-04")
    
    test_grouper = MultivariateGrouper(num_test_dates=5, max_target_dim=2)
    dataset_test = test_grouper(rolled)
    

    Error message or code output

    /home/cheryl/anaconda3/envs/diff/lib/python3.7/site-packages/gluonts/dataset/multivariate_grouper.py:197: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
      return {FieldName.TARGET: np.array([funcs(data) for data in dataset])}
    Traceback (most recent call last):
    ......
      File "/home/cheryl/workspace/diff/Ongoing/dataload.py", line 80, in for_pred
        dataset_test = test_grouper(rolled)
      File "/home/cheryl/anaconda3/envs/diff/lib/python3.7/site-packages/gluonts/dataset/multivariate_grouper.py", line 87, in __call__
        return self._group_all(dataset)
      File "/home/cheryl/anaconda3/envs/diff/lib/python3.7/site-packages/gluonts/dataset/multivariate_grouper.py", line 125, in _group_all
        grouped_dataset = self._prepare_test_data(dataset)
      File "/home/cheryl/anaconda3/envs/diff/lib/python3.7/site-packages/gluonts/dataset/multivariate_grouper.py", line 162, in _prepare_test_data
        list(dataset_at_test_date), dtype=np.float32
    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (21,) + inhomogeneous part.
    

    Environment

    • Operating system: Linux Ubuntu
    • Python version: 3.7
    • GluonTS version: 0.10.0
    • MXNet version:

    bug 
    opened by cherylLbt 2
  • Integrating N-Linear and D-Linear

    Are there any plans to integrate those models as proposed in

    Zeng, A., Chen, M., Zhang, L., & Xu, Q. (2022). Are Transformers Effective for Time Series Forecasting?. arXiv preprint arXiv:2205.13504.

    Darts already implemented them. These models are also probabilistic.

    https://unit8co.github.io/darts/generated_api/darts.models.forecasting.dlinear.html#r844e17822ca3-1

    enhancement 
    opened by baniasbaabe 0
  • add TDformer

    This is the official implementation of the paper "First De-trend then Attend: Rethinking Attention for Time-Series Forecasting" (https://openreview.net/pdf?id=GLc8Rhney0e), an internship project at AWS that has been accepted at the NeurIPS 2022 All Things Attention workshop.

    new feature 
    opened by xiyuanzh 0
  • Consolidate `DeepNPTSEstimator`

    Description of changes:

    • Align DeepNPTSEstimator.train arguments to those of Estimator and PyTorchLightningEstimator in particular (move many of its arguments to the constructor)
    • Make dropout_rate not Optional: it's a float that can be zero
    • Consolidate tests for DeepNPTSEstimator with similar ones for other estimator classes
    • Rename DeepNPTSMultiStepPredictor -> DeepNPTSMultiStepNetwork to avoid confusion (since it's not a Predictor subclass)
    • Fix docstrings that ended up formatted wrong for some reason

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

    enhancement BREAKING 
    opened by lostella 1
  • calculate_seasonal_error bug for multivariate data

    Description

    calculate_seasonal_error is buggy when the inputs are multivariate data. past_data = extract_past_data(time_series, forecast) results in a (dim, time)-shaped array rather than a (time,)-shaped array, which is then passed into calculate_seasonal_error, which calls the following code snippet. https://github.com/awslabs/gluonts/blob/cac5e6b91cbd545c0811b20849290e9215c1b822/src/gluonts/evaluation/_base.py#L323-L326 Rather than calculating the seasonal error, it ends up differencing across the multivariate dimensions.

    https://github.com/awslabs/gluonts/blob/4fef7e26470d15096b11b005be846dedf87fb736/src/gluonts/evaluation/metrics.py#L49-L50

    To Reproduce

    from datetime import timedelta
    import pandas as pd
    import numpy as np
    from gluonts.evaluation import MultivariateEvaluator
    from gluonts.model.forecast import QuantileForecast
    
    dim = 5
    time = 20
    past = 40
    
    forecast = QuantileForecast(
        forecast_arrays=np.random.randn(3, dim, time),
        forecast_keys=['mean', '0.95', '0.05'],
        start_date=pd.to_datetime('today').to_period('H')
    )
    
    time_series = pd.DataFrame(zip(*[np.arange(past + time) for _ in range(dim)]), 
                                pd.period_range(start=pd.to_datetime('today') - timedelta(hours=past), periods=past+time, freq='H'))
    evaluator = MultivariateEvaluator(seasonality=1)
    evaluator.get_metrics_per_ts(time_series, forecast)
    

    Error message or code output

    'seasonal_error': 0.0

    Desired message or code output

    'seasonal_error': 1.0

    Environment

    • GluonTS version: '0.11.4'
    bug 
    opened by gorold 5
Releases(v0.11.7)
  • v0.11.7(Jan 3, 2023)

    Backporting fixes:

    • Make serde.dataclass always kw-only. (#2428 by @jaheba)
    • Fix serde.dataclass inheritance handling. (#2512 by @jaheba)
    • Fix QuantileForecast.quantile in case only mean is stored (#2513 by @lostella)
    • Remove mypy plugin for dataclass. (#2514 by @jaheba)
    • GH Actions: Use authenticated requests for just. (#2522 by @jaheba)
    • Fix aggregate_valid for non-numerical columns (#2526 by @lostella)
  • v0.11.6(Dec 20, 2022)

    Backporting fixes:

    • itertools.select. #2426 by @jaheba
    • Fix dataclass handling of member inheritance. #2492 by @jaheba
    • Fix: sort dataset keys in error message when importing non-existing dataset #2497 by @lostella
    • Fix DateSplitter for multiples of base frequencies #2500 by @lostella
    • Fix docstrings according to docformatter #2501 by @lostella
    • Add examples to docstring for periods_between #2504 by @lostella
    • Cap numpy compatibility in mxnet extra requirements #2506 by @lostella
    • Pin docformatter version. #2507 by @jaheba
    • Docs: Fix install instructions. #2508 by @jaheba
  • v0.11.5(Dec 13, 2022)

    What's Changed

    • Backports for v0.11.5. by @jaheba in https://github.com/awslabs/gluonts/pull/2491

    Full Changelog: https://github.com/awslabs/gluonts/compare/v0.11.4...v0.11.5

  • v0.11.4(Dec 5, 2022)

  • v0.11.3(Nov 24, 2022)

    Backporting fixes:

    • Add test cases for PandasDataset, fix missing assertion (#2453 by @lostella)
    • Speed up PandasDataset further (#2441 by @lostella)
    • Fix MANIFEST.in (#2456 by @lostella)
  • v0.11.3rc1(Nov 24, 2022)

    Backporting fixes:

    • Add test cases for PandasDataset, fix missing assertion (#2453 by @lostella)
    • Speed up PandasDataset further (#2441 by @lostella)
    • Fix MANIFEST.in (#2456 by @lostella)
  • v0.11.2(Nov 21, 2022)

    Backporting fixes:

    • Fix rotbaum random seed and num_samples argument. (#2408 by @sighellan)
    • Hierarchical: Make sure the input S matrix is of right dtype (#2415 by @rshyamsundar)
    • Mypy fixes (#2427 by @jaheba)
    • Speed up PandasDataset for long dataframes (#2435 by @lostella)
    • Fix frequency inference in PandasDataset (#2442 by @lostella)
    • Tests: Change Python versions. (#2448 by @jaheba)
  • v0.11.1(Oct 28, 2022)

    Backporting fixes:

    • Fix dominick dataset bug. (#2364 by @haskarb)
    • Remove strange quoting marks from docstrings (#2368 by @lostella)
    • Consistent use of term "prediction interval" (#2373 by @codingWhale13)
    • Fix MQCMM ignoring zero-seed. (#2379 by @sighellan)
  • v0.10.8(Oct 28, 2022)

    Backporting fixes:

    • Fix numerical bug in BinnedUniforms (#2344 by @moudheus)
    • Fix dominick dataset bug. (#2364 by @haskarb)
    • Fix MQCMM ignoring zero-seed. (#2379 by @sighellan)
  • v0.11.0(Oct 10, 2022)

    Overview

    Incremental training

    Estimators are now re-trainable on new data, using the train_from method. This accepts a previously trained model (predictor), and new data to train on, and can greatly reduce training time if combined with early stopping. The feature is integrated with gluonts.shell-based SageMaker containers, and can be used by specifying the additional model channel to point to the output of a previous training job. More info in #2249.
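
    In code, incremental training looks roughly like this (a sketch based on the description above; dataset names are placeholders):

    predictor = estimator.train(initial_data)

    # later: resume from the previously trained predictor on new data
    new_predictor = estimator.train_from(predictor, new_data)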

    New models

    Two models are added in this release:

    • DeepVARHierarchicalEstimator, a hierarchical extension to DeepVAREstimator; learn more about how to use this in this tutorial.
    • DeepNPTSEstimator, a global extension to NPTS, where sampling probabilities are learned from data; learn more on how to use this estimator here.

    Deprecated import paths and options

    This release moves MXNet-based models from gluonts.model to gluonts.mx.model; the old import paths continue working in this release, but are deprecated and will be removed in the next release. For example, now the MXNet-based DeepAREstimator should be imported from gluonts.mx (or gluonts.mx.model.deepar).
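
    For example (the old import path still works in this release, but is deprecated):

    # deprecated, will be removed in the next release:
    from gluonts.model.deepar import DeepAREstimator

    # new import paths:
    from gluonts.mx import DeepAREstimator
    from gluonts.mx.model.deepar import DeepAREstimator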

    We also removed deprecated options for learning rate reduction in the gluonts.mx.Trainer class: these can now be controlled via the LearningRateReduction callback.

    Dataset splitting functionality (experimental)

    We updated the functionality to split time series datasets (along the time axis) for training/validation/test purposes. Now this functionality can be easily accessed via the split function (from gluonts.dataset.split import split); learn more about this here.

    This feature is experimental and subject to future changes.
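
    A minimal sketch of the new entry point (offset and window values here are illustrative):

    from gluonts.dataset.split import split

    # hold out the last 36 steps of each series; build 3 test windows of length 12
    training_data, test_template = split(dataset, offset=-36)
    test_data = test_template.generate_instances(prediction_length=12, windows=3)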

    Changelog

    Breaking changes

    • Breaking: Update data splitters to return (input, output) pairs in the test split (#2031 by @npnv)
    • Breaking: Move MXNet-based models to mx.model. (#2126 by @Hongqing-work)
    • Convert time-features into functions. (#2149 by @jaheba)
    • Remove deprecated args from mx.Trainer. (#2153 by @jaheba)
    • Reduce sdist size. (#2199 by @jaheba)
    • Remove core.exception module. (#2202 by @jaheba)
    • Remove core.ty. (#2203 by @jaheba)
    • Update gluonts.dataset.split code, test, docs (#2223 by @lostella)
    • Remove gluonts_forecasters entrypoint mechanic. (#2278 by @jaheba)
    • Enable 'python -m gluonts'. (#2292 by @jaheba)

    New features / major improvements

    • Interrupting mx.Trainer stops training. (#2131 by @Hongqing-work)
    • Expose evaluator aggregation_strategy functions (#2198 by @kashif)
    • Add data preparation utility for hierarchical time series and a tutorial notebook (#2206 by @rshyamsundar)
    • Add Deep NPTS model (#1835 by @rshyamsundar)
    • Improve arrow reading performance. (#2217 by @mr-1993)
    • Allow DeepVAR model to use (global) dynamic features (#2226 by @rshyamsundar)
    • Hierarchical: Allow use of external dynamic features and add a section in the tutorial (#2253 by @rshyamsundar)
    • Add serde.dataclass. (#2166 by @jaheba)
    • R: Add Python wrapper for calling R's hierarchical methods (#1685 by @rshyamsundar)
    • Add learning rate and weight decay arguments to PyTorch estimators (#2289)
    • Added LR scheduler to DeepAR Pytorch (#2287 by @shubhamkapoor)
    • Add LR scheduling patience option to MQF2 (#2291 by @lostella)
    • Add incremental training (#2249 by @lostella)
    • Add input size and type information to DeepARModel, and example_input_array to DeepARLightningModule. (#2307 by @jgasthaus)
    • Add dataset.schema.translate. (#2304 by @jaheba)
    • Add forecast_start to entry-wise metrics in evaluator (#2312 by @lostella)

    Bug fixes / minor improvements

    • Fix DatasetCollection (#2135 by @rsnirwan)
    • Fix PandasDataset for Python 3.9 (#2141 by @lostella)
    • Make PandasDataset faster (#2148 by @lostella)
    • Ignore divide warnings in evaluation. (#2159 by @jaheba)
    • Fix Prophet wrapper to work with Timestamp instead of Period (#2182 by @lostella)
    • Fix dtype for "item_id" column in metrics dataframe (#2183 by @lostella)
    • Fix recursive case for gluonts.mx.batchify.stack (#2184 by @lostella)
    • Fix item_id values in ConstantValuePredictor (#2192 by @codingWhale13)
    • Fixup Patience class. (#2197 by @jaheba)
    • Fix dataset arrow writer tool. (#2196 by @jaheba)
    • Fix SymbolBlock serde issue (#2187 by @lostella)
    • Add item id to Uber TLC dataset (#2214 by @mvanness354)
    • Fix r_forecast wrapper to shift start date when truncating time series (#2216 by @abdulfatir)
    • Fix dtype bug in piecewise_linear and add a test (#2224 by @rshyamsundar)
    • Fix bug in to_quantile_forecast (#2225 by @eugeneteoh)
    • Fix gluonts.mx.trainer.Trainer in case of empty data loader (#2228 by @lostella)
    • Fix feed-forward models when features are provided (#2238 by @lostella)
    • update SplicedBinnedPareto demos from nursery version to gluonts version (#2250 by @elenaehrlich)
    • Improve len() for ParquetFile. (#2261 by @jaheba)
    • Move max_idle_transform usage to GluonEstimator. (#2262 by @jaheba)
    • Optimize TimeSeriesSlice performance (#2259 by @lostella)
    • Fix ignore hidden files when generating datasets (#2263 by @kashif)
    • Fix: set max idle transforms in PyTorch estimators (#2266 by @lostella)
    • Fix QuantileForecast.plot() to use DateTimeIndex (#2269 by @abdulfatir)
    • Fix serde dataclass eventual. (#2277 by @jaheba)
    • Fix gluonts.dataset.split for multivariate case (#2314 by @lostella)
    • Improve TestData class in gluonts.dataset.split (#2315 by @lostella)
    • Simplify make_evaluation_predictions (#2309 by @lostella)
    • Fix MQCNN for kernel_size=1 (#2321 by @lostella)
    • Simplify unbatching in forecast-generator. (#2334 by @jaheba)
    • Fix numerical bug in BinnedUniforms (#2344 by @moudheus)

    Documentation

    • Docs: Make notebook templates. (#2122 by @jaheba)
    • Docs: Rework installation section. (#2130 by @jaheba)
    • Docs: Fix running tutorials for publishing docs. (#2138 by @jaheba)
    • Docs: Update hyperparameter tuning with optuna notebook. (#2137 by @npnv)
    • Fix issues with hyperparameter tuning tutorial (#2143 by @lostella)
    • Apply black to notebooks. (#2144 by @jaheba)
    • Docs: Simplify wide DataFrame example (#2150 by @lostella)
    • Docs: fix links in models table (#2156 by @lostella)
    • Add 'Background' section to docs. (#2129 by @jaheba)
    • Docs: Add info about version guarantees. (#2161 by @jaheba)
    • Docs: fix tutorial after breaking changes in trainer class (#2179 by @lostella)
    • Add tutorial with data splitting examples (#2157 by @npnv)
    • Fix: add missing link to splitting tutorial (#2185 by @lostella)
    • Fix: ensure last cell of tutorials runs (#2186 by @lostella)
    • Fixes to the dataset splitting tutorial (#2189 by @npnv)
    • Update TSBench readme with paper reference (#2191 by @geoalgo)
    • Update Available models table with the hierarchical model (#2209 by @rshyamsundar)
    • Fix broken links in Available-models table (#2211 by @rshyamsundar)
    • Add logo to README. (#2248 by @jaheba)
    • New logo. (#2243 by @jaheba)
    • Use brand colors in docs. (#2257 by @jaheba)
    • Docs: Reformatting table, badge colors. (#2258 by @jaheba)
    • Docs: update contribution guidelines and dev setup (#2270 by @lostella)
    • Add Github footer icon to docs. (#2285 by @jaheba)
    • Docs: Custom Pygments style for dark theme. (#2290 by @jaheba)
    • Fix README quick examples (#2297 by @lostella)
    • Fix text in Quick Start Tutorial (#2300 by @sighellan)
    • Update README and tutorial (#2311 by @lostella)
    • Turn on apidoc generation (#2332 by @jaheba)
    • Add info on how to use 'just' (#2339 by @codingWhale13)
    • Small documentation improvements (#2343 by @codingWhale13)

    Test / setup changes

    • add python 3.9 to test workflows (#2136)
    • Tests: Move mx model test. (#2158 by @jaheba)
    • Test: Use spawn method for shell server tests. (#2177 by @jaheba)
    • Remove holidays and matplotlib from core dependencies. (#2055 by @jaheba)
    • Update minimal version for nbconvert. (#2233 by @jaheba)
    • Hierarchical: Add a test for to_dataset method (#2265 by @rshyamsundar)
    • Fix mypy and black commands in pre-commit githook (#2271 by @abdulfatir)
    • Update project_urls. (#2274 by @jaheba)
    • Move _version to meta. (#2293 by @jaheba)
    • Remove setup-requires. (#2295 by @jaheba)
    • Remove pytest.ini. (#2298 by @jaheba)
    • Speed up smoke tests (#2341 by @lostella)
  • v0.10.7(Sep 27, 2022)

    Backporting fixes:

    • Add Github footer icon to docs. (#2285 by @jaheba)
    • Docs: Custom Pygments style for dark theme. (#2290 by @jaheba)
    • Fix README quick examples (#2297 by @lostella)
    • Fix text in Quick Start Tutorial (#2300 by @sighellan)
    • Update README and tutorial (#2311 by @lostella)
    • Fix MQCNN for kernel_size=1 (#2321 by @lostella)
  • v0.10.6(Sep 6, 2022)

    Backporting fixes:

    • Improve len() for ParquetFile. (#2261 by @jaheba)
    • Max idle transform fix (#2262 by @jaheba)
    • Fix ignore hidden files when generating datasets (#2263 by @kashif)
    • Fix: set max idle transforms in PyTorch estimators (#2266 by @lostella)
    • Fix QuantileForecast.plot() to use DateTimeIndex (#2269 by @abdulfatir)
  • v0.10.5(Aug 26, 2022)

    Backporting fixes:

    • Fix broken links in Available-models table (#2211 by @rshyamsundar)
    • Fix r_forecast wrapper to shift start date when truncating time series (#2216 by @abdulfatir)
    • Improve arrow reading performance (#2217 by @mr-1993)
    • Fix dtype bug in piecewise_linear and add a test (#2224 by @rshyamsundar)
    • Fix bug in to_quantile_forecast (#2225 by @eugeneteoh)
    • Fix gluonts.mx.trainer.Trainer in case of empty data loader (#2228 by @lostella)
    • Fix feed-forward models when features are provided (#2238 by @lostella)

    Full changelog: https://github.com/awslabs/gluon-ts/compare/v0.10.4...v0.10.5

  • v0.9.9(Aug 26, 2022)

    Backporting fixes:

    • Fix r_forecast wrapper to shift start date when truncating time series (#2216 by @abdulfatir)
    • Fix dtype bug in piecewise_linear and add a test (#2224 by @rshyamsundar)
    • Fix bug in to_quantile_forecast (#2225 by @eugeneteoh)
    • Fix gluonts.mx.trainer.Trainer in case of empty data loader (#2228 by @lostella)
    • Fix feed-forward models when features are provided (#2238 by @lostella)

    Full Changelog: https://github.com/awslabs/gluon-ts/compare/v0.9.8...v0.9.9

  • v0.10.4(Aug 14, 2022)

    Backporting fixes:

    • Fix SymbolBlock serde issue (#2187 by @lostella)
    • Fix dataset arrow writer tool. (#2196 by @jaheba)
    • Expose evaluator aggregation_strategy functions (#2198 by @kashif)
    • Update Available models table with the hierarchical model (#2209 by @rshyamsundar)
  • v0.9.8(Aug 14, 2022)

  • v0.10.3(Aug 8, 2022)

    Backporting fixes:

    • Fix Prophet wrapper to work with Timestamp instead of Period (#2182 by @lostella)
    • Fix dtype for "item_id" column in metrics dataframe (#2183 by @lostella)
    • Fix recursive case for gluonts.mx.batchify.stack (#2184 by @lostella)
    • Fix: ensure last cell of tutorials runs (#2186 by @lostella)
    • Fix item_id values in ConstantValuePredictor (#2192 by @codingWhale13)
  • v0.9.7(Aug 8, 2022)

    Backporting fixes:

    • Fix dtype for "item_id" column in metrics dataframe (https://github.com/awslabs/gluon-ts/pull/2183 by @lostella)
    • Fix recursive case for gluonts.mx.batchify.stack (https://github.com/awslabs/gluon-ts/pull/2184 by @lostella)
    • Fix item_id values in ConstantValuePredictor (https://github.com/awslabs/gluon-ts/pull/2192 by @codingWhale13)
  • v0.10.2(Jul 14, 2022)

    Backport fixes:

    • Make PandasDataset faster (#2148 by @lostella)
    • Interrupting mx.Trainer stops training. (#2131 by @Hongqing-work)
    • Ignore divide warnings in evaluation. (#2159 by @jaheba)
  • v0.10.1(Jul 8, 2022)

    Backporting fixes:

    • Docs: Make notebook templates. (#2122 by @jaheba)
    • Docs: Rework installation section. (#2130 by @jaheba)
    • Fix DatasetCollection for Python 3.9. (#2135 by @rsnirwan)
    • Docs: Fix running tutorials for publishing docs. (#2138 by @jaheba)
    • Fix PandasDataset for Python 3.9 (#2141 by @lostella)
    • Fix issues with hyperparameter tuning tutorial (#2143 by @lostella)
    • Docs: Apply black to notebooks. (#2144 by @jaheba)
  • v0.10.0(Jun 30, 2022)

    Overview

    Arrow based datasets

    We have added support for Parquet-files, as well as Arrow's binary format. This is an opt-in feature, requiring pyarrow to be installed. Use pip install 'gluonts[pro]' or pip install 'gluonts[arrow]' to ensure the correct version is installed.

    FileDataset has been reworked to support .parquet and .arrow files. Previously, it had assumed all files use jsonlines. To continue using jsonlines, ensure that the files use one of the .json, .jsonl, .json.gz, or .jsonl.gz suffixes.

    Depending on the dataset size and shape, Arrow can be much faster than the json variant. In more extreme cases we saw speedups of more than 100x when using arrow vs jsonlines (see #2003 for some examples).

    To convert a given dataset into arrow, you can use the gluonts.dataset.arrow utility:

    python -m gluonts.dataset.arrow write </path/to/dataset> my-dataset.arrow
    

    PandasDataset

    We have added support for pandas.DataFrame and pandas.Series as well. You can now directly model data given in a DataFrame using gluonts.dataset.pandas.PandasDataset. In this tutorial we describe in depth how you can use PandasDataset to speed up modelling using GluonTS.
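
    A minimal sketch (a single series with a PeriodIndex; the tutorial covers multiple series and extra columns):

    import pandas as pd
    from gluonts.dataset.pandas import PandasDataset

    df = pd.DataFrame(
        {"target": range(100)},
        index=pd.period_range("2021-01-01", periods=100, freq="D"),
    )
    dataset = PandasDataset(df, target="target")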

    Changelog

    New Features

    • #1631 - Add TimeLimitCallback to mx/trainer callbacks. (by @yx1215)
    • #1780 - adding MQF2 (Multi-horizon) (by @KelvinKan)
    • #1903 - Added QuarterlyBegin time feature (by @kashif)
    • #1924 - Porting SimpleFeedForwardEstimator to PyTorch (by @lostella)
    • #1925 - DeepAR PyTorch: make samplers configurable (by @lostella)
    • #1935 - added support for pandas dataframes (by @rsnirwan)
    • #1962 - Add support for beta-NLL loss (by @kashif)
    • #1982 - Add Uber-TLC dataset to dataset repository. (by @Hongqing-work)
    • #1990 - Add info cli. (by @jaheba)
    • #1987 - Add HP tuning example with Optuna (by @npnv)
    • #2000 - Add arrow-based dataset. (by @vafl, @lostella, @jaheba)
    • #2002 - add ND for item_metrics (by @melopeo)
    • #2006 - Added support of "long" RTS, making short RTS be "past_feat_dynamic_real" (by @zoolhasson)
    • #2061 - Add DatasetWriter. (by @jaheba)
    • #2074 - Add support for second frequency. (by @kashif)

    Breaking Changes

    • #1917 - Breaking: Fix return types of features (by @lostella)
    • #1941 - Breaking: Update dependency fbprophet -> prophet (by @lostella)
    • #1946 - Breaking: Split incremental quantile output into separate class (by @lostella)
    • #1965 - Breaking: reorg torch package, shorten import paths (by @lostella)
    • #1980 - Use pd.Period instead of pd.Timestamp. (by @jaheba)
    • #1997 - Remove freq argument from Forecast. (by @kashif)
    • #2011 - Remove dct_reduce. (by @jaheba)
    • #2017 - Remove mandatory freq attribute of Predictor. (by @kashif)
    • #2018 - Remove multiprocessing dataloader. (by @jaheba)
    • #2019 - Rework FileDataset. (by @jaheba)
    • #2053 - Add dataset_writer to get_dataset. (by @Hongqing-work)
    • #2070 - Add jsonl.encode_json, remove serialize_data_entry. (by @jaheba)

    Bug Fixes / Minor Improvements

    • #1704 - Settings._let will pop element it added instead of just the last one. (by @jaheba)
    • #1905 - Fix typing issues in torch estimators, update base estimators docstrings (by @lostella)
    • #1909 - Fix the use of the scaling parameter in Transformer model (by @StanislasGuinel)
    • #1916 - Fix AddTimeFeatures transformation for multiples of base frequencies (by @lostella)
    • #1920 - Fix: use broadcast_lesser in place of comparisons in ISQF (by @vincentqb)
    • #1931 - Fix dummy estimator (by @canerturkmen)
    • #1933 - Fix Pytorch Lightning tutorial. (by @jaheba)
    • #1938 - Fixed autograd inplace operations error in Transformed Distribution (by @shubhamkapoor)
    • #1950 - Fix: Hard threshold positive distribution parameters (by @lostella)
    • #1952 - Fix forecast keys (quantiles) output by TemporalFusionTransformer (by @lostella)
    • #1968 - Fix: use of num_parallel_samples in deepAR (by @kashif)
    • #1969 - Fix: torch DeepAR observed indicator in multivariate case (by @kashif)
    • #1975 - use FieldName (by @kashif)
    • #1983 - Documentation: add docstrings for torch-based models (by @lostella)
    • #1986 - Fix OffsetSplitter for negative offsets (by @lostella)
    • #1989 - Pin protobuf version. (by @jaheba)
    • #1991 - Remove packaged pytorch-ts from gluonts.nursery.SCott (by @lostella)
    • #1999 - Documentation: fix and speed up tutorials (by @lostella)
    • #2004 - Refactor splitter assertion and add error message (by @RSNirwan)
    • #2005 - Rework itertools, add col-to-row and row-to-col functions. (by @jaheba)
    • #2008 - Re-add cache for parsing 'pd.Period'. (by @jaheba)
    • #2013 - Update website template, clean up homepage and tutorials (by @lostella)
    • #2014 - Expose Estimator, Predictor, Forecast in gluonts.model. (by @jaheba)
    • #2015 - Fix mean in AffineTransformedDistribution (by @stailx)
    • #2016 - Fix torch affine transformed distribution (by @lostella)
    • #2020 - Remove unnecessary files from docs folder, update gitignore (by @lostella)
    • #2021 - Update references to dev branch. (by @lostella)
    • #2024 - Fix README. Use DataFramesDataset. (by @jaheba)
    • #2025 - Make HP tuning tutorial more accurate (by @jaheba)
    • #2028 - Re-add support for Python 3.6 (by @jaheba)
    • #2029 - Add support for nan values in Rotbaum (by @zoolhasson)
    • #2035 - Simplify lag values computation in torch DeepAR (by @lostella)
    • #2036 - Minor improvements to the hierarchical model (by @rshyamsundar)
    • #2047 - Make Quantile derive from pydantic.BaseModel. (by @jaheba)
    • #2050 - Add concepts section to docs. (by @jaheba)
    • #2051 - Add tutorial on DataFramesDataset (by @RSNirwan)
    • #2057 - Add optional parameter time_axis to forecast_start. (by @melopeo)
    • #2062 - Fix type annotations for predict_to_numpy (by @lostella)
    • #2066 - Always pass freq explicitly to pd.period_range. (by @kashif)
    • #2068 - Docs: simplify call to evaluator (by @lostella)
    • #2092 - Fix: DistributionLoss not encodable. (by @jaheba)
    • #2098 - Add Airtraffic dataset. (by @jaheba)
    • #2108 - Fixup trainer in case of non-finite loss. (by @jaheba)
    • #2121 - Change default behavior for TrainDatasets overwrite (by @nklingen)
  • v0.9.6(Jun 30, 2022)

  • v0.10.0rc1(Jun 24, 2022)

    Release candidate for v0.10.0; its release notes match those of the final v0.10.0 release above.
  • v0.9.5(Jun 14, 2022)

    • Re-add support for Python 3.6 in v0.9.x. (#2032 by @jaheba)

    Backporting fixes:

    • Fix: use of num_parallel_samples in deepAR (#1968 by @kashif)
    • Fix: torch DeepAR observed indicator in multivariate case (#1969 by @kashif)
    • Fix OffsetSplitter for negative offsets (#1986 by @lostella)
    • Fix mean in AffineTransformedDistribution (#2015 by @stailx)
  • v0.9.4(Apr 28, 2022)

    Backporting fixes:

    • Fix: Hard threshold positive distribution parameters (#1950 by @lostella)
    • Fix forecast keys (quantiles) output by TemporalFusionTransformer (#1952 by @lostella)
  • v0.9.3(Apr 12, 2022)

    Backporting fixes:

    • Fix: use broadcast_lesser in place of comparisons in ISQF (#1920 by @vincentqb)
    • Fix dummy estimator (#1931 by @canerturkmen)
    • Fix Pytorch Lightning tutorial (#1933 by @jaheba)
    • Fixed autograd inplace operations error in Transformed Distribution (#1938 by @shubhamkapoor)
  • v0.9.2(Mar 16, 2022)

    Backporting fixes:

    • Fix AddTimeFeatures transformation for multiples of base frequencies (#1916)
    • Update docs requirements (#1919)
  • v0.9.1(Feb 28, 2022)

  • v0.9.0(Feb 18, 2022)

    Changelog

    New Features

    • Add ckpt_path argument to PyTorchLightningEstimator. (#1872)
    • Add TSBench (#1865)
    • add SCott code to nursery (#1827)
    • Add dynamic code for shell. (#1821)
    • Adding torch.isqf (#1815)
    • Add tsbench readme placeholder (#1808)
    • Adding ISQF distribution class (#1746)
    • Adding IQF to remove quantile crossing and required retraining for ne… (#1693)
    • Hierarchical Forecaster: End-to-End model based on DeepVAR (#1665)
    • Adding gluonts.torch.piecewise_linear (#1663)
    • Add quantile regression mode to AutoGluon-based TabularEstimator (#1611)
    • add dummy estimator to trivial models (#1602)

    Bug Fixes

    • Add file path argument to m5 dataset generation (#1896)
    • Fix negative binomial parameter map (#1893)
    • Fix negative binomial sampling (#1884)
    • Fixes for Monash Forecasting Repository datasets (#1879)
    • Fix serde.flat type handling. (#1851)
    • Fix datesplitter. (#1850)
    • changed metadata creation function (#1847)
    • Check equality of transformations. (#1844)
    • Fix samples scaling in PyTorch DeepAR (#1836)
    • Fix _version for cases when git is not installed. (#1825)
    • Fixed data leakage bug in implementation of dynamic real and categorical features (#1809)
    • fix for #1725, reverse breaking changes to data loader and handle all zero batches (#1779)
    • Upgrade pytorch and pytorch-lightning requirements and some fixes. (#1765)
    • Fix torch NOPScaler shape. (#1752)
    • Convert batchify list to np array (#1732)
    • Fix gluonts.json; added bdump/bdumps. (#1721)
    • Fix scaling for pytorch negative binomial output (#1702)
    • Fix frequency string conversion from ts format, add test (#1652)
    • Fix NegativeBinomial constructor args in NegativeBinomialOutput (torch) (#1651)
    • Add batch_size attribute to MQCNNEstimator and MQRNNEstimator (#1645)
    • Add additional datasets from the Monash Time Series Forecasting Repository (#1632)

    Breaking Changes

    • Extend default quantiles for MQ* Estimators to match MSIS quantiles. (#1866)
    • changed metadata creation function (#1847)
    • Remove support module. (#1792)
    • Set minimum Python version to 3.7. (#1791)
    • Exceptions cleanup. (#1615)

    Other Changes & Improvements

    • Update mypy to 0.910. (#1875)
    • Bump ujson from 4.3.0 to 5.1.0 in /src/gluonts/nursery/tsbench (#1869)
    • Update black to v22. (#1867)
    • Fix docstring typo in feature.py (#1863)
    • Fix scott checks. (#1845)
    • Remove requirement for @validated in from_hyperparameters. (#1826)
    • Fix test collect ignore. (#1817)
    • Split tests into one workflow for each framework. (#1805)
    • Mark transformer as flaky. (#1801)
    • Mark empirical_distribution test as flaky. (#1798)
    • Use of int/float/object over np.int/float/object for dtype. (#1795)
    • Rework tests. (#1786)
    • Update typing_extension version. (#1785)
    • Use of independent random seed. (#1767)
    • Upgrade pytorch and pytorch-lightning requirements and some fixes. (#1765)
    • Remove sphinx-autobuild sphinx-autorun, update sphinx version. (#1745)
    • Exclude bin folders from apidoc. (#1744)
    • Don't run doctest on nursery. (#1743)
    • Hierarchical: Compute relative reconciliation error and add tests (#1722)
    • Fixing doc build from mqcnn-iqf commit (#1699)
    • Replace miniver with custom versioning code. (#1662)
    • Cap numba<0.54, ipykernel<6.2.0 (#1661)
    • Removed assert for cardinality and static feats (#1659)
  • v0.8.1(Aug 12, 2021)

    Backporting fixes:

    • loosen RTOL in test/distribution/test_flows.py to make test_flow_invertibility pass (#1604)
    • Add batch_size attribute to MQCNNEstimator and MQRNNEstimator (#1645)
    • Fix NegativeBinomial constructor args in NegativeBinomialOutput (torch) (#1651)
    • Fix frequency string conversion from ts format, add test (adapted from #1652)