I'll use an example here and generalize where needed. Let's say we have the following dataset that we train a timeseries predictor on:
time, gb, target, aux
1, A, 7, foo
2, A, 10, foo
3, A, 12, bar
4, A, 14, bar
2, B, 5, foo
4, B, 9, foo
In this case:
time is the column we are ordering by,
gb is the column we are grouping by,
target is what we are predicting,
aux is an unrelated column that's not timeseries in nature and is just used "normally".
We train a predictor with a window of n rows.
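For intuition, here is a rough sketch of what window-of-n training examples per group could look like (n == 2 is an assumed value for illustration; this is not how native actually builds them):

```python
n = 2  # assumed window size, for illustration only

# (time, target) pairs per group, taken from the dataset above.
rows = {
    "A": [(1, 7), (2, 10), (3, 12), (4, 14)],
    "B": [(2, 5), (4, 9)],
}

# Each training example: the last n targets -> the next target.
pairs = {
    gb: [(targets[i - n:i], targets[i]) for i in range(n, len(targets))]
    for gb, targets in ((gb, [t for _, t in rs]) for gb, rs in rows.items())
}

print(pairs["A"])  # [([7, 10], 12), ([10, 12], 14)]
print(pairs["B"])  # [] -- only two rows for B, so no complete window plus target
```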
Then let's say we have an input stream that looks something like this:
time, gb, target, aux
6, A, 33, foo
7, A, 54, foo
First, we will need to store, for each value of the gb column, the last n rows seen. So, for example, if n==1 we would save only the last row in the data above; if n==2 we would save both. When new rows come in, we un-cache the older rows.
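A minimal sketch of such a per-group cache (n == 2 is an assumed value; Python's collections.deque with maxlen evicts the oldest rows automatically):

```python
from collections import defaultdict, deque

n = 2  # window size; assumed value for illustration

# For each value of the gb column, keep only the last n rows seen.
cache = defaultdict(lambda: deque(maxlen=n))

def observe(row):
    """Store an incoming row; rows older than the last n are evicted."""
    time, gb, target, aux = row
    cache[gb].append(row)

for row in [(1, "A", 7, "foo"), (2, "A", 10, "foo"), (3, "A", 12, "bar"),
            (4, "A", 14, "bar"), (2, "B", 5, "foo"), (4, "B", 9, "foo")]:
    observe(row)

print(list(cache["A"]))  # with n == 2: [(3, 'A', 12, 'bar'), (4, 'A', 14, 'bar')]
print(list(cache["B"]))  # [(2, 'B', 5, 'foo'), (4, 'B', 9, 'foo')]
```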
Second, when a new datapoint comes into the input stream we'll need to "infer" that the prediction we have to make is actually for the "next" datapoint. Which is to say that when:
7, A, 54, foo comes in, we need to infer that we actually have to make a prediction for:
8, A, <this is what we are predicting>, foo
The challenge here is how we infer that the next timestamp is 8. One simple way to do this is to take the delta between the current record and the previous one, but that's an issue for the first observation (since we don't have a previous record to subtract from, unless we cache part of the training data). Alternatively, we could add a feature to native, to either:
a) Provide a delta argument for each group-by value (representing by how much we increment the order column[s])
b) Have an argument for timeseries prediction that tells it to predict for the "next" row and then do the inference under the hood.
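As a rough sketch of how the two pieces could fit together: infer the delta by subtracting the previous cached timestamp, and fall back to a user-supplied per-group delta (option a) for the first observation. The names group_deltas and next_timestamp are hypothetical, not existing native API:

```python
# Hypothetical per-group deltas, as in option a); these would be user-supplied.
group_deltas = {"A": 1, "B": 2}

last_seen = {}  # last timestamp observed per group

def next_timestamp(gb, time):
    """Infer the timestamp of the "next" row we should predict for."""
    prev = last_seen.get(gb)
    last_seen[gb] = time
    if prev is not None:
        delta = time - prev  # subtract the previous record's timestamp
    else:
        # First observation: nothing to subtract from, so fall back to the
        # provided per-group delta (or cached training data, if we keep any).
        delta = group_deltas[gb]
    return time + delta

print(next_timestamp("A", 6))  # first observation for A -> fallback delta -> 7
print(next_timestamp("A", 7))  # delta inferred as 7 - 6 = 1 -> predict for 8
```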
@paxcema let me know which of these features would be easy to implement in native, since you're now the resident timeseries expert.