Forecasting – theory of operation

The first thing to realize is that invoking a forecast is an extension of an existing job. In other words, you need to have an ML job configured, and that job needs to have analyzed historical data, before you can forecast on that data. This is because the forecasting process uses the same models that the ML job creates for anomaly detection. To forecast, you follow the same steps to create an ML job that have been described in other chapters. If anomalies were generated by the execution of that job, you can disregard them if your only purpose is forecasting. Once the ML job has learned on some historical data, the model or models (there will be more than one if the job analyzes more than one time series) associated with that job are current and up to date, as represented by the following diagram:

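For concreteness, here is a minimal sketch of kicking off a forecast against an existing job, using Elasticsearch's _forecast endpoint. The cluster address and the job name my_job are hypothetical; the sketch assumes a local, unsecured cluster and a job that has already been opened and has processed historical data:

```python
import requests

ES = "http://localhost:9200"  # assumption: local, unsecured cluster
JOB_ID = "my_job"             # hypothetical: an existing ML job that has
                              # already analyzed historical data

# Ask for the job's current models to be copied and extrapolated forward.
# The _forecast endpoint only works on a job whose models already exist.
resp = requests.post(
    f"{ES}/_ml/anomaly_detectors/{JOB_ID}/_forecast",
    json={"duration": "1d"},  # how far past "now" to extrapolate
)
resp.raise_for_status()
print(resp.json())  # e.g. {'acknowledged': True, 'forecast_id': '...'}
```
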
We'll consider the time before now as historical learning: the period over which the models have learned from actual data. When the user invokes a forecast at a particular time, a copy of the models is made, and a separate process takes that copy and extrapolates it into the "future". This process runs in parallel so that it does not interfere with the original models and their ongoing evolution. This is represented in the following diagram:

The forecast values are written to the results index as a special type of result (more detail on this later) and are available for viewing in the UI or for access via the API.
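
As a rough illustration, those forecast documents can be pulled straight out of the results indices. This sketch reuses the hypothetical local cluster and job from before, and relies on the model_forecast result type and its prediction fields, which are covered in more detail later:

```python
import requests

ES = "http://localhost:9200"  # assumption: local, unsecured cluster
JOB_ID = "my_job"             # hypothetical job name

# Forecast values live alongside the job's other results in the
# .ml-anomalies-* indices, flagged with result_type "model_forecast".
query = {
    "size": 5,
    "query": {"bool": {"filter": [
        {"term": {"job_id": JOB_ID}},
        {"term": {"result_type": "model_forecast"}},
    ]}},
    "sort": [{"timestamp": "asc"}],
}
resp = requests.get(f"{ES}/.ml-anomalies-*/_search", json=query)
for hit in resp.json()["hits"]["hits"]:
    doc = hit["_source"]
    print(doc["timestamp"], doc["forecast_prediction"],
          doc["forecast_lower"], doc["forecast_upper"])
```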

It's important to note that the normal path of the ML job analyzing the actual data continues (if the job is running in real time). Therefore, after some time, there could be a difference between the value predicted for a future time (made at the time of the forecast) and the actual value once that time arrives, as shown in the following diagram:

This forecasting error is to be expected, but hopefully it will be minimal. The differential between the two is not currently used by ML, but perhaps in the future it could inform the models to make subsequent forecasts more accurate. It is, of course, also possible that an unknown external factor (as described earlier) could introduce a certain amount of forecasting error. Another (perhaps simpler) way to think about uncertainty in predictions is to consider predicting the outcome of a coin toss. You could observe a sequence of prior coin flips, but if you are not taking into account the physics of each flip (speed, height, rotations, and so on) and are relying only on the outcomes of past observations, then you'll never do better than a 50/50 prediction. Additionally, it is unlikely that the ML job saw perfectly consistent behavior in the data during the learning period. As such, with a certain amount of noise in the data, we should also expect a certain amount of variation, or uncertainty, in the forecast.
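
To make the notion of forecasting error concrete, one could line up a forecast's predictions with the values the job actually observed once real time caught up. The following sketch is illustrative only: it assumes a simple single-series job with model plot enabled (so actual values are recorded as model_plot result documents), and the cluster, job name, and forecast ID are all hypothetical:

```python
import requests

ES = "http://localhost:9200"  # assumption: local, unsecured cluster
JOB_ID = "my_job"             # hypothetical job name
FORECAST_ID = "..."           # hypothetical: the forecast_id returned earlier

def fetch(result_type, extra_filters=()):
    """Pull result docs of one type for the job, keyed by timestamp.
    (Keying by timestamp alone assumes a single-series job.)"""
    query = {
        "size": 1000,
        "query": {"bool": {"filter": [
            {"term": {"job_id": JOB_ID}},
            {"term": {"result_type": result_type}},
            *extra_filters,
        ]}},
    }
    resp = requests.get(f"{ES}/.ml-anomalies-*/_search", json=query)
    return {h["_source"]["timestamp"]: h["_source"]
            for h in resp.json()["hits"]["hits"]}

# The forecast's predictions, and (assuming model plot was enabled on the
# job) the actual values the job later observed for the same buckets.
forecast = fetch("model_forecast", [{"term": {"forecast_id": FORECAST_ID}}])
actuals = fetch("model_plot")

# Mean absolute error over the buckets where both exist.
common = forecast.keys() & actuals.keys()
if common:
    mae = sum(abs(forecast[t]["forecast_prediction"] - actuals[t]["actual"])
              for t in common) / len(common)
    print(f"forecast error (MAE) over {len(common)} buckets: {mae:.2f}")
```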

The user can also invoke multiple forecasts at different times. These are stored separately, as represented by the following diagram:

The distinction between forecast #1 and forecast #2 is keyed off an internal unique ID for each forecast instance. This will become apparent later when we look at how the forecast results are stored in the index.
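
A quick sketch of that keying, again against the hypothetical local cluster and job: each _forecast call returns its own forecast_id, which can then be used to filter the results index down to a single forecast instance:

```python
import requests

ES = "http://localhost:9200"  # assumption: local, unsecured cluster
JOB_ID = "my_job"             # hypothetical job name

# Each _forecast call returns its own forecast_id, so the results of
# forecast #1 and forecast #2 never mix in the results index.
ids = []
for _ in range(2):
    resp = requests.post(f"{ES}/_ml/anomaly_detectors/{JOB_ID}/_forecast",
                         json={"duration": "1d"})
    ids.append(resp.json()["forecast_id"])
print(ids)  # two distinct IDs

# Later, pull back only the second forecast's values:
query = {"query": {"bool": {"filter": [
    {"term": {"job_id": JOB_ID}},
    {"term": {"result_type": "model_forecast"}},
    {"term": {"forecast_id": ids[1]}},
]}}}
resp = requests.get(f"{ES}/.ml-anomalies-*/_search", json=query)
print(resp.json()["hits"]["total"])
```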
