To bring forecasting to life, our objective in this chapter is to provide readers with a short and simple example of how to apply and interpret a statistical forecasting method in practice. The example is stylized and illustrative; real forecasting is messier. Still, it serves as a guide for how to think about a forecasting method, execute it successfully, and apply its output to decision making.
3.1. Point Forecasts
Suppose you are interested in predicting weekly demand for a particular product. Specifically, you have historical data from the past 50 weeks, and you want to forecast how demand for your product will develop over the next 10 weeks, that is, weeks 51 to 60. Completing this task means making 10 different forecasts: the one-step-ahead forecast is your forecast for week 51, the two-step-ahead forecast is your forecast for week 52, and so forth. To keep things simple, you can assume that there is no trend and no seasonality in the data. Trends and seasonality are predictable patterns in the data, and we will examine them in more detail in Chapter 5.¹ Further, you have no additional information on the market available beyond this history of demand (Chapter 8 will examine how to use additional information to achieve better predictions). A time series plot of your data is shown in Figure 3.1.
Figure 3.1 Time series for our example
Two simple approaches to forecasting would be to create a point forecast by either taking the most recent observation of demand (= 3370) or calculating the long-run average over all available data points (= 2444). Both approaches ignore the distinct shape of the series: it starts at a somewhat low level and then exhibits an upward shift. Calculating the long-run average ignores the observation that the time series currently seems to hover at a higher level than in the past. Conversely, taking only the most recent demand observation as your forecast ignores the fact that the last observation is very close to an all-time high, and that historically an all-time high has usually not signaled a persistent upward shift but has been followed by lower demand in the immediately subsequent weeks, a simple case of regression to the mean. You could, of course, simply split the data and calculate the average demand over the last 15 periods (= 2652). Not a bad approach, but the choice of how far back to go (= 15 periods) seems ad hoc. Why should the last 15 observations receive equal weights and those before receive no weight at all? Further, this approach does not fully capture the possibility of a shift in the data; going further back into the past allows your forecast, and particularly your prediction intervals, to reflect the uncertainty of further shifts in the level of the series.
A weighted average over all available demand data would solve these issues, with more recent data receiving more weight than older data. But what would be good weights, and with 50 time periods of past data, do we really need to specify 50 different weights? It turns out that there is a simple method for creating such weighted averages, and it happens to be exactly the method to use for this kind of time series: single exponential smoothing. Chapter 6 will provide more details on this method; for now, understand that a key aspect of the method is a so-called smoothing parameter α (alpha). The higher α, the higher the weight given to recent as opposed to earlier data when calculating a forecast. The lower α, the more the forecast takes the whole demand history into account without heavily discounting older data. In this case, if you feed the 50 time periods into an exponential smoothing model to estimate an optimal smoothing parameter, you will obtain an α value of 0.16, indicating that there is some degree of instability and change in the data (which we clearly see in Figure 3.1) but also some noise. One-step-ahead forecasts under exponential smoothing become a weighted average of the most recently observed demand (weight = 0.16) and the most recent forecast made by the method (weight = 1 − 0.16 = 0.84).
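Before moving on, it may help to see the update rule written out. Below is a minimal Python sketch of the single exponential smoothing step, using the α of 0.16 estimated above; the names ses_update and weights are ours, purely for illustration. Unrolling the recursion shows why you never need to specify 50 separate weights: they decay geometrically on their own.

alpha = 0.16  # smoothing parameter estimated from the 50 weeks of history

def ses_update(latest_demand, previous_forecast, alpha=alpha):
    # New forecast = weighted average of the newest demand and the old forecast
    return alpha * latest_demand + (1 - alpha) * previous_forecast

# Implied weight on the observation k periods back (k = 0 is the newest):
weights = [alpha * (1 - alpha) ** k for k in range(5)]
print([round(w, 3) for w in weights])  # [0.16, 0.134, 0.113, 0.095, 0.08]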
Now you are probably thinking: Wait! I have to create a weighted average between two numbers, where one of these numbers is my most recent forecast. But I have not made any forecasts yet! If the method assumes a current forecast and I have not made forecasts in the past using this method, what do I do? The answer is simple: you use the most recent fitted forecast. In other words, suppose you had started in week 2 with a naïve forecast of your most recent demand and had applied the exponential smoothing method ever since; what would have been your most recent forecast, the one for week 50? Figure 3.2 shows what these fitted forecasts would have looked like.
Figure 3.2 Demand forecasts for our example
Note that these fitted forecasts are not real forecasts; you used the data of the last 50 weeks to estimate a smoothing parameter and then used this parameter in turn to generate fitted forecasts for those same 50 periods. You would not have known this smoothing parameter in week 1, since you did not have enough data available to estimate it. Thus, the recorded fitted forecast for week 2, supposedly calculated in week 1, could not actually have been made in week 1. Nevertheless, this method of generating fitted forecasts now allows you to make a forecast for week 51. Specifically, you take your most recent demand observation (= 3370) and your most recent “fitted” forecast (= 2643) and calculate the weighted average of these two numbers (0.16 × 3370 + 0.84 × 2643 = 2759) as your point forecast for week 51.
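In code, generating the fitted forecasts is one short loop. The sketch below is illustrative: it assumes demand is a Python list holding the 50 weekly observations, initializes with the naïve forecast as described above, and reproduces the week-51 calculation from the two numbers given in the text.

def fitted_forecasts(demand, alpha=0.16):
    forecasts = [demand[0]]  # fitted forecast for week 2 = week 1 demand (naive)
    for d in demand[1:]:     # demands for weeks 2 through 50
        forecasts.append(alpha * d + (1 - alpha) * forecasts[-1])
    return forecasts         # the last entry is the actual forecast for week 51

# With the numbers from the text (week 50 demand and week 50 fitted forecast):
print(0.16 * 3370 + 0.84 * 2643)  # 2759.32, i.e., roughly 2759

Because the method has no trend or seasonal component, this same number will also serve as the point forecast for weeks 52 to 60, as explained next.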
So what about the remaining nine point forecasts, for periods 52 to 60? The answer is surprisingly simple. The single exponential smoothing model is a “level only” model, which assumes that there is no trend or seasonality (which, as mentioned initially, is correct for this time series, since it was constructed with neither trend nor seasonality). Without such additional time series components (see Chapter 5), the point forecast remains flat, that is, the same number. Moving further out in the forecast horizon may influence the spread of our forecast probability distribution, but in this case the center of that distribution, and therefore the point forecast, remains the same. Our forecasts for the next 10 weeks are thus a flat line of 2759 for weeks 51 to 60: the two-step-ahead, three-step-ahead, and later point forecasts are all equal to the one-step-ahead point forecast. Your intuition may tell you that this is odd; indeed, many believe intuitively that the time series of forecasts should resemble the time series of demand (Harvey 1995). Yet this is one of those instances where our intuition fails us. The time series contains noise, the unpredictable component of demand, and a good forecast filters out this noise (an aspect that is clearly visible in Figure 3.2). Thus, the time series of forecasts is less variable than the actual time series of demand. In our case, there simply is no information available to tell us whether the time series will shift up, shift down, or stay at the same level. In the absence of such information, our best prediction falls into the center, that is, a prediction that the time series stays at the same level. The point forecast therefore remains constant as we predict further into the future; only the uncertainty associated with it may change as our predictions reach further out into the unknown.
3.2. Prediction Intervals
So how do we calculate prediction intervals associated with our point forecasts? We will use this opportunity to explore three different methods: (1) using the standard deviation of observed forecast errors, (2) using the empirical distribution of forecast errors, and (3) using theory-based formulas.
The first method is the simplest and most intuitive. It is easy to calculate the errors associated with our past “fitted” forecasts as the difference between actual demand and the fitted forecast in each period. Note that if we take the average of the absolute values (the mean absolute error, or MAE) or of the squared values (the mean squared error, or MSE) of these forecast errors, we obtain standard measures of forecast accuracy (see Chapter 11). For our purposes, we can calculate the population standard deviation of these forecast errors (σ = 464.63).² This value represents the spread of possible outcomes around our point forecast (see Figure 2.1). If we want to, for example, calculate an 80 percent prediction interval around the point forecast, we have to find the z-score associated with the lowest 10 percent of the probability distribution (10 percent, because an 80 percent interval around the point forecast excludes the lower and upper 10 percent of the distribution). The corresponding z-score is ±1.28, and we thus subtract and add 1.28 times our standard deviation estimate (= 464.63) from and to the point forecast (= 2759) to obtain an 80 percent prediction interval of (2164; 3354). In other words, we can be 80 percent sure that demand in week 51 lies between 2164 and 3354, with the most likely outcome being 2759.
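The same calculation can be scripted in a few lines of Python, using scipy's normal quantile function instead of a z-table; this is a sketch with the rounded numbers from the text.

from scipy.stats import norm

point_forecast = 2759
sigma = 464.63                      # std. dev. of the fitted one-step errors

z = norm.ppf(0.90)                  # about 1.28; cuts off the lowest 10 percent
lower = point_forecast - z * sigma  # about 2164
upper = point_forecast + z * sigma  # about 3354
print(round(lower), round(upper))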
Method (1) assumes that our forecast errors roughly follow a normal distribution; method (2) does not make this assumption, but it generally requires more data to be effective. Consider that an 80 percent prediction interval ignores the top and the bottom 10 percent of errors that could be made. Since we have approximately 50 “fitted” forecast errors, ignoring the top and the bottom 10 percent of our errors roughly equates to ignoring the five (= 10% × 50) largest and five smallest errors in our data. The errors that fall just within these boundaries are (−434; 581), so another simple way of creating an 80 percent prediction interval is to add these two boundary errors to the point forecast, which generates a prediction interval of (2325; 3340), a bit narrower than the interval from method (1). In practice, bootstrapping techniques can make this method considerably more effective at creating prediction intervals.
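A sketch of this empirical approach follows, assuming demand and fitted are equal-length sequences of actual demands and their fitted one-step-ahead forecasts. Note that numpy's percentile function interpolates between observed errors, so its interval can differ slightly from the (2325; 3340) obtained by taking the boundary errors directly.

import numpy as np

def empirical_interval(demand, fitted, point_forecast, coverage=0.80):
    # Prediction interval from the empirical distribution of fitted errors
    errors = np.asarray(demand) - np.asarray(fitted)
    tail = (1 - coverage) / 2 * 100           # 10 (percent) for an 80% interval
    lo_err, hi_err = np.percentile(errors, [tail, 100 - tail])
    return point_forecast + lo_err, point_forecast + hi_err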
The final method (3) exists only for forecasting methods that have an underlying theoretical model.³ The key idea is that if a theoretical model underlies the method, the standard deviation of the forecast error can be derived from theory. In the case of single exponential smoothing, this formula has a comparatively simple form: to predict h time periods into the future, take the one-step-ahead forecast error standard deviation as calculated in method (1) and multiply it by 1 + (h − 1) × α². Similar formulas exist for other methods but can be more complex.
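Applied in code, this multiplier reproduces the standard deviations used in the remainder of this chapter; h_step_sigma is our own illustrative name.

alpha, sigma_1 = 0.16, 464.63

def h_step_sigma(h, sigma_1=sigma_1, alpha=alpha):
    # Scale the one-step-ahead error std. dev. by 1 + (h - 1) * alpha**2
    return sigma_1 * (1 + (h - 1) * alpha ** 2)

print(round(h_step_sigma(2), 2))  # 476.52 for the two-step-ahead forecast
print(round(h_step_sigma(3), 2))  # 488.42 for the three-step-ahead forecast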
This leads to an interesting question: how does one calculate the standard deviation of forecasts that are not one but h steps ahead? If we prepare a forecast in week 50 for week 52 and use the standard deviation of one-step-ahead forecast errors from method (1), we will generally underestimate the true uncertainty in this two-step-ahead forecast. Unless a time series is entirely stable, with no changing components, the further we predict into the future, the more likely it is that the series will have shifted to a different level. The formula from method (3) accordingly increases the standard deviation from 464.63 for the one-step-ahead forecast error for week 51 to 476.52 for the two-step-ahead forecast error for week 52 and to 488.42 for the three-step-ahead forecast error for week 53. Prediction intervals widen accordingly.
Another approach would be to calculate two-step-ahead fitted forecasts for all past demands. In the case of single exponential smoothing, the two-step-ahead forecast equals the one-step-ahead forecast, but the error calculation now compares this forecast with the demand realized one period later than in the one-step-ahead calculation. The resulting error standard deviations for the two-step-ahead forecast (= 478.04) and the three-step-ahead forecast (= 489.34) are slightly higher than both the one-step-ahead standard deviation and the adjusted standard deviations from the method (3) formula.
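A sketch of this empirical alternative, again assuming fitted[i] is the one-step-ahead forecast for the same week as demand[i]:

import numpy as np

def h_step_error_sd(demand, fitted, h):
    # For single exponential smoothing, the h-step-ahead forecast equals the
    # one-step-ahead forecast, so each fitted forecast is compared with the
    # demand realized h - 1 periods further on.
    demand, fitted = np.asarray(demand), np.asarray(fitted)
    errors = demand[h - 1:] - fitted[:len(fitted) - (h - 1)]
    return errors.std()  # population standard deviation, as in method (1)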
Forecasting software will usually provide one or more of these methods to calculate prediction intervals and will allow you to visualize them. For instance, Figure 3.3 shows point forecasts and 70 percent prediction intervals from method (3).
Figure 3.3 Demand, point forecasts, and prediction intervals
So what is the right method to use? How do we come up with the right prediction interval? Which standard deviation estimate works best? The common criticism of method (1) is that it underestimates the uncertainty about the true forecasting model and does not incorporate the possibility that the model could change; its standard errors are thus too low. Method (2) requires a lot of data to be effective. Method (3) rests on assumptions and may not be robust if those assumptions are violated. In practice, choosing the right method for calculating prediction intervals can require careful calibration and comparison of alternatives. Chapter 11 outlines how the accuracy of prediction intervals can be assessed and, therefore, how different methods can be compared in practice. In general, though, using any of the methods described here is better than using none at all; while getting good estimates of the underlying uncertainty of a forecast can be challenging, any such estimate is better than assuming that there is no uncertainty in the forecast, or that all forecasts carry the same inherent uncertainty.
3.3. Decision Making
Given that we now understand how to estimate the parameters of a probability distribution of future demand, how would we proceed with decision making? Suppose our objective is to place an order right now that has a lead time of 2 weeks; suppose also that we place an order every week, so the order we place right now has to cover the demand we face in week 53; further, suppose our inventory is perishable, so we need not worry about stock left over from week 52, and the inventory available to meet demand in week 53 equals what we order right now. In this case, we would take the three-step-ahead forecast (= 2759) together with the three-step-ahead standard deviation (= 488.42) to characterize the demand distribution for week 53. It is important to match the right forecast (three-step-ahead) with the right forecast uncertainty measure (three-step-ahead, and not one-step-ahead, in this case). Suppose we want to satisfy an 85 percent service level; the z-score associated with an 85 percent service level is 1.04. We would thus place an order for 2759 + 1.04 × 488.42 ≈ 3267 units; this order quantity would, according to our forecast, have an 85 percent chance of meeting all demand.
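The order calculation takes one line once the inputs are matched correctly; note that scipy's exact quantile (about 1.0364) yields roughly 3265 units, while the text rounds z to 1.04 and obtains 3267.

from scipy.stats import norm

forecast_week53 = 2759     # three-step-ahead point forecast for week 53
sigma_week53 = 488.42      # three-step-ahead error standard deviation

z = norm.ppf(0.85)         # z-score for an 85 percent service level
order_qty = forecast_week53 + z * sigma_week53
print(round(order_qty))    # 3265 with the exact z; the rounded z gives 3267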
This concludes our simple example; the key learning points are the mechanics of creating point forecasts and prediction intervals for future time periods, and the way these predictions translate into decisions. We will now proceed to provide more detailed explanations of different forecasting methods.
______________
¹ Note that the time series in Figure 3.1 appears as if it contains a positive trend, since the points on the right-hand side of the graph are higher than the points on the left-hand side. This is sometimes referred to as illusionary trend perception (Kremer, Moritz, and Siemsen 2011). The data series here was artificially created using a random walk with noise, and the shifts in demand inherent in such a random walk can make us believe in an illusionary trend in the data.
² There is a relationship between MSE and σ. If the forecasting method is unbiased, that is, the average forecast error is 0, then σ is equivalent to the square root of the MSE (also called the root mean squared error, or RMSE). If there is a bias in the forecasting method, then σ² + bias² is equivalent to the MSE.
³ Note that any way of coming up with a forecast is a forecasting method; a forecasting model exists only if the forecasting method is derived from a theoretical model that specifies the statistical process generating the data. Such model-based forecasting methods include the exponential smoothing methods discussed in Chapter 6 and the ARIMA methods discussed in Chapter 7. Croston’s method, described in Chapter 9, is an example of a forecasting method without a model.