So far, we have considered many different forecasting methods, from purely statistical ones to judgmental ones. And, of course, we have only scratched the surface in this book; there are many more established forecasting methods. In addition, if you want to implement a forecasting system, each software vendor will offer its own variants of these basic methods, as well as home-grown, possibly proprietary algorithms. How do we decide which of these forecasting methods is best for our data?
To decide between many different forecasting methods, we can run a so-called forecasting competition, in which different methods compete to provide the best forecast. Such competitions are not entirely straightforward to run. We therefore devote this entire chapter to describing the generally accepted best way of running them.
12.1 Planning
Similar to a randomized controlled trial in the field of medicine, a forecasting competition needs to be specified and planned in advance. Changing parameters in mid-competition may be problematic. Including additional data after the first data delivery will likely lead to a larger data-cleansing effort than if all the data had been provided at once. Changing accuracy measures may mean that models change dramatically, possibly invalidating earlier work.
Think about the decision your forecast is supporting. Do you want it for planning marketing and promotions? Or for short-range ordering and replenishment decisions? This will have an impact on the data you will need, in particular in terms of granularity along the product, time, and location dimensions (see Chapter 14), as well as on the forecast horizon and on what forecast accuracy measures (see Chapter 11) make sense. You may need forecasts for different processes, in which case you may need to think about how you want to deal with aggregation and forecast hierarchies (again, see Chapter 14).
Plan for an iterated process. If the people who actually perform the forecasting are already intimately familiar with your data—for instance, because they belong to an established forecasting group within your business and have been forecasting your time series for years—they will already have a good understanding of your data, its idiosyncrasies, and your processes. However, if you perform the forecasting competition on data from a new subsidiary, or if you involve external consultants or third-party forecasting software vendors, they will need to understand your data before they can calculate meaningful forecasts. This is best done via live discussions, either face-to-face or at least via web conferencing. Written data descriptions are always useful, but forecasting experts will always have additional questions. After all, you would not expect your doctor to provide a reliable diagnosis based on a written description of your symptoms alone either!
As discussed in Chapter 11, there are numerous forecast accuracy measures. Invest some time in choosing good ones to use for comparison. Certain measures may be unusable for your data, like MAPE for data with zeros, or may reward biased forecasts, like MAE for low-volume data. If you use multiple series on different scales, make sure your accuracy measures can meaningfully be compared and summarized across scales, by using percentage or scaled error measures. Do think about using more than one error measure—different measures are sensitive to different problems in your forecasts. In any case, you should always assess forecasts on both bias and accuracy, as described in Chapter 11.
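To make this concrete, here is a minimal sketch (with hypothetical demand data, not taken from Chapter 11) of computing several such measures for a single series; note how MAPE needs a workaround for the zero-demand period, while the scaled MASE does not.

```python
# A minimal sketch (hypothetical data) of computing accuracy and bias measures.
import numpy as np

history = np.array([100.0, 115.0, 90.0, 105.0, 125.0, 95.0])   # in-sample demand
actuals = np.array([120.0, 95.0, 0.0, 140.0, 110.0, 130.0])    # note the zero
forecasts = np.array([110.0, 100.0, 10.0, 150.0, 100.0, 120.0])

errors = forecasts - actuals

mae = np.mean(np.abs(errors))        # scale-dependent accuracy
bias = np.mean(errors)               # positive values indicate over-forecasting

# MAPE is undefined for zero actuals; skipping those periods, as done here,
# is only one possible (and debatable) workaround.
nonzero = actuals != 0
mape = np.mean(np.abs(errors[nonzero]) / actuals[nonzero]) * 100

# MASE scales the MAE by the in-sample MAE of a naive one-step-ahead forecast,
# making it comparable and summarizable across series on different scales.
mase = mae / np.mean(np.abs(np.diff(history)))

print(f"MAE={mae:.1f}  bias={bias:.1f}  MAPE={mape:.1f}%  MASE={mase:.2f}")
```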
If your forecasting competition involves external parties, do not be afraid of involving them early, already at the planning stage. After all, you are planning on trusting these people's software and expertise with a mission-critical process in your business, so you should be able to trust that they know what they are doing. A dedicated forecasting software provider or forecasting consultant may well know more about forecasting and have experience with a wider variety of data sets than your in-house forecasting group, although your in-house experts will likely know your data better. Tap this external expertise. Discuss your business and your data with the vendor. Consider their suggestions about what kind of data to include in the forecasting competition. Of course, the vendor will be looking out for their own interests first, but that does not mean their proposals will be useless. In addition, this kind of early discussion allows you to gauge their expertise and their commitment. Try to get subject matter experts to participate in these discussions, not just salespeople.
12.2 Data
After you have thought about what you need the forecast for, you can start collecting data. Make the data set representative. If the forecast competition aims at identifying a method to forecast a small number of highly aggregated time series, perform the competition on a small number of highly aggregated time series. If you are looking for an automated method that can perform well on thousands of series, use a large data set with thousands of series. If you only give out 20 series, the forecasters will tune each model by hand, but that will not be possible in a production environment with thousands of series—so use more data to begin with.
Does your day-to-day data contain outliers or invalid periods? Make sure to include such series. Conversely, do you plan on forecasting only precleaned data? Then preclean the data in the same way. Do you have causal drivers, like promotions or weather? Then include these effects in the data set you provide. As in our example in Chapter 8, the causal effects may or may not improve the forecast, but you will not know if you do not try. Try to make the data set you use for the competition as representative as possible for the actual forecasting task you need for your business.
However, remember that, as noted in Chapter 8, causal models require forecasts of causal drivers. If your business depends heavily on weather, your demand forecast will need to rely on weather forecasts. If you run your forecasting competition using actual weather instead of the forecasted weather, you are pretending that you know the future weather perfectly, and your forecasts will be better than if you relied on weather forecasts instead. Your causal models will appear to perform better than they really will in a production environment. Thus, make sure to include forecasts of your causal drivers for the forecasting period. As a rule of thumb, a forecast prepared in a particular time period should only use the information available in that time period.
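As an illustration, the following sketch (with hypothetical demand and temperature data) fits a simple regression on historical weather and then produces the demand forecast from a weather forecast rather than from the actual weather of the evaluation period, which would not have been available at forecast time.

```python
# A minimal sketch with hypothetical demand and temperature data: the model is
# fit on observed history, but the forecast must use *forecasted* driver
# values, never the actuals from the evaluation period.
import numpy as np

temp_hist = np.array([18.0, 21.0, 25.0, 30.0, 27.0, 22.0])      # observed temperatures
demand_hist = np.array([200.0, 230.0, 280.0, 340.0, 300.0, 240.0])

# Simple causal model: demand = a + b * temperature, fit on history only
b, a = np.polyfit(temp_hist, demand_hist, 1)

temp_forecast = np.array([24.0, 26.0])   # weather forecast for the horizon
temp_actual = np.array([28.0, 23.0])     # only known after the fact

demand_forecast = a + b * temp_forecast  # legitimate: uses information available now
demand_hindsight = a + b * temp_actual   # too optimistic: pretends we knew the weather

print("forecast using weather forecast:", demand_forecast.round(1))
print("forecast using actual weather:  ", demand_hindsight.round(1))
```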
12.3 Procedure
In principle, the procedure of a forecasting competition is simple: collect data, calculate forecasts, and evaluate the forecasts. Data selection and collection have already been discussed, so let us focus on the other steps here.
One key aspect of forecasting competitions is to hold back evaluation data. For instance, you could collect demand data from 3 years, give out the demands of the first 2 years, keep the third year’s demands for evaluation, and require forecasts for this third year. If forecasters know the third year’s demands, they may very well succumb to the temptation to “snoop,” tweaking their forecasts until they perform best on the known evaluation sample—but of course that will not work in an actual production environment. Remove this temptation to cheat, especially for external vendors.
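In code, such a holdout is just a split of the series, as in this minimal sketch with a hypothetical monthly demand series and a naive mean forecast standing in for a submitted method:

```python
# A minimal sketch of a holdout split on a hypothetical monthly demand series:
# the last 12 months are withheld from forecasters and used only for evaluation.
import numpy as np

rng = np.random.default_rng(0)
demand = rng.poisson(lam=100, size=36).astype(float)   # 36 months of demand

train = demand[:24]     # handed out to the forecasters
holdout = demand[24:]   # kept back for evaluation only

# A naive mean forecast stands in here for any submitted 12-month forecast
submitted_forecast = np.full(12, train.mean())
mae = np.mean(np.abs(submitted_forecast - holdout))
print(f"Holdout MAE: {mae:.1f}")
```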
This discussion leads us to a related aspect: why should we use separate evaluation data? Is it not enough to assess how well a forecasting method fits the historical data? If one method performs better in-sample than another method, should it not yield better forecasts, too? Unfortunately, this appealing logic does not work. More complex models (e.g., a seasonal vs. a nonseasonal model, or a trend vs. a nontrend model) will practically always yield better in-sample fits than simpler models. But beyond some optimal complexity, in-sample fit keeps on improving while out-of-sample forecast accuracy starts deteriorating, because the more complex models start fitting noise instead of capturing the signal. Figure 12.1 illustrates this, using the “hard to forecast” series from Figure 1.1 and giving MSEs in thousands. We fit four different models of increasing complexity to the first 12 months of data and forecast the last 6 months. As we see, the more complex the models are, as reflected in increasingly flexible influences of time, the more closely the in-sample fit mirrors the historical data and the lower the in-sample MSE becomes—but the out-of-sample forecasts get worse and worse. Thus, in-sample fit is not a reliable guide to out-of-sample forecast accuracy, and we should never rely on in-sample accuracy to judge a forecasting method.
Figure 12.1 In-Sample and Out-of-Sample Performance for More Complex Models
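The following sketch does not reproduce Figure 12.1, but it illustrates the same effect on a hypothetical noisy series: polynomial trends of increasing degree are fit to the first 12 months and used to forecast the remaining 6; the in-sample MSE can only shrink as the degree grows, while the out-of-sample MSE typically deteriorates for the more flexible fits.

```python
# A minimal sketch (not a reproduction of Figure 12.1) on a hypothetical noisy
# series: polynomial trends of increasing degree fit the history ever more
# closely, yet tend to extrapolate ever more poorly.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(18)
demand = 100 + 2 * t + rng.normal(scale=10, size=18)   # linear signal plus noise

t_fit, y_fit = t[:12], demand[:12]    # first 12 months for fitting
t_out, y_out = t[12:], demand[12:]    # last 6 months for evaluation

for degree in (1, 2, 4, 6):
    coefs = np.polyfit(t_fit, y_fit, degree)
    mse_in = np.mean((np.polyval(coefs, t_fit) - y_fit) ** 2)
    mse_out = np.mean((np.polyval(coefs, t_out) - y_out) ** 2)
    print(f"degree {degree}: in-sample MSE {mse_in:10.1f}   out-of-sample MSE {mse_out:14.1f}")
```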
Let us continue with the actual procedure of a forecasting competition. You can run either “single origin” or “rolling origin” forecasts. In a single-origin setting, as in Figure 12.1, you might give out 12 months of data and require forecasts for the next 6 months, allowing you to assess forecasts on horizons between 1 and 6 months ahead. In a rolling-origin forecast competition, you would give out 12 months of data, require 6 months of forecasts... then give out the actual demand for one more month, this time requiring 5 months of forecasts... then give out an additional month’s demands... you get the idea. In such a rolling-origin set-up, the forecast method can learn from each additional month of historical data and adapt. This more closely mimics real forecasting processes, which would also be repeated and iterated, adapting to new information. Plus, it gives you more forecasts to base your evaluation on. On the other hand, you need to organize multiple exchanges of forecast and actual data and keep careful track of a forecast’s “vintage”: Was a given forecast for June a one-step-ahead forecast based on data until May or a two-step-ahead forecast based on data through April? Overall, rolling-origin forecast competitions are more realistic but require more effort, especially if you communicate with one or multiple external vendors—if your internal forecasting group runs the competition, rolling origins may simply require a few more lines of code.
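A rolling-origin evaluation really can be only a few lines of code. The sketch below uses a hypothetical monthly series and a naive last-value forecast as a stand-in for any competing method, recording the origin and horizon of each forecast so its vintage is preserved.

```python
# A minimal sketch of a rolling-origin evaluation on a hypothetical monthly
# series. Each forecast is stored with its origin and horizon ("vintage").
import numpy as np

rng = np.random.default_rng(2)
demand = rng.poisson(lam=100, size=18).astype(float)   # 18 months of demand

records = []
for origin in range(12, 17):                   # forecast after month 12, 13, ..., 16
    history = demand[:origin]
    horizon = len(demand) - origin
    forecast = np.full(horizon, history[-1])   # naive forecast for remaining months
    for h in range(1, horizon + 1):
        records.append({"origin": origin, "horizon": h,
                        "forecast": forecast[h - 1], "actual": demand[origin + h - 1]})

# Average absolute error by horizon, pooled across all origins
for h in range(1, 7):
    errs = [abs(r["forecast"] - r["actual"]) for r in records if r["horizon"] == h]
    print(f"{h}-step-ahead MAE over {len(errs)} forecasts: {np.mean(errs):.1f}")
```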
Finally, in evaluating the forecasts of different methods, look at matters from different angles. Summarize forecast errors per series (if you have more than one forecast per series), then summarize these summaries over all series—for instance, by averaging errors. If you consider different forecasting horizons, look at how errors behave for one-step, two-step, and longer-range forecasts. Did you include series with outliers (in the historical or the future period) in your data set? Or series with strong seasonality, or series driven especially strongly by certain causal factors? If so, check how your forecasts performed on those. Do not only look at average errors over series, but also at forecasts that performed extremely badly. Two methods may yield similar errors on average, but one of the two may break down badly in certain circumstances. Such rare spectacular failures can badly erode users’ trust in a forecasting system, even if forecasts are good on average, so you should include forecasting methods’ robustness in your overall assessment. Finally, consider discussing results with the forecasters, whether internal or external—you may learn interesting things this way.
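With the evaluation results collected in a table of one row per series, origin, and horizon, this slicing is straightforward. The sketch below assumes a hypothetical results table with an absolute scaled error column and simply groups it in different ways.

```python
# A minimal sketch assuming a hypothetical results table with one row per
# series, origin, and horizon, holding an absolute scaled error per forecast.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
results = pd.DataFrame({
    "series": np.repeat([f"SKU-{i}" for i in range(1, 6)], 12),
    "horizon": np.tile(np.arange(1, 7), 10),
    "abs_scaled_error": rng.gamma(shape=2.0, scale=0.5, size=60),
})

per_series = results.groupby("series")["abs_scaled_error"].mean()   # summary per series
overall = per_series.mean()                                         # summary of summaries
per_horizon = results.groupby("horizon")["abs_scaled_error"].mean() # error by horizon
worst_cases = results.nlargest(5, "abs_scaled_error")                # spectacular failures

print(f"overall mean scaled error: {overall:.2f}")
print(per_horizon)
print(worst_cases)
```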
12.4 Additional Aspects
Of course, the main focus of our forecasting competition lies on forecasting and accuracy. However, as discussed above, forecasting does not happen in a vacuum. We want forecasts in the first place to support business decisions, like how many units of a product to produce. It is likely not the forecast itself but the final business decision (how much to produce) that impacts the bottom line. Good forecasts do not earn or save money—good inventories do. And depending on subsequent decision-making processes, forecasts with very different accuracies may yield very similar impacts on the bottom line. For instance, if true demand is 600 and two methods yield forecasts of 700 and 900 units, respectively, then the first method is obviously more accurate. However, if your operations constrain you to produce in batches of 1,000, both forecasts would have led to the same business decision—to produce 1,000 units—and hence to the same service level and the same overstock; so for all practical purposes, spending additional funds to use the more accurate forecast would have been a waste. (However, if such situations occur frequently, you may want to try to make your production more flexible.) Thus, it makes sense to simulate the entire process, including both forecasting and subsequent decision making.
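The arithmetic of this example is trivial but worth spelling out; the short sketch below simply rounds each forecast up to the assumed batch size of 1,000.

```python
# A small sketch of the batch-size example above (batch size of 1,000 assumed):
# forecasts of 700 and 900 units lead to the identical ordering decision.
import math

batch_size = 1000
for forecast in (700, 900):
    order = math.ceil(forecast / batch_size) * batch_size
    print(f"forecast {forecast} -> order {order} units")
# Both orders come out at 1,000 units, so the extra accuracy of the 700-unit
# forecast would not have changed the decision or the resulting overstock.
```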
Finally, there are other aspects of a forecasting method to consider beyond forecasting accuracy. A method may be extremely accurate but require a lot of manual tuning by statistical experts, which can become expensive. Further, more complex forecasting methods can be more difficult to sell to other managers, and the forecasts resulting from these methods stand a higher chance of being ignored even if they are more accurate (Taylor and Thomas 1982). Or the method may simply take too much time to run—one technology may give you all your operational forecasts within seconds, while another may need to run overnight. Faster forecasting allows more scenario planning, like changing causal variables and simulating the impact on demands. One forecasting software suite may be standalone and require expensive integration into your ERP system, while another may already be integrated or may have well-defined interfaces that connect well with your existing databases.
12.5 Key Takeaways
• Plan out your forecasting competition ahead of time.
• Make sure your competition mirrors your real forecasting situation, in terms of the data and knowledge available and of the decisions the forecast must support.
• Do not be afraid of involving external experts in setting up a forecasting competition. Of course, make sure not to be taken for a ride.
• Always hold back the evaluation data.
• Evaluate forecasts by slicing your data in different ways, to get an overall understanding of a forecasting method’s strengths and weaknesses.
• Forecast accuracy is not everything. Look at the larger picture, from decisions made based on the forecast to ease of integration of a forecasting software.