2.1. Forecasting Methods
A forecast is an input to support decision making under uncertainty. Forecasts are created by a statistical model and/or by human judgment. A statistical model is nothing but an algorithm, often embedded into a spreadsheet model or other software, which converts data into a forecast. Of course, choosing which algorithm an organization uses for forecasting and how this algorithm is implemented often requires the use of human judgment as well. However, when we examine the role of human judgment in forecasting, we mean that judgment is used in lieu of or in combination with a statistical forecast. Human judgment is thus the intuition and cognition that decision makers can employ to convert all available data and tacit information into a forecast. Both statistical models and human judgment will be discussed in much more detail in the rest of this book. In reality, most forecasting processes contain elements of both. A statistical forecast may serve as the basis of discussion, but this forecast is then revised in some form or other through human judgment to arrive at a consensus forecast, that is, a combination of different existing forecasts (statistical or judgmental) within the organization. A recent survey of professional forecasters found that while 16 percent of forecasters relied exclusively on human judgment and 29 percent depended exclusively on statistical methods, the remaining 55 percent used either a combination of judgment and statistical forecast or a judgmentally adjusted statistical forecast (Fildes and Petropoulos 2015). Another study of a major pharmaceutical company found that more than 50 percent of the forecasting experts answering the survey did not rely on the company’s statistical models when preparing their forecast (Boulaksil and Franses 2009). Forecasting in practice is thus not an automated process but is strongly influenced by the people involved in the process.1
Throughout this book, we will refer to a forecasting method very generally as the process through which an organization generates a forecast. This forecast does not need to be the final consensus forecast, although the consensus forecast is generated by some method. The beauty of any forecasting method is that its accuracy can be judged ex post and compared against other methods. There is always an objective realization of demand that can and should be compared to the forecast to create a picture of forecast accuracy over time. Of course, this comparison should never be based on small samples. However, if a forecasting method repeatedly creates forecasts that are far off from the actual demand realizations, this observation is evidence that the method does not work well, particularly if there is also evidence that another method works better. In other words, the question of whether a forecasting method is good or bad is not a question of belief but ultimately a question of scientific empirical comparison. Of course, if the underlying demand is inherently hard to forecast, multiple methods may fail to improve accuracy. We will explore how to make such a forecasting comparison in more detail in Chapter 12.
One key distinction that we will emphasize throughout the book is that the forecast itself is not a target or a budget or a plan. A forecast is simply an expression or belief about the most likely state of the future.2 Targets, budgets, or plans are decisions based on forecasts, but these concepts should not be confused with the forecast itself. For example, our point forecast for demand for a particular item we want to sell on the market may be 100K units. However, it may make sense for us to set our sales representative a target of selling 110K units to motivate them to do their best. In addition, it may also make sense to plan for ordering 120K units from our contract manufacturer, since there is a chance that demand is higher and we want to balance the risk of stocking out against the risk of having excess inventory. To make the latter decision effectively, we would, of course, also include data on ordering and sales costs, as well as assumptions on how customers would react to stockouts and how quickly the product may or may not become obsolete.
2.2. Reporting Forecast Uncertainty
This leads to another important distinction. Most firms operate with point forecasts—single numbers that express the most likely outcome on the market. Yet, we all understand that such a notion is somewhat ridiculous; the point forecast is unlikely to be realized exactly. There can be immense uncertainty in a forecast. Reporting only a point forecast communicates an illusion of certainty. Let us recall a famous quote by Goethe: “to be uncertain is to be uncomfortable, but to be certain is to be ridiculous.” Point forecasts in that sense are misleading. It is much more useful and complete to report forecasts in the form of a probability distribution or at least in the form of prediction intervals, that is, best case, worst case, and most likely scenarios.
Creating such prediction intervals requires a measure of uncertainty in the forecast. While we will explore this topic in detail in Chapter 3, we provide a brief and stylized introduction here. Uncertainty is usually expressed as a standard deviation (abbreviated by the Greek letter σ). Given a history of past forecast errors, measuring this uncertainty is fairly straightforward—the simplest form would be to calculate the population standard deviation (the STDEV.P function in Excel) of past observed forecast errors (see Chapter 11 for a more in-depth treatment of measuring the accuracy of forecasts). Assuming that forecast errors are roughly symmetric,3 that is, overforecasting is as likely and extensive as underforecasting, we can then conceptualize the point forecast as the center (i.e., the mean, median, and most likely value, abbreviated by the Greek letter μ) of a probability distribution, with the standard deviation σ measuring the spread of that probability distribution.
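As a minimal sketch, this calculation can be reproduced in a few lines of Python; the error history below is hypothetical and chosen only for illustration:

```python
import statistics

# Hypothetical history of past forecast errors (forecast minus actual);
# these numbers are illustrative, not taken from the text.
errors = [12, -8, 25, -15, 3, -22, 18, -9, 30, -34]

# Population standard deviation, the equivalent of Excel's STDEV.P.
sigma = statistics.pstdev(errors)
# sigma is roughly 20 for this error history
```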
These concepts are illustrated in Figure 2.1. Suppose our forecasting method generated a point forecast of 500. From past data, we calculate a standard deviation of our past forecast errors as 20. We can thus conceptualize our forecast as a probability distribution with a mean of 500 and a standard deviation of 20 (in this illustration, we assumed a normal distribution, drawn with the Excel function = NORM.DIST). A probability distribution is nothing but a function that maps possible outcomes to probabilities.4 For example, one could ask the question: What is the probability that demand is between 490 and 510? The area under the distribution curve between the X values of 490 and 510 would provide the answer—or alternatively the Excel function call = NORM.DIST(510;500;20;TRUE)-NORM.DIST(490;500;20;TRUE). The probability distribution thus communicates a good sense for the possible outcomes and the uncertainty associated with a forecast.
Figure 2.1 Forecasts from a probabilistic perspective
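The interval probability from the example above can be verified with the standard library; `NormalDist.cdf` plays the role of Excel's NORM.DIST with the cumulative flag set to TRUE:

```python
from statistics import NormalDist

# Forecast conceptualized as a normal distribution with mean 500 and
# standard deviation 20, as in the example around Figure 2.1.
forecast = NormalDist(mu=500, sigma=20)

# Probability that demand falls between 490 and 510: the area under
# the density curve between those two values.
p = forecast.cdf(510) - forecast.cdf(490)
# p is about 0.383, i.e., roughly a 38 percent chance
```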
How should we report such a probabilistic forecast? Usually we do not draw the actual distribution, since this may be too much information to digest for decision makers. Reporting a standard deviation in addition to the point forecast can be difficult to interpret as well. A good practice is to report 95 percent or 80 percent prediction intervals, that is, intervals in which we are 95 or 80 percent sure that demand will fall.5 Assuming a normal distribution, calculating such intervals is relatively easy, since a 95 percent prediction interval means going approximately 2 standard deviations above and below the point forecast.6 In other words, if our point forecast is 500 and the standard deviation of our forecast errors is 20, a 95 percent prediction interval is approximately [500 – 2 × 20, 500 + 2 × 20]. These end points can then be referred to as best-case and worst-case scenarios. However, one has to realize that better-than-best or worse-than-worst-case scenarios are possible, since by definition only 95 percent of the density of the probability distribution falls into the prediction interval (as also shown in Figure 2.1).
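The interval arithmetic above is simple enough to sketch directly (using the text's rounding of the exact z-value of 1.96 to 2):

```python
# 95 percent prediction interval from the running example: point
# forecast 500, standard deviation of forecast errors 20.
mu, sigma = 500, 20
lower, upper = mu - 2 * sigma, mu + 2 * sigma
# → the interval [460, 540]
```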
Reporting and understanding prediction intervals requires some effort. However, their calculation and reporting can be largely automated in modern forecasting software. Not reporting (or ignoring) such intervals can have distinct disadvantages for organizational decision making. Reading a point forecast without a measure of uncertainty gives you no idea how much uncertainty there really is in the forecast. Prediction intervals provide a natural instrument for forecasters to communicate the uncertainty in their forecasts adequately and for decision makers to then decide how to best manage the risk inherent in their decision. The consequences of not making forecast uncertainty explicit can be dramatic. At best, decision makers will form their own judgment about how much uncertainty is inherent in the forecast. Since human judgment in this context generally suffers from overconfidence (Mannes and Moore 2013), not making forecast uncertainty explicit will likely lead to an underestimation of the inherent forecast uncertainty, leading to less-than-optimal safety stocks and buffers in resulting decisions. At worst, decision makers will treat the point forecast as a deterministic number and completely ignore the inherent uncertainty in the forecast, as well as any precautions they should take in their decisions to manage their demand risk.
Adopting prediction intervals in practice is challenging; one line of criticism often brought up is that it is difficult to report more than one number and that the wide range of a 95 percent prediction interval makes the interval almost meaningless. This line of reasoning, however, misinterprets the prediction interval. The range itself contains information about probabilities as well, since values in the center of the range are more likely than values at its ends. These differences can be easily visualized. The Bank of England (BoE) has moved its GDP growth and inflation forecasts completely away from point forecasts toward probability distributions and prediction intervals. An example of how the bank reports these in a fan plot is shown in Figure 2.2. The chart contains the BoE’s forecasts made at the end of the first quarter of 2014. While past data in the chart are represented by a time series, no point forecasts are reported. Instead, areas of different shading indicate different prediction intervals into which the actuals for the succeeding quarters are predicted to fall. Overall, Figure 2.2 is easy to interpret while summarizing a lot of information and represents the state of the art of communicating forecast uncertainty (Kreye et al. 2012).
Figure 2.2 GDP growth reporting by BoE
Source: www.bankofengland.co.uk; ONS represents the Office for National Statistics.
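The shaded bands of such a fan chart correspond to nested prediction intervals. As a sketch, assuming the running example's normal forecast distribution (mean 500, standard deviation 20), the bands for one forecast period can be computed as follows; the coverage levels are illustrative:

```python
from statistics import NormalDist

# Hypothetical forecast distribution for a single future period.
forecast = NormalDist(mu=500, sigma=20)

# Nested intervals, loosely mirroring the shaded bands of a fan chart:
# the wider the band, the higher the probability that the actual falls in it.
for coverage in (0.50, 0.80, 0.95):
    tail = (1 - coverage) / 2
    lo, hi = forecast.inv_cdf(tail), forecast.inv_cdf(1 - tail)
    print(f"{coverage:.0%} band: [{lo:.1f}, {hi:.1f}]")
```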
Another line of attack against using prediction intervals is that, in the end, single numbers are needed for decision making. Ultimately, containers need to be loaded with a certain volume; capacity levels require hiring a certain number of people or buying a certain number of machines. What good is a forecast that shows a range when in the end one number is needed? This argument makes the mistake of confusing the forecast with decision making. The forecast is an input into a decision, but not a decision per se. Firms can and must set service-level targets that translate probabilistic forecasts into single numbers.
2.3. Service Levels
Consider the following illustrative example. Suppose you are a baker and you need to decide how many bagels to bake in the morning for selling throughout the day. The variable cost to make bagels is 10 cents, and you sell them for $1.50. You donate all bagels that you do not sell during a day to a kitchen for the homeless. You estimate that demand for bagels for the day has a mean of 500 and a standard deviation of 80, giving you a 95 percent prediction interval of roughly (340, 660). How many bagels should you bake in the morning? Your point forecast is 500—but you probably realize that this would not be the right number. Baking 500 bagels would give you just a 50 percent chance of meeting all demand during the day.
This 50 percent chance represents an important concept in this decision context—the so-called type I service level or in-stock probability, that is, the likelihood of meeting all demands with your inventory.7 This chance of not encountering a stockout is a key metric often used in organizational decision making. What service level should you strive for? The answer to that question requires carefully comparing the implications of running out of stock with the implications of having leftover inventory—that is, managing the inherent demand risk. The key concepts here are the so-called overage and underage costs, that is, assessments of what happens when too much or too little inventory is available. An overage situation in the case of our bagel baker implies that he/she has made bagels at a cost of $0.10 that he/she is giving away for free; this leads to an actual loss of $0.10 per leftover bagel. An underage situation implies that he/she has not made enough bagels and thus loses the profit margin of $1.40 on each bagel demanded but not available. This is an opportunity cost—a loss of $1.40. Assuming no other costs of a stockout (i.e., loss of cross-sales, loss of goodwill, loss of reputation, etc.), this $1.40 represents the underage cost in this situation. Obviously, the underage cost in this case (=$1.40) is much higher than the overage cost (=$0.10), implying that you would probably bake more than 500 bagels in the morning.
Finding the right service level in this context is known as the “newsvendor problem” in the academic literature. Its solution is simple and elegant. One calculates the so-called critical fractile, which is given by the ratio of the underage cost to the sum of underage and overage costs. In our case, this critical fractile is roughly equal to 93 percent (=1.40/[1.40+0.10]). The critical fractile is the optimal type I service level. In other words, considering the underage and overage costs, you should strive for a 93 percent service level in bagel baking. This service level in the long run balances the opportunity costs of running out of stock with the obsolescence costs of baking bagels that do not sell. So how many bagels should you bake? The only difficult part now is to convert the service-level target (93 percent) into a so-called z-score. In Excel, this involves evaluating the standard normal inverse function (= NORM.S.INV) at the service level, which in our case produces a z-score of 1.50. One then multiplies this z-score by the standard deviation of our demand forecast to obtain the safety stock needed (1.50 × 80 = 120). Adding this safety stock (120) to the point forecast (500) gives the necessary inventory quantity to obtain a 93 percent service level. In other words, you should bake 620 bagels to have a 93 percent chance of meeting all demands in a day.
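The whole newsvendor calculation for the bagel example fits in a few lines; `NormalDist().inv_cdf` takes the place of Excel's NORM.S.INV:

```python
from statistics import NormalDist

# Bagel example from the text: underage cost $1.40 (lost margin),
# overage cost $0.10 (wasted variable cost).
cu, co = 1.40, 0.10
critical_fractile = cu / (cu + co)            # ≈ 0.933, the target service level

mu, sigma = 500, 80                           # demand forecast (mean, sd)
z = NormalDist().inv_cdf(critical_fractile)   # ≈ 1.50, cf. Excel's NORM.S.INV
safety_stock = z * sigma                      # ≈ 120
order_quantity = mu + safety_stock            # ≈ 620 bagels
```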
This example illustrates the difference between a forecast, which serves as an input into a decision, and the actual decision, which is the number of bagels to bake. The point forecast is not the decision, and making a good decision would be impossible without understanding the uncertainty inherent in the point forecast. Good decision making under uncertainty requires actively understanding uncertainty and balancing risks; in the case of our bagel baker, the key managerial task was not to influence the forecast but rather to understand the cost factors involved with different risks in the decision and then define a service level that balances these risk factors. A proper forecast made in the form of a probability distribution or a prediction interval should make this task easier. The actual quantity of bagels baked is simply a function of the forecast and the service level.
Our previous discussion highlights the importance of setting adequate service levels for the items in question. Decision makers should understand how their organization derives these service levels. Since the optimal service level depends on the profit margin of the product, items with different profit margins require different service levels. While overage costs are comparatively easy to measure (cost of warehousing, depreciation, insurance, etc.; see Timme 2003), underage costs involve understanding customer behavior and are thus more challenging to quantify. What happens when a customer desires a product that is unavailable? In the best case, the customer finds and buys a substitute product, which may have been sold at a higher margin, or puts the item on backorder. In the worst case, the customer takes their business elsewhere and tweets about the bad service experience. Studies of mail order catalog businesses show that the indirect costs of a stockout—that is, the opportunity costs from lost cross-sales and reduced long-term sales of the customer—are almost twice as high as the lost revenue from the stockout itself (Anderson, Fitzsimons, and Simester 2006). Similar consequences of stockouts threaten profitability in supermarkets (Corsten and Gruen 2004). The task of setting service levels thus requires studying customer behavior and understanding the revenue risks associated with a stockout. Since this challenge can appear daunting, managers can sometimes react in a knee-jerk fashion and simply set very high service levels (=99.99%). Such an approach in turn leads to excessive inventory levels and corresponding inventory holding costs. Achieving the right balance between customer service and inventory holding ultimately requires a careful analysis and thorough understanding of the business.
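The dependence of the optimal service level on the profit margin can be made concrete with the critical fractile formula; in this sketch, the overage cost of $0.10 and the margins other than $1.40 are hypothetical values chosen for illustration:

```python
# Optimal (type I) service levels for items with different profit
# margins, holding an illustrative overage cost of $0.10 per unit fixed.
co = 0.10
for margin in (0.50, 1.40, 5.00):
    service_level = margin / (margin + co)
    print(f"margin ${margin:.2f} -> target service level {service_level:.1%}")
```

Higher-margin items warrant higher service levels, since a stockout forfeits more profit relative to the cost of carrying an extra unit.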
Note that while we use inventory management as an example of decision making under uncertainty here, a similar rationale of differentiating between the forecast and the related decision applies in other decision-making contexts as well. For example, in service staffing decisions, the forecast relates to customer demand in a certain time period, and the decision is how much service capacity to put in place—too much capacity means money wasted on salaries, and too little capacity means wait times and possibly lost sales due to customers avoiding the service system because of inconvenient queues or inadequate service. Similarly, in project management, a forecast may involve predicting how long the project will take, and the decision is how much time buffer to build into the schedule. Too large a buffer may result in wasted resources and lost revenue, whereas too little buffer may lead to projects exceeding their deadlines and incurring contractual fines. One has to understand that forecasting itself is not risk management—forecasting simply supports the decisions that managers make by carefully balancing the risk inherent in their choices under uncertainty.
Actual inventory management systems are more complex than the examples in this chapter may suggest, since they require adjusting for fixed costs of ordering (i.e., shipping and container filling), uncertain supply lead times (e.g., when ordering from overseas), best-by dates and obsolescence, optimization possibilities (like rebates or discounts on large orders), contractually agreed order quantities, as well as dependent demand items (i.e., scheduling production for one unit requires ordering the whole bill of materials). A full review of inventory management techniques is beyond the scope of this book; interested readers are referred to Nahmias and Olsen (2015) for a thorough overview.
• Forecasts are not targets or budgets or plans; these are different concepts that need to be kept apart within organizations, or confusion will occur.
• Often the word “forecast” is shorthand for “point forecast.” However, point forecasts are almost never perfectly accurate. We need to measure and communicate the uncertainty associated with our forecast. To accomplish this, we calculate prediction intervals, which we can visualize using fan plots.
• A key concept required to convert a forecast into a decision is the service level, that is, the likelihood of meeting an uncertain demand with a fixed quantity. These service levels represent key managerial decisions and should balance the risk of not having enough units available (underage) with the risk of having too many units available (overage).
______________
1There are, however, exceptions to this observation. Retail organizations can have operations with more than 20,000 stock-keeping units (SKUs) that need to be forecast on a daily basis for hundreds or thousands of stores. Naturally, forecasting at this level tends to be a more automated task.
2The most likely state is the mode of a distribution. Many forecasting methods actually predict the mean of a distribution, which is different from the mode if the distribution is skewed.
3Symmetry of forecast errors is not always the case in practice and is assumed here only to make the argument simple. For example, items with low demand are naturally censored at the origin (= 0), creating a skewed error distribution; similarly, if political influence within the organization creates an incentive to over- or underforecast, the distribution of errors can be skewed. Chapter 11 will examine how to detect such forecast bias in more depth.
4The mathematically more adept readers will notice that the y-axis of a probability distribution measures the density, not the probability of an event. While this distinction is theoretically an important one, we will refer to density as probability here, and this simplification is hopefully excused.
5Prediction intervals are often confused with confidence intervals (Soyer and Hogarth 2012). While a confidence interval represents the uncertainty about an estimate of a parameter of a distribution (i.e., the mean of the above distribution), a prediction interval represents the uncertainty about a draw from that distribution (i.e., demand taking certain values in the above distribution). The confidence interval for the mean is much smaller (depending on how many observations we used to estimate the mean) than the prediction interval characterizing the distribution.
6Prediction intervals like this are generally too narrow (Chatfield 2001). The reason is that these intervals often do not include the uncertainty of choosing the correct model and the uncertainty of the environment changing.
7It is important not to confuse type I and type II service levels. Type II service levels (also called fill rates) measure the likelihood of a customer getting the product, which is higher than the likelihood of meeting all demand with the available inventory.