Count Data and Intermittent Demands
9.1 Definitions
So far, we have focused on using the normal distribution for forecasting. Using the normal distribution can be inappropriate for two reasons. First, the normal distribution is continuous, that is, a normally distributed random variable can take noninteger values, like 2.43. Second, the normal distribution is unbounded, that is, a normally distributed random variable can take negative as well as positive values. Neither of these properties makes sense for (most) demand time series. Demand is usually integer-valued, apart from products sold by weight or volume, and demand is usually zero or positive, but not negative, apart from returns.
These seem pretty obvious ways in which the normal distribution deviates from reality. So why do we nevertheless use this distribution? The answer is simple and pragmatic: because it works. On the one hand, using the normal distribution makes the statistical calculations that go on “under the hood” of your statistical software (optimizing smoothing parameters, estimating ARIMA coefficients, calculating prediction distributions) much easier. On the other hand, the actual difference between the forecasts—both point and distribution—under a normal or a more appropriate distribution is often small, especially for fast-moving products. However, this volume argument to support using the normal distribution does not hold any more for slow-moving demand time series.
Now, we can address both problems explained above by using count data distributions, which are probability distributions that only yield nonnegative integer values. Common distributions to model demands are the Poisson and the negative binomial distribution (Syntetos et al. 2011). The Poisson distribution has a single parameter that is both its mean and its variance. The more flexible negative binomial distribution has one parameter for the mean and another one for its variance.¹
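To make the mean-variance relationship concrete, the two distributions can be written out as follows. This is a sketch in one common parameterization; the symbols \lambda, \mu, and \theta are notation assumed here and do not appear in the text above.

```latex
% Poisson demand X with rate \lambda: the mean and the variance coincide.
\[
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots,
\qquad \mathrm{E}[X] = \mathrm{Var}[X] = \lambda .
\]
% Negative binomial demand Y with mean \mu and dispersion parameter \theta > 0:
% the variance exceeds the mean (overdispersion).
\[
\mathrm{E}[Y] = \mu, \qquad \mathrm{Var}[Y] = \mu + \frac{\mu^{2}}{\theta} > \mu .
\]
```

As \theta grows, the extra variance term shrinks and the negative binomial distribution approaches the Poisson distribution.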
Figure 9.1 illustrates the difference between Poisson-distributed demands and normally distributed “demands,” at a common rate of 0.2 units per month. We see both problems discussed above (negative and noninteger demands) in the normally distributed data, whereas the Poisson time series does not exhibit such issues and therefore appears more realistic. Conversely, Figure 9.2 shows that there is little difference between a Poisson and a normal distribution for fast-moving products—here, at a rate of 20 units per month.
Figure 9.1 Random draws for count data and continuous data for low-volume products
Figure 9.2 Random draws for count data and continuous data for high-volume products
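The comparison behind Figures 9.1 and 9.2 can be imitated with a short simulation. The sketch below is not the code used to produce the figures. It assumes that the normal series shares the Poisson variance (equal to the mean), and the function names, the seed, and the number of periods are hypothetical choices.

```python
import math
import random


def poisson_draw(rate, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(rate, periods=60, seed=1):
    """Return (poisson_demands, normal_demands) with a common mean of `rate`."""
    rng = random.Random(seed)
    poisson = [poisson_draw(rate, rng) for _ in range(periods)]
    # Assumption: the normal series uses the Poisson variance (equal to the mean).
    normal = [rng.gauss(rate, math.sqrt(rate)) for _ in range(periods)]
    return poisson, normal


slow_poisson, slow_normal = simulate(rate=0.2)
fast_poisson, fast_normal = simulate(rate=20)

print("slow Poisson:", slow_poisson[:12])
print("slow normal: ", [round(x, 2) for x in slow_normal[:12]])
print("negative normal draws at rate 0.2:", sum(x < 0 for x in slow_normal))
print("negative normal draws at rate 20: ", sum(x < 0 for x in fast_normal))
```

At the low rate, the normal draws include noninteger and negative values, while the Poisson draws are mostly zeros with an occasional one. At the high rate, negative normal draws become very unlikely.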
Once demand gets so slow that many time buckets in fact exhibit zero demand, we speak of intermittent demand series. We will use the terms “count data” and “intermittent demand” interchangeably. Finally, there is lumpy demand. Demand is “lumpy” if it is intermittent and nonzero demands are high. Lumpy demand can occur, for instance, for upstream supply chain members who have only a few customers that place batch orders, or in home improvement retail or wholesale stores, where house builders will typically buy large quantities of a particular paving stone or light switch at once. Low-volume intermittent data is often the result of many customers placing orders rarely; high-volume lumpy demand results from a few customers aggregating their orders into large batches. We illustrate the differences and similarities between these concepts in Figure 9.3.
Figure 9.3 Intermittent and lumpy demand series. Note the vertical axes
Why is it important to forecast intermittent or lumpy demand? We naturally focus on forecasting the fastest moving products, simply because these products have the highest visibility in the firm (and market) and are often the most important ones in terms of margin and total revenue. However, most businesses also have a “Long Tail” of slow-moving products. By the anecdotal Pareto principle, 80 percent of your SKUs will be responsible for 20 percent of your sales. Classic A-B-C analysis in inventory management naturally differentiates between these fast- and slow-moving items. Many of these 80 percent of SKUs probably have intermittent demand series. While it is important to improve the forecasts for the 20 percent fast-movers that drive 80 percent of sales, the many more slow-movers may represent a much larger fraction of your total inventory value. Accurate forecasts can help you reduce these inventories, pool them, move to a make-to-order process, or improve your operations in other ways. There may actually be as large an improvement opportunity here as there is for faster moving items.
In addition, intermittent time series are becoming more common because of several recent developments. A few years ago, database capacity and processing power limited the number of demand time series that could be stored and forecasted on a weekly basis. Nowadays, vastly more powerful storage and processing (Januschowski et al. 2013) allow working at ever-finer granularity. A time series that is fast-moving on a weekly basis may well be slow-moving on a daily basis and heavily intermittent on an hourly basis. The more a firm disaggregates the time unit used for forecasting, the more likely the firm is to encounter intermittent data. Further, a small product portfolio that slices a market into only a few segments by providing a few different variants will likely produce high-volume series. However, with more and more product differentiation, many variants themselves may become intermittent in demand. Thus, due to increased product variety as well as increased data storage capacity, we will see and need to forecast more and more intermittent time series in the future.
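The effect of the time unit can be seen in a small example. The daily series below is made up for illustration, and the helper names are hypothetical. The sketch sums daily demands into weekly buckets and compares the share of zero-demand periods at both granularities.

```python
def aggregate(series, bucket_size):
    """Sum consecutive groups of `bucket_size` observations (drops an incomplete tail)."""
    full = len(series) - len(series) % bucket_size
    return [sum(series[i:i + bucket_size]) for i in range(0, full, bucket_size)]


def zero_share(series):
    """Fraction of periods with zero demand."""
    return sum(1 for x in series if x == 0) / len(series)


# Hypothetical daily demand over four weeks (illustrative data only).
daily = [
    0, 1, 0, 0, 2, 0, 0,
    1, 0, 0, 0, 0, 1, 0,
    0, 0, 3, 0, 0, 0, 1,
    0, 1, 0, 0, 0, 0, 2,
]
weekly = aggregate(daily, bucket_size=7)

print("weekly demand:", weekly)
print("share of zero days: ", round(zero_share(daily), 3))
print("share of zero weeks:", round(zero_share(weekly), 3))
```

The same demand is heavily intermittent on a daily basis but shows no zero periods once it is summed to weeks.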
9.2 Traditional Forecasting Methods
Given the particularities of intermittent demand, how should we forecast series with such count data? Could we use the methods described in the previous three chapters in this context as well? We apply single exponential smoothing with a smoothing parameter of α = 0.10 (as per Chapter 6) to an intermittent demand series in Figure 9.4.
Figure 9.4 Single exponential smoothing applied to an intermittent time series
Remember that single exponential smoothing creates forecasts by calculating a weighted average between the most recent forecast and the most recent demand. Thus, the forecasts tend to drop and slowly move toward zero over time when no demand is observed. However, after demand is observed, forecasts briefly “jump” up again. Thus, our forecast is high right after a nonzero demand and low after a long string of zero demands.
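The decay-and-jump pattern described above can be reproduced in a few lines. This is a minimal sketch of single exponential smoothing with α = 0.10. The demand series is made up, and starting from an initial forecast of zero is an assumption made here for simplicity.

```python
def single_exponential_smoothing(demands, alpha=0.10, initial_forecast=0.0):
    """Return one-step-ahead forecasts; forecasts[t] is made before demands[t] is seen."""
    forecasts = []
    forecast = initial_forecast  # assumption: the starting value is arbitrary
    for demand in demands:
        forecasts.append(forecast)
        # Weighted average of the most recent demand and the most recent forecast.
        forecast = alpha * demand + (1 - alpha) * forecast
    forecasts.append(forecast)  # forecast for the period after the last observation
    return forecasts


# Hypothetical intermittent demand series (illustrative data only).
demands = [0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0]
forecasts = single_exponential_smoothing(demands)

for period, forecast in enumerate(forecasts):
    print(period, round(forecast, 4))
```

The forecast rises in the period right after each nonzero demand and then shrinks by a factor of 1 − α in every following zero-demand period.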
This form of forecasting does not make sense in two important situations. First, consider a context where our intermittent demand is driven by a few customers that replenish this SKU when they need it; in this case, we would need a higher forecast (and not a lower forecast) after a long string of zero demands, because it becomes more likely that these customers will place an order again as more time passes. Second, if we have a context where the intermittent demand is driven by many customers buying independently, the forecast should not exhibit any time dynamics at all, because at any point in time, there is an equally large pool of customers that may soon demand the product. In addition, we will typically make replenishment or production decisions right after our stock has been depleted by a sale, so forecasts that are biased high right after sales will lead to particularly high reorder quantities and to unnecessarily high inventories. We therefore turn to a method that was designed specifically for intermittent demand.
9.3 Croston’s Method
Croston (1972) examined the problem of forecasting intermittent demand and proposed a specific solution. This solution is by now an industry standard and bears Croston’s name. Instead of exponentially smoothing the raw demands, we separately smooth two different time series: (1) all the nonzero demands from the original time series and (2) the number of periods with zero demands between each instance of nonzero demand. In a sense, we decompose the forecasting problem into the subproblems of predicting how frequently demand happens and predicting how high the demand is if it happens. We highlight the values associated with these two time series in Figure 9.5.
Figure 9.5 Demand and time periods without demand in an intermittent series
Thus, the problem becomes one of forecasting the series of nonzero demands, that is, {1, 1, 1, 2, 1, 1, 1, 1, 1}, and the series of time periods in-between nonzero demands, that is, {1, 0, 10, 2, 7, 5, 5, 10} in Figure 9.5. Croston’s method essentially applies exponential smoothing to both these series (usually with the same smoothing parameters). Further, we only update the two exponential smoothing forecasts for these two series whenever we observe a nonzero demand.
Let us assume that smoothing the nonzero demands yields a forecast of q, while smoothing the numbers of zero demand periods yields a forecast of r. This means that we forecast nonzero demands to be q, while we expect such a nonzero demand once every r periods on average. Then, the demand point forecast in each period is simply the ratio between these two forecasts, that is,

\[
\text{Forecast} = \frac{q}{r}. \tag{24}
\]
Croston’s method works through averages; while the method is not designed to predict when a particular demand spike occurs, it essentially distributes the volume of the predicted next demand spikes over the time periods expected until that demand spike occurs. One could also think of Croston’s method as predicting the ordering behavior of a single client with fixed ordering costs. According to the classic Economic Order Quantity model, a downstream supply chain partner will lump continuous demand into order batches in order to balance the fixed cost of order/shipping with the variable cost of holding inventory; Croston’s method could then be seen as a way of predicting the demand this downstream supply chain partner sees—that is, to remove the order variability amplification caused by batch ordering from the time series. Figure 9.6 provides an example of how Croston’s method forecasts an intermittent demand series.
Figure 9.6 Croston’s method applied to an intermittent demand time series
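The mechanics of Croston’s method can be sketched in code. This is an illustrative implementation under stated assumptions, not the code behind Figure 9.6. It measures the interval as the number of periods from one nonzero demand to the next (the count of zero periods plus one, a common convention), uses the same α for both series, and initializes both smoothed values with the first observed size and interval. The function and variable names are hypothetical.

```python
def croston(demands, alpha=0.10):
    """Return one-step-ahead Croston forecasts for a demand series.

    forecasts[t] is the forecast available before demands[t] is observed;
    it is None until the first nonzero demand has been seen.
    """
    size = None      # smoothed nonzero demand size (q)
    interval = None  # smoothed inter-demand interval (r)
    periods_since_demand = 1
    forecasts = []
    for demand in demands:
        forecasts.append(None if size is None else size / interval)
        if demand > 0:
            if size is None:
                # Assumption: initialize with the first observed size and interval.
                size, interval = float(demand), float(periods_since_demand)
            else:
                size = alpha * demand + (1 - alpha) * size
                interval = alpha * periods_since_demand + (1 - alpha) * interval
            periods_since_demand = 1
        else:
            # No update in zero-demand periods; only the interval counter advances.
            periods_since_demand += 1
    forecasts.append(None if size is None else size / interval)
    return forecasts


# Hypothetical series: one unit in every tenth period, for 50 periods.
demands = ([0] * 9 + [1]) * 5
forecasts = croston(demands)
print(forecasts[9], forecasts[10], forecasts[-1])
```

Because both smoothed values change only when a nonzero demand arrives, the forecast stays flat through each run of zeros instead of decaying toward zero.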
Croston’s method provides some temporal stability of forecasts but does not address the problem that, for a small number of customers, long periods of nonordering should indicate a higher likelihood of an order being placed. Further, Croston’s method works by averaging and forecasting the rate at which demand comes in, not by predicting when demand spikes occur; this makes it hard to interpret forecasts resulting from Croston’s method for decision making. The forecast may say that, on average over the next 5 weeks, we will sell one unit each week—where, in fact, there is likely only 1 week during which we sell five units.
As a matter of fact, a closer theoretical inspection shows that Croston’s method suffers from a statistical bias (Syntetos and Boylan 2001). Technically, this bias occurs because equation (24) involves taking expectations of random variables, and we approximate the expectation of a ratio between nonzero demands and interdemand intervals by the ratio of the expectations. Various correction factors to compensate for this problem have been proposed and implemented in modern forecasting software. One example of such a correction procedure is the Syntetos-Boylan approximation (Syntetos and Boylan 2005; Teunter and Sani 2009), which has been found to often lead to better inventory positions (Syntetos, Babai, and Gardner 2015).
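A small numerical example shows why the ratio of expectations differs from the expectation of the ratio, and how the Syntetos-Boylan approximation deflates the Croston forecast by the factor (1 − α/2), where α is the smoothing parameter applied to the intervals. The two-point interval distribution and the function name are hypothetical and serve only as an illustration.

```python
def sba_forecast(size, interval, alpha):
    """Croston ratio deflated by the Syntetos-Boylan factor (1 - alpha / 2)."""
    return (1 - alpha / 2) * size / interval


# Hypothetical inter-demand interval: 1 or 9 periods with equal probability.
intervals = [1, 9]
mean_interval = sum(intervals) / len(intervals)
ratio_of_expectations = 1 / mean_interval
expectation_of_ratio = sum(1 / r for r in intervals) / len(intervals)

print("1 / E[r]:", ratio_of_expectations)
print("E[1 / r]:", expectation_of_ratio)
print("SBA forecast:", sba_forecast(size=1.0, interval=10.0, alpha=0.10))
```

Since E[1/r] is larger than 1/E[r] whenever the interval varies, dividing by a noisy interval estimate tends to overforecast, which is the direction the correction factor compensates for.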
Croston’s method is very simple and may appear to be “too simple.” Why is it nevertheless very often used in practice? One factor is of course its simplicity: it can be quickly explained, and the logic underlying the method is intuitive. Another explanation may be that intermittent demand often does not exhibit a lot of dynamics that can reasonably be modeled. It is very hard to detect seasonality, trends, or similar effects in intermittent demands, so it usually does not make sense to try to create a better model. In fact, competing models for intermittent demands have been proposed, but their added complexity needs to be weighed against any gain in accuracy compared to Croston’s method, and no other method has so far consistently outperformed Croston’s method with the Syntetos-Boylan approximation.
That said, there is one situation in intermittent demand forecasting where Croston’s method does not perform very well, namely, for lumpy demands. Suppose we have a demand of one unit once in every 10 weeks (nonlumpy demand), or of 10 units once in every 100 weeks (lumpy demand). Average demand is 1/10 = 10/100 = 0.1 unit per week in both cases. Croston’s method will yield forecasts of about this magnitude in both cases. However, the two cases have very different implications for inventory holding. This is really not a shortcoming of Croston’s method as such, because a point forecast of 0.1 is a completely correct summary of the long-run average demand. The problem lies in the fact that the point forecast does not consider the spread around the average. Unfortunately, there is no commonly accepted method for forecasting lumpy demands for inventory control purposes. Most forecasters use an ad hoc method like “stock up to the highest historical demand,” use a standard forecasting method with high safety stocks, or try to reduce their reliance on forecasting for such time series by managing demand and moving to a make-to-order system as much as possible. The challenges of intermittent demand forecasting are one reason why spare parts inventory systems (which generally have intermittent demand patterns) are more and more utilizing additive manufacturing to print spare parts on demand (D’Aveni 2015).
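The two cases from the example above can be compared directly. The sketch below builds both series over 100 weeks and reports their mean, population variance, and largest single demand. Placing the demands at the end of each cycle is an arbitrary choice.

```python
from statistics import mean, pvariance

nonlumpy = ([0] * 9 + [1]) * 10  # one unit once in every 10 weeks, for 100 weeks
lumpy = [0] * 99 + [10]          # ten units once in 100 weeks

for name, series in [("nonlumpy", nonlumpy), ("lumpy", lumpy)]:
    print(name, "mean:", mean(series), "variance:", pvariance(series),
          "max:", max(series))
```

Both series average 0.1 units per week, but the lumpy series has a variance eleven times as large (0.99 versus 0.09) and a peak demand of ten units, which is what drives the different inventory implications.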
9.4 Key Takeaways
• Count data and intermittent demands are probably responsible for only 20 percent of your sales, but may account for 80 percent of your inventory costs. Therefore, it makes sense to invest in forecasting them well.
• Do not use exponential smoothing or ARIMA to forecast intermittent demands. Instead, use Croston’s method or other methods that are dedicated to intermittent series.
• If you can expand the time unit being analyzed or aggregate across locations, you can often convert intermittent time series into nonintermittent series that are easier to forecast.
• Lumpy demands are particularly hard to forecast, since the average rate may be useless for inventory control.
______________
¹ Technically, the extra parameter of the negative binomial distribution captures overdispersion, that is, the amount by which the variance of the distribution exceeds its mean.