Probabilistic Forecasting vs Statistical Forecasting

30 October 2017 - Stefan de Kok

Traditional forecasting approaches are becoming obsolete.

As supply chains are globalizing and product portfolios are growing, demand patterns are becoming lumpier and more intermittent. At the same time more companies are starting to accept that a monthly forecast by item is meaningless when purchases, production, and shipments need to be planned weekly by item, and by location. Even smooth, steady demand patterns by item/month become intermittent by item/week/location. Traditional demand forecast approaches, already failing to achieve adequate levels of accuracy, are deteriorating fast under these conditions. Even as forecasting systems are squeezing out extra slivers of accuracy, companies are witnessing consistent overall decline in achieved forecast accuracies. This trend is certain to continue.

But it gets worse.

Forecasts do not exist in isolation. They are created for a purpose, usually multiple purposes. For many companies, one of these purposes is to determine appropriate stocking levels as a safety buffer against demand variability. The way the forecasts are determined, however, does not allow a proper assessment of the shape and size of the uncertainty around the forecast. This uncertainty is the only accurate input to determine inventory buffers. Companies using traditional forecasting approaches need to resort to using demand variability or forecast error as input to safety stock formulas. The direct result is simultaneous out-of-stocks and excess inventory, excessive expediting, and a failure to achieve the service levels that those safety stocks promise. See for example "Why You Keep Missing Your Service Level Targets" for the gravity of the damage that occurs when you incorrectly assume a normal distribution of demand residuals, which is only one of multiple problems with this practice.

What is the alternative?

There are many alternatives. First and foremost, one should try to reduce any dependency on forecasts to the absolute minimum, because no matter what, forecasts will contain some amount of error. Wherever data and approaches are available that are more accurate, these should be preferred over a forecast for the same purpose. A number of so-called demand-driven approaches can be applied here, depending on data availability, budget and organizational readiness. In this sphere I consider DDMRP, demand sensing, and Point-of-Sale (POS) demand sensing.

However, one must realize that forecasts are a necessary evil. Demand-driven approaches can only take you so far without relying on a forecast. The reliance generally comes in two flavors:

  1. these approaches still require some forecast to function efficiently, for example to determine buffer levels, and
  2. a hand-off needs to occur between the time horizons or functional areas where demand-driven approaches can be applied and those where forecasts are the only option.

Traditional forecasting approaches are terribly equipped for either of these requirements. This is where probabilistic forecasting can save the day. Not only will it play nice with demand-driven approaches of any kind, it obliterates all the theoretical maximum levels of accuracy of the traditional forecasting approaches. This article explains the differences.

Difference in approach: distribution fitting versus curve fitting

Statistical forecasting fits a curve through a time series of historical demand data. The continuation of this curve into the future becomes the forecast. Prior to this, special effects are removed from historical data to form a "baseline", and then in future these effects are re-applied to the baseline. These effects could be trends, seasonality, promotions, and so forth. 

The benefits of this approach are twofold. First, it is intuitive to understand. If you believe that what happened in the past will happen again in the future, this is a logical approach. Second, it is easy to calculate. Anyone with above average Microsoft Excel skills can build this in a spreadsheet. It is easy to understand its appeal.

The downsides, however, are devious. The signs on the wall that the approach is fundamentally flawed are clear to anyone willing to look. The first sign is the dreaded over-fitting. It is theoretically possible to find a curve that fits the historical data perfectly, but it will have little to no predictive value. Ingenious "criteria" have been developed to assess the level of over-fit. If a curve exceeds some level of these criteria it is deemed to over-fit excessively and is discarded. This brings us to the second sign on the wall: multiple competing algorithms to determine the same forecast. The curves generated by these algorithms are compared, and the "best" one is picked based on some arbitrary trade-off between degree of over-fitting and accuracy against the historical demand. The entire approach has no scientific merit whatsoever. This is explored in more detail in "Safety Stock and the Hazard of the Fitted Forecast Error".

A key take-away here is that statistical forecasting is purposely targeting a less than perfect fit. The immediate consequence is that it can never achieve adequate levels of forecast accuracy. Any approach that does not require this has a competitive edge.

One such alternative is probabilistic forecasting. It fits distributions instead of curves. This is easy to imagine when the demand pattern is stationary, meaning that every time period has the same distribution of possible values. In such a case you could create a histogram, graphing demand quantity along the x axis and the count of how many times each quantity occurred on the y axis. This histogram depicts the density of the empirical distribution of the demand data. A probabilistic forecast would fit a distribution function to this empirical distribution. Since it is impossible to over-fit, it can target the very best fit, not some arbitrary, less accurate fit. See for example "Why You Keep Missing Your Service Level Targets" where this is illustrated in more detail.
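To make the histogram idea concrete, here is a minimal sketch in Python. It only builds the empirical distribution for a stationary series; the subsequent step of fitting a distribution function to it is left out. The function name and the demand history are hypothetical and purely illustrative.

```python
from collections import Counter

def empirical_distribution(history):
    """Return {demand quantity: relative frequency} for a stationary demand history."""
    counts = Counter(history)
    n = len(history)
    return {qty: counts[qty] / n for qty in sorted(counts)}

# Hypothetical weekly demand for one item at one location (illustrative data only).
weekly_demand = [0, 0, 3, 0, 1, 0, 0, 2, 0, 1, 0, 0]

dist = empirical_distribution(weekly_demand)
for qty, prob in dist.items():
    print(f"demand={qty}: probability={prob:.3f}")
```

With only twelve observations the tail of such a histogram is obviously thin, so treat this as an illustration of the concept rather than a usable forecast.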

Side note: the size of the positive tail of the distribution will typically be underestimated if insufficient historical demand is available. It will still be considerably better than any assumed distribution of a statistical forecast, but in the interest of complete openness I mention that this can occur. Topic for a future article.

The tricky part is how to deal with non-stationarity. This means demand distributions are different in different time periods. Statistical forecasting approaches apply this concept to the average expected demand value in each time period. Probabilistic forecasting approaches may also differentiate how the variability changes in time (so-called heteroscedasticity, which you may instantly forget). You cannot create a histogram of one distribution across many time periods and expect good results if the periods are all expected to have different distributions. The answer is to decompose the demand into parts and reduce the baseline to a stationary series.

At face value this sounds exactly like what statistical forecasting approaches do: decompose a demand pattern into a level, trend, seasonality, various causal effects, and so forth. In principle, that is indeed what it is. There are a number of important differences though. Most importantly, the probabilistic decomposition is not applied to averages, but to the entire distribution of values. Rather than state that July sales have a 50% seasonal uplift compared to the annual average value, it states what the distribution of uplifts is compared to the distribution of baseline demand values. It does this for every kind of impact on the demand. Another key difference is that probabilistic forecasting determines the impact of any such effects purely backward looking, whereas statistical forecasting approaches look at the entire historical time horizon. This means that every historical time period provides another opportunity to measure true accuracy (not merely a fitted error), which can be used as a tracking signal to dynamically adjust the forecast and respond rapidly as demand patterns change. Again, this removes any possibility of over-fitting, and allows probabilistic forecasts to target the very best fit without risk that it is spurious.

Difference in output: probabilities versus exact numbers

The two approaches are not merely different in how they are calculated. They also produce a different kind of output. The output is what is used by other planning processes and transactional systems. In short, statistical forecasts are expressed as time series of exact demand quantities, whilst probabilistic forecasts are expressed as time series of probability distributions. What does that mean?

To illustrate, consider repeatedly rolling two dice, as shown in the figure on the right, and being asked to predict the total number of dots of each roll. The equivalent of a statistical forecast is to predict that the answer is 7 dots each time, since that is the average you can expect. Basically, a stationary series of 7's. If we then roll the dice, any difference from 7 is a forecast error and, with enough rolls, the average absolute error will approach 70/36 or 1.944. However, you could instead have predicted that “7 would only occur with a probability of 1/6, and similarly 2 with probability 1/36, 3 with probability 1/18, and so forth for all possible outcomes.” This alternative prediction is a probabilistic forecast. 
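The dice numbers above are easy to verify. The following Python sketch enumerates all 36 equally likely outcomes of two fair six-sided dice, derives the probabilistic forecast, and computes the long-run average absolute error of always predicting 7. The variable names are mine; exact fractions are used so no rounding creeps in.

```python
from fractions import Fraction
from itertools import product

# Probabilistic forecast: probability of each total for two fair six-sided dice.
totals = [a + b for a, b in product(range(1, 7), repeat=2)]
forecast = {t: Fraction(totals.count(t), len(totals)) for t in range(2, 13)}

# Point forecast: the expected total, and its long-run mean absolute error.
point_forecast = sum(t * p for t, p in forecast.items())
mean_abs_error = sum(abs(t - point_forecast) * p for t, p in forecast.items())

print(point_forecast)         # 7
print(forecast[7])            # 1/6
print(mean_abs_error)         # 35/18, which equals 70/36
print(float(mean_abs_error))  # roughly 1.944
```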

The “error” — the difference between the observed actual and the unbiased point forecast of 7 — is not really an error at all. It is fully predictable variability. This predictability is something we can and should plan for, just as we would do when playing games of chance. The probabilistic forecast allows decision makers to do just that. These forecasts are expressed in terms of "uncertainty" as opposed to "error". For an explanation on the difference, see for example "Variability, Volatility, Uncertainty, and Error".

When you are looking at high levels of aggregation or possibly some of the fastest moving items, this difference in output is not very significant. An average value with an estimate of residual error or variability will get you close to accurate. But at greater detail or for any but your fastest moving items, this output of statistical forecast becomes useless. Imagine an intermittent demand pattern, where you get one order roughly every 4th week for one of your slower moving items in a smaller shipping location. The statistical forecast will state you will sell the equivalent of 0.25 orders every week. Three weeks you will have an error of 100%, and one week you will have an error of 300%. No matter what alternative value you would have picked instead of 0.25, you will be wrong more often than right.
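The arithmetic of that intermittent example can be sketched in a few lines of Python. The four-week cycle below is a hypothetical, perfectly regular version of "one order roughly every 4th week", and the error is expressed as a percentage of the forecast, which is the reading under which the 100% and 300% figures hold.

```python
# Hypothetical four-week cycle: one order in the fourth week, none otherwise.
actuals = [0, 0, 0, 1]
point_forecast = sum(actuals) / len(actuals)  # 0.25 orders per week

# Error of the point forecast in each week, as a percentage of the forecast.
pct_errors = [abs(a - point_forecast) / point_forecast * 100 for a in actuals]
print(pct_errors)  # [100.0, 100.0, 100.0, 300.0]

# The probabilistic equivalent states how likely each demand level is.
probabilistic_forecast = {
    qty: actuals.count(qty) / len(actuals) for qty in sorted(set(actuals))
}
print(probabilistic_forecast)  # {0: 0.75, 1: 0.25}
```

The point forecast of 0.25 never occurs as an actual, while the probabilistic version says something a planner can act on: three weeks in four nothing will be ordered.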

Side note: you could possibly have used a variation of Croston's method, which tries to time when the demand will occur, but that will only really help in the unlikely trivial case where the frequency is absolutely constant. Where other methods make the error of assuming the quantity is deterministic (exact and known), Croston additionally assumes the timing is deterministic. 

Unfortunately, most companies have upwards of 90% of their items behaving intermittently at the granularity that matters, and the trend is that this is getting worse. Hence, for the vast majority of items a statistical forecast is not adding any value. Generally, through unwarranted adherence to such a forecast, value is destroyed.

Probabilistic forecasts, on the other hand, are especially suited for intermittent demand patterns. Rather than state some average level, which will never occur as an actual, they state the probability of occurrence for each demand level. One way of looking at this (one of many) is future demand trends. Have you ever wondered if current trends will continue, get worse or better? The answer is not a single trend line. It can, however, be very elegantly expressed in terms of a projection of probabilities.

At some point a probabilistic forecast output needs to be shared with a deterministic system. At that point, richness of information and value are lost. To capture as much value as possible that hand-off needs to be deferred as much as possible. For example, if the probabilistic forecast is used by probabilistic inventory optimization and probabilistic planning and scheduling systems the value is fully utilized. Only when generating orders does it need to become deterministic. There are various ways to capture at least part of the extra value even if the forecast is the only probabilistic system used. More on that in future articles.
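As one possible illustration of such a hand-off (an assumption on my part for this sketch, not the only way to do it), the Python snippet below collapses a one-period demand distribution into a single stock quantity: the smallest level that covers demand with at least a target probability. The function name, the distribution, and the targets are all hypothetical.

```python
def stock_level_for_service(distribution, target):
    """Smallest quantity whose cumulative demand probability meets the target."""
    cumulative = 0.0
    for qty in sorted(distribution):
        cumulative += distribution[qty]
        if cumulative >= target:
            return qty
    # Fall back to the largest known demand level if rounding leaves us short.
    return max(distribution)

# Hypothetical one-period demand distribution (illustrative numbers only).
demand_dist = {0: 0.70, 1: 0.15, 2: 0.10, 5: 0.05}

print(stock_level_for_service(demand_dist, 0.90))  # 2
print(stock_level_for_service(demand_dist, 0.99))  # 5
```

Note how much is discarded in the process: the single number 2 says nothing about the 5% chance of a demand of 5, which is exactly the loss of richness that argues for deferring the hand-off.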

Difference in impact: accuracy vs precision

Statistical forecasts are precise by definition. They are provided as exact numbers of what will happen in future periods. They are however highly inaccurate. The exact numbers are seldom correct.

Probabilistic forecasts, on the other hand, are not precise at all. They are expressed as vague ranges with probabilities that any value in the range may occur. However, if done well, probabilistic forecasts are highly accurate.

If the distinction is not clear, I recommend the article "Are You Confusing Precision for Accuracy?", which explains the difference.

Accuracy without precision is perfectly fine. Granted, not as great as having both at the same time, but if one has to give, it must be precision. Accuracy is a metric that tells us how good the result is. Precision expresses a level of confidence in the results.

It should be clear that precision without accuracy is meaningless and useless. Worse, the false perception of accuracy will drive bad business decisions and bad planning decisions. It is false confidence in bad numbers. This is the easiest trap of all to fall into. Even many experts and vendors are unaware that this kind of error permeates all their plans and systems. Just think about all the planning processes you know of, and ask yourself the following two questions:

  1. How many plan calculations are based on exact numbers when those numbers are uncertain?
  2. How many plans create outputs that are more precise than their inputs?

An example of the first is using exact forecast numbers in planning. An example of the second is using monthly demand data to generate forecasts by week or day (usually by splitting demand values, sometimes by prorating demand). All of those make the costly mistake of valuing precision over accuracy.

Plans based on statistical forecasts will be ever-changing. There will be constant fire-fighting, where planners are tackling one emergency after another. Many times, expensive alternatives such as expediting are required, undoing all of the value promised by a carefully made plan. Plans based on probabilistic forecasts, however, are stable. The most likely and the most impactful scenarios have already been accounted for. The response, if any, to a certain improbable but high-impact event was decided ahead of time. And crucially, those decisions were made jointly by all stakeholders before everyone was running around like chickens without their heads, rather than by one planner in the midst of the chaos.

The value of the significantly greater accuracy of probabilistic forecasts cannot be ignored. But the real impact is the stabilization of the supply chain from mayhem to full control.