Comments on Brightwork Articles on Outliers in Forecasting
Executive Summary
- This article collects reader comments on the Brightwork articles on outliers in forecasting.
Introduction
These comments are in response to the articles on outliers in forecasting.
Comment #1: Tim Reilly
Shaun,
Great Article.
I just want to second your point about finding an application that is good at doing this. For example, your equation is the classic regression equation (i.e., y = a + bx). Most software will use that to do causal modeling. The problem is that regression gives the first and last observations equal importance. Regression ignores time, and with it the dependence between successive observations, which in time series analysis is called “autocorrelation”. Regression is meant for cross-sectional analysis, not time series. You need to use a transfer function modeling approach, where you weight the historical observations to reflect changes in the relationship over time. The relationship could be between the causal variable and sales, or just within the history of sales itself (e.g., seasonality).
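To make the weighting idea concrete, here is a minimal Python sketch. It is not a transfer function model; it swaps in a much simpler technique, discounted (exponentially weighted) least squares, purely to show what happens when recent observations count for more. The data, the function name `weighted_line_fit`, and the decay factor of 0.7 are all assumptions made up for illustration.

```python
def weighted_line_fit(x, y, weights):
    """Fit y = a + b*x by weighted least squares and return (a, b)."""
    total = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
    sxy = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Hypothetical data: the slope is 1 for the first six periods, then 3.
x = list(range(1, 13))
y = [1, 2, 3, 4, 5, 6, 9, 12, 15, 18, 21, 24]

# Ordinary regression: every observation gets the same weight.
a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))

# Discounted fit: each step back in time multiplies the weight by 0.7 (assumed).
decay = 0.7
weights = [decay ** (len(x) - 1 - i) for i in range(len(x))]
a_disc, b_disc = weighted_line_fit(x, y, weights)

print(f"equal-weight slope: {b_ols:.2f}")
print(f"discounted slope:   {b_disc:.2f}")
```

The equal-weight fit blends the old and new relationships into a single slope, while the discounted fit lands much closer to the recent slope of 3. A real application would estimate the weighting from the data rather than fix it by hand.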
Another complicating factor is the lead and lag relationships between the causal variable and sales. You need to consider not just the contemporaneous relationship but also the leads and lags, as people don’t buy beverages on New Year’s Eve but in the days leading up to it.
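One simple way to look for a lead or lag, sketched here in Python, is to correlate sales with the causal variable shifted by a few periods in each direction and see which shift lines up best. The series below are invented (two events, each preceded by a sales spike two days earlier), and the helper names `pearson` and `lead_lag_correlations` are hypothetical. This is only a screening step under those assumptions, not a full lead/lag model.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def lead_lag_correlations(causal, sales, max_shift):
    """Correlate sales[t] with causal[t + k] for k in -max_shift..max_shift.

    A positive k means sales move k periods BEFORE the causal event.
    """
    n = len(sales)
    result = {}
    for k in range(-max_shift, max_shift + 1):
        pairs = [(sales[t], causal[t + k]) for t in range(n) if 0 <= t + k < n]
        s, c = zip(*pairs)
        result[k] = pearson(s, c)
    return result


# Hypothetical daily data: events on days 5 and 12, sales spikes on days 3 and 10.
causal = [0] * 16
causal[5] = causal[12] = 1
sales = [10] * 16
sales[3] = sales[10] = 30

correlations = lead_lag_correlations(causal, sales, max_shift=3)
best_shift = max(correlations, key=correlations.get)
print(best_shift)  # → 2: sales lead the event by two days
```

Looking only at the contemporaneous shift (k = 0) would show a negative correlation here, which is exactly the trap described above.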
The implications of not adjusting for outliers have been well documented in many statistical journals. I will point you to the great work of Ruey Tsay here: https://www.unc.edu/~jbhill/tsay.pdf
Your discussion of financial data and Nutrasweet is understood, but when it comes to supply chain, adjusting for outliers is critical. And it is equally important how you identify them! As you point out, most systems use a simple approach: they call a point an outlier when it is 2 or 3 standard deviations from the mean, and then ask you how many iterations of removing and adjusting they should perform. This approach is very simple and misses other important outliers that distort the model and the forecast. You need to identify the outliers while you are building the model AND run a final check of 2 or 3 standard deviations at the end of the process. A fun example we like to torture our competition with is the series 1,9,1,9,1,9,1,5. Where is the outlier? Well, we can see that the 5 is unusual, and we could call it an inlier, as it is “too good to be true” and sits close to the mean. Simple outlier schemes completely miss this outlier, and the forecast suffers. The 1,9 example is contrived, but it reflects something we see in datasets all the time.
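The 1,9 series is easy to check by hand or in a few lines of Python. The sketch below first applies a 2 standard deviation rule to the raw values, then applies the same rule to the residuals from a deliberately crude pattern model, in which each point is compared with the median of the other points at the same position in an alternating cycle. That pattern model, the period of 2, and the function names are assumptions chosen for this example, not the method any particular forecasting package uses.

```python
from statistics import mean, median, pstdev


def zscore_flags(values, threshold=2.0):
    """Indices of values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def seasonal_residuals(values, period):
    """Each value minus the median of the other values in the same phase of the cycle."""
    residuals = []
    for i, v in enumerate(values):
        peers = [values[j] for j in range(i % period, len(values), period) if j != i]
        residuals.append(v - median(peers))
    return residuals


series = [1, 9, 1, 9, 1, 9, 1, 5]

# Simple scheme: screen the raw values. The mean is 4.5, so the 5 looks ordinary.
raw_flags = zscore_flags(series)

# Model first, then screen the residuals (assumed alternating pattern, period 2).
residuals = seasonal_residuals(series, period=2)
model_flags = zscore_flags(residuals)

print(raw_flags, model_flags)  # → [] [7]
```

The raw screen flags nothing, because the 5 sits almost at the mean of 4.5. Once the alternating pattern is modeled, the last point has a residual of -4 against an expected 9, and the same 2 standard deviation check picks it up.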