A Better Way of Importing Data into Forecasting Systems
Executive Summary
- How is data generally placed into demand planning systems?
- Why are some of the popular current methods so problematic?
Definitions for This Article
- Attribute-Based Forecasting: The ability to group product by “attributes” such as Size, Flavor, Region flexibly, and relationally that supports forecasting. It creates a virtual rather than a static hierarchy.
- ROLAP: A specialized relational database designed for data retrieval.
- MOLAP: A specialized non-relational array database designed for data retrieval.
Introduction
Regarding the topic of demand planning, it is often overlooked, is how to get data into the forecasting system. Previously, and with current software that follows older data backend approaches, getting the data is extremely challenging. Currently, in most IT department across the US that supports demand planning projects or live systems, tedious and irrelevant discussions are taking place regarding data backend issues such as “characteristic value combinations.” Companies are spending money on costly hardware related to support demand planning systems all because of outdated software design. However, new technology and approach provide the capability of easily populating systems from flat files, simple SQL databases, or even Excel.
ROLAP and MOLAP
For more than fifteen years, data cubes in the form of ROLAP and MOLAP databases have been the primary method of creating relationships between various data elements in forecasting and in analytics that allow for fast reporting on element combinations.
“Here is where things get really exciting. Because the cube contains all of your data in an aggregated form, it seems to know the answers in advance. For example, if a user asks for total sales by year and city, those numbers are already available. If the user asks for total sales by quarter, category, zip code, and employee, those numbers and names are already available. If it helps you to understand them, think of cubes as specialized small databases that know the answers before you even ask the questions.That is the big advantage of a cube. You can ask any pertinent question and get an answer, usually at warp speed. For instance, the largest cube in the world is currently 1.4 terabytes and its average response time to any query is 1.2 seconds.” – Microsoft
Cubes were a response and improvement on relational databases for analytics, as relational databases are not optimized for reporting but for ensuring the integrity and consistency of transactions. When I worked at i2 Technologies, our demand planning applications used a MOLAP (Multi-Dimensional Online Analytical Processing) cube, which is to say it was a real cube.
Understanding Cubes and Star Schemas
MOLAP Cubes take time to build and are a hard-coded series of relationships that are built by the system, which allows for fast retrieval of relationships. When SAP developed their advanced planning forecasting product, they went a different route and used a relational emulation of a multi-dimensional cube called a star schema. (Strangely, and I think technically incorrectly star schemas are often called “cubes” when star schema or emulated cube seems a more accurate term. Also, interestingly, even real cubes are not even cubes because not all relationships are completed or necessary to be built.).
Both MOLAP and star schemas were data design adjustments that were made to overcome hardware limitations to meet the business requirement of analytics (forecasting being a sub-category of analytics). However, something which I either never or close to never hear discussed is that they also imposed high degrees of complexity on the implementation and the maintenance of the system (as implemented by many software vendors). Many MOLAP cubes and ROLAP / star schema cubes must be realigned when relationships are changed between any of the data elements, and this creates a disincentive to make changes to the relationships because it is a major hassle to do so.
Some companies reduce the amount of realignment inside of the actual forecasting database they have to perform by creating external realignment tables. I was recently at a client that had two realignment tables, one for historical realignment and one for realignment related to product master changes. Unsurprisingly, they are not happy with the maintenance of these tables because they are workarounds to try to create shortcuts to eliminate changing a maintenance item inside the forecasting database itself.
Setup, Implementation, and Maintenance
The data setup of many MOLAP and ROLAP / star schema systems is onerous, and they consume large quantities of maintenance resources for the life of the forecasting or analytics application. Forecasting and analytic technical conversations now have the quality of a Star Trek convention with the overuse of strange and convoluted terminology euphemistically referred to by one of my clients as “Klingon.”
This is serving to encapsulate forecasting and analytics in a tower of babble. However, for all the resources that are being poured into forecasting and analytics demand planning and analytics, not enough good things are coming out. In general, I am not observing forecasting improving much from the clients I visit, and a good part of this is related to the fact that the data backends that support these forecasting applications are too complicated to set up and maintain. There are certainly other reasons, and those are documented in other articles on this blog.
Working Smarter, not Harder
New technology now exists that makes realignments a thing of the past. This is described in this article. The major consulting companies are by in large, not bringing these new technologies to their clients because, in my experience, they are necessarily out of it when it comes to new technologies. Furthermore, the vendors that are the best in this approach, and the most significant vendors allow the large consulting firms to maximize their billing hours and time on projects. A major consulting company will never choose a superior technological solution when an inferior one means more revenue for them. Secondly, many of the major vendors in forecasting and analytics have built their software around the old and inefficient approach, and like things, the way they are.
Attributes: The New Approach
Demand Works uses a ROLAP / star schema. However, there is no extra data maintenance aside from the actual attribute maintenance itself. I can make any number of changes to Demand Works Smoothie’s attribute table or tab in a spreadsheet and upload it to the Smoothie database. How exactly this is done is a trade secret, but the fact is it works, and I have used it many times.
Demand Works represents its tables in simple flat files or spreadsheets that show each relationship attribute as a column in this spreadsheet. The spreadsheet is eventually imported into a SQL database that the Demand Works Smoothie application uses.
Not only is this more manageable to set up than that, say the SAP DP and BI Data Workbench. This is a highly inefficient data backend for forecasting analytics, which I have described previously as a Russian Matryoshka doll, where within every larger doll is contained a smaller identical doll. Attribute-based forecasting and analytics are easy to set up, surprisingly easy compared to the other approaches.
Conclusion
The convoluted and complicated methods of speeding the management of data relationships are causing terrible problems in both forecasting and analytics. They are holdovers from a time when hardware was significantly slower. This approach to forecasting projects has not worked very well, and companies are generally not advancing their statistical forecasting capabilities, partially because their forecasting solutions do not naturally enable improvement. However, they consume large amounts of IT resources.
The new approach is attribute-based data management and will yield enormous benefits for companies that move towards it. Any company that wants to understand it should get a copy of Demand Works Smoothie. It is the tool that I use for explaining attributes to my clients, and the response is always positive. People can appreciate a better way when they exposed to it. However, they often lack exposure to these new methods, and most of the training in OLAP in the US is in these dated approaches.
References
https://msdn.microsoft.com/en-us/library/aa140038(v=office.10).aspx