What is the Actual Performance of SAP HANA?
Executive Summary
- Hasso Plattner, Steve Lucas, Rob Enslin, Vishal Sikka, and John Appleby provide oodles of false information around HANA’s performance.
- We cover the impact of SAP marketing on the benchmarks that SAP publishes for HANA performance.
Video Introduction: What is the Actual Performance of SAP HANA?
Text Introduction (Skip if You Watched the Video)
I covered the topic of HANA’s actual performance versus competitive databases in the article, Which is Faster, HANA, or Oracle 12C. SAP has spared no expense to lie about HANA to SAP customers and prospects. SAP has used special hardware to rig its benchmarks in favor of HANA versus other databases and has hidden benchmarks where HANA performs poorly. This has been combined with massive lying on the part of Hasso Plattner, Vishal Sikka, John Appleby, Rob Enslin, Steve Lucas, and many more. In this article, you will learn about the various database benchmarks on HANA and its competitors in more detail.
Our References for This Article
If you want to see our references for this article and other related Brightwork articles, see this link.
Notice of Lack of Financial Bias: We have no financial ties to SAP or any other entity mentioned in this article.
- First, this article is published by a research entity, not some lowbrow entity that is part of the SAP ecosystem.
- Second, no one paid for this article to be written, and it is not pretending to inform you while being rigged to sell you software or consulting services. Unlike nearly every other article you will find from Google on this topic, it has had no input from any company's marketing or sales department. As you are reading this article, consider how rare this is. The vast majority of information on the Internet on SAP is provided by SAP, which is filled with false claims and sleazy consulting companies and SAP consultants who will tell any lie for personal benefit. Furthermore, SAP pays off all IT analysts -- who have the same concern for accuracy as SAP. Not one of these entities will disclose their pro-SAP financial bias to their readers.
Who Performs Benchmark Testing in Databases?
The first thing to establish is that there is no independent body, such as a Consumer Reports, for database benchmarking. This means that the benchmarks I reviewed were all performed by vendors.
This is a major issue.
Let us enumerate the problems with having no independent source for benchmarking as it relates to databases.
Selective Release
A vendor would never release a benchmark that showed it losing to a competing vendor across the board. A released benchmark would have to be positive for the vendor in some dimensions, and the released results would have to be more positive than negative.
This parallels drug testing by pharmaceutical companies, where negative studies tend to go unpublished.
“…studies about antidepressants made the drugs appear to work much better than they did. Of 74 antidepressant studies registered with the FDA, 37 studies that showed positive results ended up being published. By contrast, studies that showed iffy or negative results mostly ended up going unpublished or had their data distorted to appear positive, Turner found. The missing or skewed studies helped create the impression that 94 percent of antidepressant trials had produced positive results, according to Turner’s analysis, published in the New England Journal of Medicine. In reality, all the studies together showed 51-percent positive results.”
“For instance, a past analysis of clinical trials supporting new drugs approved by the FDA showed that just 43 percent of more than 900 trials on 90 new drugs ended up being published. In other words, about 60 percent of the related studies remained unpublished even five years after the FDA had approved the drugs for market. That meant physicians were prescribing the drugs and patients were taking them without full knowledge of how well the treatments worked.” – LifeScience
We will address this topic directly as it appears that SAP is doing the same thing with its OLTP benchmarks for HANA.
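The effect of selective release can be sketched with simple arithmetic. The snippet below is a minimal illustration, not data from any vendor: the ten benchmark runs, the four wins, and the function name `win_rate` are all hypothetical. It shows how publishing every win but only some of the losses inflates the apparent success rate.

```python
def win_rate(outcomes):
    """Share of benchmark runs the vendor won (True = win)."""
    return sum(outcomes) / len(outcomes)

# Hypothetical: ten internal benchmark runs, four wins and six losses.
all_runs = [True] * 4 + [False] * 6

# Selective release: every win is published, but only one loss is.
published = [True] * 4 + [False] * 1

print(win_rate(all_runs))   # 0.4
print(win_rate(published))  # 0.8
```

A reader who sees only the published runs would conclude that the vendor wins twice as often as it does.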
Skill Familiarity Bias
A vendor will always have more skills in its own solution than in a competitor’s solution. Because databases can be “tuned up,” and because of differences in the hardware selected and other variations, even a vendor that was 100% above board would still tend to observe better performance in its own solution than in a competing solution.
Hardware Bias
The vendors spare no expense in hardware for these tests. The customers will often purchase equipment that is lower in its specification than that used by the vendor. We cover this topic in the article How Much is Hardware Responsible for HANA Benefits?
The Laboratory Environment Bias
The hardware and database are run in a “lab” environment. It has no other batch jobs pulling on its resources, which is, of course, unrealistic. Therefore the performance of the benchmark would typically not be attainable in a production setting. I see the benchmark results as more comparable between different benchmarks than between a benchmark and a production environment.
Sales Bias
Every benchmark paper I looked at had one clear purpose. That was to improve the sales of the product benchmarked by the vendor that wrote the paper.
Interpretational Bias
The benchmarks that are released are then viewed through the prism of bias, that is, by people who have an incentive to prefer a particular software vendor. One entity that has published inaccurate information about benchmarks, right in line with its financial bias, is the consulting firm Bluefin, which is one of the least reliable providers of information on HANA.
The Benchmark Tests
The following benchmarks, which were performed for these databases, were reviewed.
- SAP OLTP Benchmark: This is a benchmark for transaction processing. The things that ERP systems tend to do the most are recording journal entries, decrementing inventory when performing a goods issue, etc.
- SAP BW-EML (Business Warehouse Mixed Workload) Benchmark: This is an analytics benchmark. We cover this topic in the article How to Understand the Issues with BW EML Benchmarks. And we cover the SD benchmarks in the article How to Understand the Issues with SD HANA Benchmarks.
SAP’s Missing Benchmarks
For years SAP would release an OLTP benchmark for databases. However, with HANA, SAP stopped publishing this benchmark. Database design would predict that HANA would perform poorly in this benchmark, which is the most likely reason why SAP never produced this benchmark. However, the consulting firm Bluefin Solutions has the following way of covering this up:
“The SAP HANA platform was designed to be a data platform on which to build the business applications of the future. One of the interesting impacts of this is that the benchmarks of the past (e.g. Sales/Distribution) were not the right metric by which to measure SAP HANA.” – Behind the SAP BW EML Benchmark
This is reinforced by the quotation from the book SAP Nation 2.0.
“It has not helped matters that SAP has been opaque about HANA benchmarks. For two decades, its SD benchmark, which measures SAP customer order lines processed in its Sales and Distribution SD module, has been the gold standard for measuring new hardware and software infrastructure. It has not released those metrics using a HANA database. One of the (unsatisfactory) excuses offered is that the expensive hardware needed to support such a test in a lab is better shipped to paying customers.”
John Appleby’s Lack of Transparency on Financial Bias
At no point in this article by John Appleby does he declare that he has a quota or leads a group with a quota to sell HANA. Further than this, in the 2013 timeframe, John Appleby was preparing the company for which he was an executive to be sold to Mindtree, which we cover in the article Appleby’s False HANA Statements and the Mindtree Acquisition. Curiously, Appleby’s aggressive promotion of HANA considerably declined after Bluefin Solutions was sold to Mindtree.
John Appleby presents himself as if he is a disinterested third party. So that is problem number one. But the second issue is that Appleby is speaking what amounts to gibberish in this quotation.
- S4 has a Sales module.
- This sales module will be performing the same functions as the current ECC SD module. Will there be analytics involved in the Sales module? Of course. However, there will also be transactions or OLTP performed.
- S4 Sales will record sales orders, update sales orders, etc.
- Therefore it is demonstrably untrue that an OLTP benchmark is now irrelevant because “the platform was designed to be a data platform on which to build business applications of the future.” That sentence is false, and one would have to twist oneself up into a pretzel to try to defend it. The person seems to be preparing to run for political office.
Appleby’s interpretation of the BW-EML benchmark contains other nonsense, like the following:
“the configuration used by published results is the stock installation…there are not performance constructs like additional indexes or aggregates in use.”
Appleby’s Trademark Nonsense
The reason this is nonsensical is that column-oriented databases don’t use indexes. They don’t need them. Why Appleby is impressed by this is a head-scratcher. How many times has it been established that the primary reason for the reduced size of the database footprint is the removal of indexes? If so, and if this is widely accepted, why is it surprising to Appleby that the BW-EML benchmark for a column-oriented database does not have indexes?
On the topic of aggregates, HANA does use aggregates but does not call them aggregates. So what Appleby is saying is incorrect, although there are fewer aggregates.
It is important to remember that John Appleby is joined at the hip with SAP. Like other SAP consulting companies, John Appleby and his company Bluefin Solutions primarily repeat whatever SAP says. For whatever reason, Bluefin Solutions, and John Appleby in particular, were chosen, or decided, to release information about HANA and S/4HANA that would have had to have been approved by SAP. Please see our analysis of the consulting partnership agreement that controls SAP consulting partners’ media output in this article.
John Appleby wins another Golden Pinocchio Award for his statements regarding the BW-EML benchmark.
Hasso Plattner’s Overstatement of the Importance of Removing Aggregates
Hasso Plattner has had an obsession with eliminating aggregates for some time, and he rails against aggregates in his articles and his books, but in many cases, aggregates are beneficial. Unlike what Hasso Plattner states, not everything needs to be continuously recalculated. And not everything needs to be recalculated every time it is accessed.
Many reference tables only occasionally change. Why instantaneously recalculate something that rarely changes? This is just a waste of processing cycles.
Let us take an example.
We want to see a report of all the sales orders that a company has processed for the past three months. This report was processed and aggregated yesterday, along with different dimensional attributes. Under Hasso Plattner’s logic, this aggregate is worthless because it is pre-calculated.
Let us look at that statement in detail.
Example
Let us say that the aggregate was calculated yesterday, precisely 24 hours prior.
- One day is roughly 1/90th of three months.
- If we look back 90 days in the report, we will show 100,000 sales orders. That is an average of 1111 sales orders per day (yes, weekends would be less than workdays, but as an average, 1111 sales orders)
- Now let us say that the day that drops off if we run the report anew had 1500 sales orders created (so a high day). And let us say that the day added, which is yesterday plus the hours up until the present hour, is 700 sales orders (a low day).
- So instead of looking at 100,000 sales orders, we are now looking at 99,200 sales orders. Is that a real problem? Is the last 24 hours more representative than the 24 hours from 3 months ago? Probably not. But if it is, how much more should the company be willing to spend to get rid of all aggregates? And are there other investments that might be a better use of that money?
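The arithmetic in the example above can be checked with a short sketch. The figures are the ones used in the example; the function and variable names are illustrative assumptions, and the calculation says nothing about how HANA or any other database handles aggregates internally.

```python
def refreshed_total(stale_total, dropped_day, added_day):
    """Roll a 90-day aggregate forward by one day."""
    return stale_total - dropped_day + added_day

stale_total = 100_000   # aggregate calculated 24 hours ago
dropped_day = 1_500     # oldest day that leaves the 90-day window (a high day)
added_day = 700         # newest day that enters the window (a low day)

fresh_total = refreshed_total(stale_total, dropped_day, added_day)
relative_error = abs(stale_total - fresh_total) / fresh_total

print(round(stale_total / 90))   # 1111 sales orders per day on average
print(fresh_total)               # 99200
print(f"{relative_error:.2%}")   # 0.81%
```

Under these numbers, the day-old aggregate differs from the freshly calculated total by less than one percent, which is the scale of error that the cost of eliminating aggregates has to be weighed against.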
SAP’s Overestimation of the Importance of HANA’s Changes as They Relate to SAP HANA Performance
There is an unlimited number of scenarios that could be imagined to determine the importance of removing aggregates. For instance, if just two days of sales orders were reviewed, the company would see a much more significant variation. However, the need for instantly recalculated information is significantly overestimated in vendor marketing documentation, and in analytics vendor documentation in particular.
The articles Monthly Versus Weekly Forecasting Buckets and Quarterly Versus Monthly Forecasting Buckets challenge a long-held belief that forecasting information must frequently be updated with the most recent sales history to obtain the highest forecast accuracy. This was tested with actual client data, from a client with a challenging sales history to forecast. It shows that, as with the tests I performed at previous customers, frequent updating is not important and contributes little to forecast accuracy.
This client was convinced that daily forecasting produced the highest accuracy.
Generalizing From Misrepresented Scenarios
So, while there can be scenarios where getting the most up-to-date information is critical, SAP tends to take these few scenarios and generalize them to be “normal,” when in fact, they tend to be the exceptions. Hasso Plattner has a way of presenting things that are often quite gray as black or white. And of course, all of Hasso Plattner’s examples have the peculiar and consistent outcome of handing over more money to SAP. Unlike Hasso Plattner, I do not make more money by exaggerating, and therefore his proposals tend to come off as sales fluff…at least to me. After years of reading Hasso Plattner’s statements, I do not consider him credible or truthful.
The Logic for the Improved Analytical Performance (SAP HANA Performance) of Column-Oriented Databases
I found the following quotations from IDC and Oracle that explain why column-oriented databases are so efficient for analytics.
“The established approach to setting up a query/reporting database (ODS, data mart, data warehouse) has involved establishing indexes for all columns that might have value lookup operations in the queries. Many organizations now use columnar databases, which have the same relational characteristics as row-oriented databases but store the data in blocks of column rather than row data for speed of retrieval. This obviates the need for indexes and, in some cases, for cubes and materialized views.” – IDC
“If live data is to be queried and updated at the same time, the queries must be very fast in order to avoid consuming resources on the database server and slowing down transactions. A number of vendors have created database technologies that optimize query performance by combining two key elements: query-optimized columnar organization for the data and memory-optimized database operations. In the case discussed here, however, there is an additional challenge, which is to maintain that data in a form that also supports a high-performance transactional database.” – IDC
“Database In-Memory leverages a unique “dual-format” architecture that enables tables to be in memory simultaneously in a traditional row format and a new in-memory column format. The Oracle SQL Optimizer automatically routes analytic queries to the column format and OLTP queries to the row format, transparently delivering best-of-both-worlds performance. Oracle Database 12c automatically maintains full transactional consistency between the row and the column formats, just as it maintains consistency between tables and indexes today.
· Access only the columns that are needed.
· Scan and filter data in a compressed format.
· Prune out any unnecessary data within each column.
· Use SIMD to apply filter predicates.” – Oracle
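The first point in the quotations above, that an analytic query only needs to access the columns it uses, can be illustrated with a toy sketch. This is not how HANA or Oracle Database In-Memory is implemented: the table, its field names, and the `fields_touched` counter are hypothetical, and real engines add compression, pruning, and SIMD on top of the layout shown here.

```python
# Hypothetical sales-order table stored row by row.
rows = [
    {"order_id": 1, "customer": "A", "region": "EU", "amount": 120.0},
    {"order_id": 2, "customer": "B", "region": "US", "amount": 80.0},
    {"order_id": 3, "customer": "A", "region": "EU", "amount": 200.0},
]

# The same data in a column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def total_amount_row_store(table):
    """Sum one field; every full row is read to reach it."""
    total, fields_touched = 0.0, 0
    for row in table:
        fields_touched += len(row)  # the whole row is fetched
        total += row["amount"]
    return total, fields_touched

def total_amount_column_store(cols):
    """Sum one field; only the 'amount' column is read."""
    amounts = cols["amount"]
    return sum(amounts), len(amounts)

print(total_amount_row_store(rows))        # (400.0, 12)
print(total_amount_column_store(columns))  # (400.0, 3)
```

Both layouts return the same total, but the column layout reads a quarter of the values. The reverse holds for a transaction that inserts or updates a whole sales order, which touches every column list in the column layout and only one row in the row layout.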
Interpretation
However, this does not mean, and Oracle is not implying, that a column-oriented database is better for applications outside of analytics. And as far as I can determine from reading the perspective of different database vendors on this topic, SAP is the only database vendor that proposes that a column-oriented design is better for all types of applications.
Some of the Results On Oracle and SAP HANA Performance
For instance, the Oracle benchmark paper released in 2015 tested on hardware similar to what SAP used in its BW-EML benchmark, but it left out the topic of how many customers would use this hardware configuration. I don’t know myself as I have not recorded many clients’ hardware specifications, but the hardware employed by Oracle appeared quite advanced. At one point, SAP’s benchmark used a machine with 1536 GB of RAM.
I have personally never heard of this much RAM being used on a server at any account that I have worked on. It probably exists as there are very advanced companies out there doing scientific computing. But it is a small number.
At one point, Oracle points out that the monster machine used by SAP beat Oracle’s BW-EML benchmark but needed three times the amount of memory to do this. This brings up the question of whether SAP’s hardware was simply re-engineered to beat the Oracle benchmark. Did SAP first try the machine with 1000 GB of RAM, add 200 GB of RAM, then test again, and then add another 200 and test again, etc., until it finally beat the Oracle score?
In another benchmark, SAP installed 100 IBM servers in an SAP HANA cluster. If no one outside of the NSA, Amazon AWS (which resells portions of its hardware over the cloud), or a scientific computing center would be willing to buy hardware of this size, how relevant are these benchmarks to the majority of HANA customers?
The Impact of Marketing on SAP Benchmarking on SAP HANA Performance
SAP needs to get marketing out of the process of releasing benchmarking information. The marketing influence is apparent in the benchmark publication SAP HANA Performance: Efficient Speed and Scale-Out for Real-Time Business Intelligence. One should not need to see a cover plastered with stock photograph imagery of a man pulling a “fly” snowboarding maneuver, followed by an image of many people rowing together and a marketing-written introduction that uses a word salad of terms like NetWeaver components. How is this related to SAP HANA performance? This should be a scientific paper that is not word-smithed and couched in deceptive marketing language.
SAP marketing must acknowledge that not every paper produced by SAP needs to have its fingerprints on it. Here is an example of the type of nonsensical writing that I am referring to.
“The drill-down queries (276 to 483 milliseconds) demonstrate SAP HANA’s aggressive support for ad hoc joins and, therefore, to provide unrestricted ability for users to “slice and dice” without having to first involve the technical staff to provide indexes to support it (as would be the case with a conventional database).”
Please do not use the term “slice and dice” in a technical paper, or the word “unrestricted,” or the colorful “HANA’s aggressive support.” This is not scientific terminology. SAP’s benchmarking paper needs to be rewritten entirely, just using the original data. Then at the end, SAP has quotations like the following:
“We have seen massive system speed improvements and increased ability to analyze the most detailed levels of customers and products.” – Colgate Palmolive
This is an anecdote, and it sounds like Donald Trump wrote it (except it uses the word massive instead of tremendous.) If Donald Trump could have written your benchmark study, you have a credibility problem with your benchmarking study.
The Specifics of Database Performance
We have been saying that HANA’s only beneficial area of performance is for analytics, which is called a read operation in database speak.
However, there is a level below this in detail. HANA’s primary beneficial area is for short SQL queries. An excellent example of a short SQL query would be a query for BW.
Long Versus Short SQL Queries
HANA’s performance degrades for longer queries.
An excellent example of a longer query is within ECC or S/4HANA. This is where the data is less prepared.
However, in its marketing material, SAP proposes that HANA is excellent for reporting within ERP systems. There is no evidence of this up to this point. The evidence points in the opposite direction quite strongly, as we cover in the article Why HANA is a Mismatch for S/4HANA and ERP.
Ironically, we have had many people tell us that once reports can be run from the ERP system, there will be no reason to have a centralized BI system. But the performance of HANA does not support this vision.
What Happened to SAP’s Row Store Performance?
The following quotation can be found in Oracle’s “Analysis of HANA HA” document.
“The SAP HANA database consists of two database engines:
The column-based store, storing relational data in columns, optimized for holding data mart tables with large amounts of data, which are aggregated and used in analytical operations.
The row-based store, storing relational data in rows.
This row store is optimized for write operations and has a lower compression rate, and its query performance is much lower compared to the column-based store”
Not all row-based stores are created equal, as HANA’s ECC performance is worse for transactions than ECC on Oracle or IBM’s pre-column store databases. This explains the performance differences between Oracle DB, DB2, SQL Server, MaxDB, and Sybase ASE, even though all are row-based by default.
One thing to remember is that HANA is still a relatively new database. When I discussed this with a very experienced database resource, they made the following observation.
“That’s is no way a brand new row based DB can beat all these databases above which are optimised over so many years especially Oracle DB.”
Conclusion
When one compares what Bill McDermott, Hasso Plattner, SAP marketing, Bluefin Solutions, Deloitte, and others say about HANA’s game-changing aspects to the technical benchmarks, there is no correspondence.
SAP invests comparatively little in benchmarking, but its marketing spending on HANA is enormous. This is similar to the major pharmaceutical companies. Pharmaceutical companies spend far more on marketing than on research. Their research is mostly just running clinical trials based on research that is performed by universities and is publicly funded.
Evidence of Oracle Outperforming SAP HANA Performance
Oracle has provided compelling evidence that its 12c database outperforms SAP HANA. I say this while acknowledging the fact that there is no independent body that performs database benchmarking. Oracle invests much more in database benchmarking, and its benchmarking studies are more transparent and make the case far better than SAP’s.
For all of the talk of SAP HANA performance, SAP produces a single benchmark to support these supposed claims of superiority over Oracle 12c and others. While we do not have independent verification, sifting through the results, it seems more likely than not that Oracle 12c is not just a little bit faster but far faster than HANA. Secondly, while SAP has placed speed as the priority in its database design, Oracle’s orientation is far more holistic, putting reliability first. Thirdly, given 12c’s design, it will almost certainly beat SAP HANA easily for OLTP processing.
How Many Databases Can Outperform HANA?
This article did not review the benchmarking of other database vendors. However, I find it more likely than not that vendors like IBM, given the database talent that they have, also have a solution that is superior in performance to SAP HANA. And the list of other database vendors that can also beat HANA is likely longer than just Oracle and IBM. The bottom line is that outside of one type of database processing, called short query SQL, SAP HANA performance is poor.