How to Move From Oracle to Open Source Databases with AWS and Google Cloud
Executive Summary
- AWS and Google Cloud are offering great opportunities to leverage open source databases.
- We cover how to accomplish the migration.
Introduction
Oracle representatives like discussing Oracle’s market share in the database market. They love talking about the different things the Oracle database can do and its many advanced features.
What they don’t like discussing is the costs for the Oracle database. That topic nearly always gets deemphasized when conversing on the subject of the Oracle database.
So how expensive is the Oracle database?
TCO by MariaDB into the Oracle Database
The open-source project MariaDB created a series of cost estimates for MariaDB versus Oracle and a TCO. Before we dive into it, let us explain Brightwork Research & Analysis’s background and views on TCO studies.
- Who is Performing the Study?: We also don’t think the vendor or a consulting company, for that matter, is the right entity to be performing TCO estimates. When an analyst firm like Forrester is hired to perform the TCO study, they will rig that study to make the entity paying them have the lowest TCO.
- Predictable Outcomes: The TCO studies that we have reviewed have nearly always been financially biased, with predictable outcomes (the entity performing the TCO almost always bestows the lowest TCO to themselves).
- The Issue of the Incomplete Nature of TCO Studies: TCO studies have a long history of being incomplete (the less complete the TCO study is, the lower the TCO, which makes vendors and consulting firms positively ecstatic).
The Issue with Vendor TCO Studies
We would never accept cost estimates from a vendor without analyzing them. There is no doubt that MariaDB is pro-MariaDB, and there is no doubt that the lead developer for MariaDB, Michael Widenius, hates Oracle. His lack of trust for Oracle is why he forked MariaDB from MySQL after Oracle announced its acquisition of Sun Microsystems and, therefore, MySQL in 2009. MariaDB is both an open-source project and a corporation that promotes MariaDB (there is both a MariaDB.com and a MariaDB.org, which is the foundation). Private entities have an atrocious record of performing research, while the best research comes from non-profit entities. Our university system is primarily non-profit (and the for-profit universities tend not to produce research). MariaDB is not a non-profit. Its corporation is for-profit but private and has raised $98 million in capital. The corporation publishes the TCO study.
However, there are a few things we want to cover about the MariaDB costing paper.
- We have reviewed a considerable number of TCO estimates for our book on TCO. This MariaDB cost study is one of the best that we have ever reviewed.
- As MariaDB is an open-source database, we think this comparison, while limited in the paper to MariaDB, would apply similarly to other open-source database projects like MySQL and PostgreSQL.
Now that we have addressed how suspicious we are of TCO studies, and discussed what MariaDB is, let us begin with the basic cost comparison table.
Notice how the add-ons from Oracle add up. Oracle sells its database in “pieces.” However, the eyes begin to pop when we review the next table.
This is what companies are paying to use Oracle? Amazing.
This is one of the most interesting graphics in the TCO study. This graphic shows the price drastically increases when the Oracle DB is run on AWS (under this scenario). Notice again that MariaDB stays low on AWS.
In its conclusion of the study, MariaDB goes on to state the following:
“After three years, running on three on-premise servers, each with two, 16-core processors:
The total cost of Oracle is 80x higher than MariaDB
The annual cost of Oracle is 30x higher than MariaDB
Organizations can save $9 million after three years by choosing MariaDB
Organizations can save $1.1 million annually by replacing Oracle.”
These types of numbers make one wonder. Companies are locked in to Oracle in some cases, but in other cases, they aren’t, and they are still using the Oracle DB.
We will cover the effort in transitioning from Oracle, which is not minor, but it was important to place these costs at the beginning of this chapter to highlight how substantial the cost is to run Oracle. One cannot stop at “migration costs are expensive” without acknowledging the current giant running balance of staying on Oracle.
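The multiples and savings figures in MariaDB's conclusion can be turned back into implied dollar amounts. The following is a minimal sketch, assuming that "N times higher" means Oracle's cost equals N times MariaDB's cost and that "savings" means the difference between the two. The function name is ours, not the study's, and the study's own tables remain the authoritative figures.

```python
from typing import Tuple


def implied_costs(multiple: float, savings: float) -> Tuple[float, float]:
    """Return (mariadb_cost, oracle_cost) consistent with a cost multiple and a savings figure.

    Assumes oracle_cost = multiple * mariadb_cost and
    savings = oracle_cost - mariadb_cost (our reading of the quotation).
    """
    if multiple <= 1:
        raise ValueError("multiple must be greater than 1")
    mariadb_cost = savings / (multiple - 1)
    oracle_cost = multiple * mariadb_cost
    return mariadb_cost, oracle_cost


# Figures quoted from the MariaDB study's conclusion.
three_year = implied_costs(multiple=80, savings=9_000_000)
annual = implied_costs(multiple=30, savings=1_100_000)

print(f"Three-year: MariaDB ~${three_year[0]:,.0f}, Oracle ~${three_year[1]:,.0f}")
print(f"Annual:     MariaDB ~${annual[0]:,.0f}, Oracle ~${annual[1]:,.0f}")
```

Under these assumptions, the implied three-year Oracle total is considerably more than three times the implied annual figure. That would be consistent with a large up-front license component, although the quotation itself does not break the costs down that way.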
The Price of CockroachDB Versus Amazon Aurora
One might think that this type of comparison is only an issue for commercial databases like Oracle versus open source. However, the following is a comparison of CockroachDB versus Amazon Aurora. These are both designed from the ground up for the cloud. Amazon designed Aurora to have up to 15 read replicas across different availability zones, and the backups are continuous, backed up to S3, and designed for 99.999999999 percent durability. Aurora is a distributed database that divides the data into 10 GB segments, which are spread across the database cluster and across availability zones. However, a customer can use the similar CockroachDB on AWS if they find that option more appealing. CockroachDB provides the following comparison between their database and Aurora.
CockroachDB thinks that it offers a better value than Aurora. CockroachDB touts its throughput in the following quotation.
“CockroachDB has over 99% of Aurora’s throughput, with only 10 to 20% of the cost. We think it’s safe to say that on the TPC-C 1k warehouse benchmark, CockroachDB is a much more attractive option.”
Claims are a dime a dozen in the software industry.
So is this claim true?
To determine the validity of this comparison, one needs to review the paper’s assumptions, as the paper states that CockroachDB and Amazon RDS
“use very different operation models.”
In the past, it was challenging to verify vendor claims regarding how they compared to other options. However, with the cloud, it is now quite straightforward to perform such tests.
In fact, in the paper, CockroachDB states the following:
“We don’t want you to simply take our word on our performance. We want you to try it out for yourself.”
That is not a phrase that is uttered very frequently by on-premises vendors like SAP or Oracle! We have never heard it pronounced. The concept of SAP and Oracle projects is to listen to your sales rep (or consulting partner).
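CockroachDB's quoted claim (over 99% of Aurora's throughput at 10 to 20% of the cost) can be restated as throughput per dollar, which is the number a company would want to reproduce in its own test. The sketch below only does the arithmetic on the quoted figures. The function name is hypothetical, and the inputs would need to be replaced with measurements from one's own benchmark runs.

```python
def throughput_per_dollar_ratio(relative_throughput: float, relative_cost: float) -> float:
    """Return how many times more throughput per dollar a candidate delivers.

    Both arguments are expressed relative to the baseline (baseline = 1.0).
    """
    if relative_cost <= 0:
        raise ValueError("relative_cost must be positive")
    return relative_throughput / relative_cost


# Quoted claim: 99% of the throughput at 10% to 20% of the cost.
best = throughput_per_dollar_ratio(relative_throughput=0.99, relative_cost=0.10)
worst = throughput_per_dollar_ratio(relative_throughput=0.99, relative_cost=0.20)

print(f"Implied throughput per dollar: {worst:.2f}x to {best:.2f}x the baseline")
```

A ratio in this form makes it easy to see how sensitive the claim is to the cost assumption: doubling the relative cost halves the advantage.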
Choosing Options
To figure out the best option, companies will need to move away from the on-premises model of buying based on demos and projections and toward dynamically testing cloud services, both individually and in conjunction with one another. This means, of course, increasing the size of the portion of their IT staff that performs testing.
Adding testing capabilities can be achieved by moving some of the work currently being conducted to manage databases to fully managed databases at AWS or Google Cloud. That leaves some staff available to be reassigned.
In the long run, it means more time spent testing and validating and less time managing solutions that were purchased under inaccurate assumptions and that are not good fits for the requirements. This may sound great hypothetically; however, it is no simple thing, particularly if there is an embedded pattern of doing the opposite. Ill-fitting solutions produce a vast overhang in IT departments. They are everywhere. In line with the availability of cloud services, we continually promote scientific testing at our clients. It is an uphill fight, because we frequently get feedback that the preference is to stay away from testing. Testing “takes too long.” It is “too much effort.” That sentiment has to change if companies are to determine what services can be leveraged.
The Common Problems with the On-Premises Database Management Model
The current model of on-premises database management does a poor job of scaling and is also inefficient in that it duplicates labor in many locations. Under this “distributed model,” each company maintains many databases, and it must maintain them less capably than providers like AWS or Google Cloud, who can apply scale economies to database management. As just one example, the average database server in an on-premises data center has a 5 to 10% utilization. The ability to get that utilization far higher (on the IaaS provider’s server) is just one of many areas of efficiency improvement. That is before the application of higher levels of database specialization is taken into account.
Considering Oracle’s Database Assumptions
For many years now, with little competition, Oracle has dictated how the relational database will be, how much maintenance it will require, and how many people will be needed on staff to support it. When companies pay 22%+ to Oracle, this covers opening tickets with Oracle and having those tickets worked; this does not include the cost of the database administrators (DBAs) or other overhead. To even get the Oracle database to run correctly, quite a bit more DBA time and effort is necessary than in any other widely used RDBMS. Oracle is widely considered to be one of the most challenging databases to get to run correctly. Therefore, the actual cost of maintaining Oracle DBs is the 22%+ on top of the implementation cost, plus the internal resources and DBAs that support the Oracle DBs.
Oracle’s Strategy
Oracle’s sales strategy is to continue to add functionality to justify Oracle DB’s high price. However, the fact is that while some companies can leverage the upper-level functionality of the Oracle DB, the vast majority do not. This is why most Oracle customers are still using one of the version 11 releases of the Oracle DB rather than 12 or 18 (Oracle changed its versioning and “jumped” from 12 to 18, so the follow-on to 12 is not 13). Seeing the reduced value of Oracle’s support, many Oracle customers have dropped support, are unconcerned with the latest version of the Oracle database, and either self-support or use a third-party support entity.
The observation of one of our authors, Mark Dalton, gathered from his company AutoDeploy, is that the primary motivation for customers to move to cloud databases is to leave behind Oracle’s inflationary TCO. Mark has observed the following features of those companies that migrate to AWS.
- All of our customers’ migrations run on fully managed AWS RDS services (MariaDB, PostgreSQL, or MySQL). The ones who keep running 9i or 11i no longer pay Oracle for support. They’ve been running unsupported for years.
- They stop paying for features they don’t need and support from which they see little value.
- The performance is neither better nor worse than before.
- The cost savings are quite substantial, with a median saving of 80% of annual IT OPEX spend.
While all of this has been happening, open-source relational databases, notably PostgreSQL, have, for the most part, caught up with the Oracle DB. This would be more widely publicized, but the open-source projects don’t have the money to promote this type of information the way SAP or Oracle do. This, as we have pointed out previously, is a flaw in the commercial software model. Money means the ability to promote solutions over open-source alternatives.
The Applicability of PostgreSQL
The truth is that for the vast majority of applications, PostgreSQL can substitute very well for Oracle. Oracle would like to propose that PostgreSQL is not as “enterprise-ready” as Oracle, but the existence of so many PostgreSQL instances that handle such large volumes makes it difficult for Oracle to make this argument reasonably. New databases like Aurora, developed by AWS for the cloud (rather than being developed on-premises and then ported to the cloud), have been designed to leverage cloud hardware resources rather than on-premises hardware resources. This gives them an advantage against the Oracle database, which was adapted to run on the cloud from its on-premises history.
DB-Engines ranks database popularity. Its method is not perfect, because the exact prevalence of each database is not knowable, so its ranking is best read as an indicator. DB-Engines tracks the rise of PostgreSQL. Of the top four databases, which are all of a similar design, PostgreSQL is the only one with year-over-year growth. The overall relational database market is not growing anywhere near PostgreSQL’s year-over-year 47%+ growth. Some of this growth has come at the expense of the databases above it on the list. MySQL (owned by Oracle, but open source), SQL Server, and PostgreSQL (also open source) are far lower in price than Oracle. For the vast majority of applications, they can do everything that the Oracle DB can do. All of them are easier to use and easier to optimize. They are also lower in maintenance than Oracle and IBM’s DB2. DB2, which goes after a similar market to Oracle, is also declining in popularity but from a much lower base.
Administration Benefits of PostgreSQL and AWS
As with other databases ranging from MariaDB to Aurora to CockroachDB, PostgreSQL is far easier to administer than Oracle. AWS knows this quite well, and this is in part what led AWS to introduce the RDS offering (RDS stands for Relational Database Service, but it could just as easily be called Managed Relational Database Service). That is, the database software itself is easier to administer. However, AWS adds to this by making RDS multitenant and then making the entire service centrally managed.
The AWS RDS has the following characteristics:
- In RDS, the particular database is called the database engine.
- The DB instance is the building block of RDS, which AWS calls the cloud’s isolated database environment.
- There is a many-to-one relationship between DB instances and the AWS account, which by default allows up to 40 DB instances per account.
- The DB instance has a computational and memory capability as determined by the DB instance class, which falls into one of three categories: Standard, Memory Optimized, and Burstable Performance.
- A DB instance can contain multiple customer-created databases.
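The terms in the list above (database engine, DB instance, DB instance class) map directly onto the parameters one passes when asking RDS for a new instance. The following is a minimal sketch using parameter names from the boto3 `create_db_instance` call. The identifier, credentials, and sizes are hypothetical placeholders, the class-family lookup is our own simplification based on the first letter of the class name, and the request is only built here, not sent.

```python
from typing import Any, Dict

# Simplified mapping from the first letter of the class family to its category.
# This is an illustrative assumption, not an exhaustive AWS list.
_CLASS_FAMILIES = {
    "m": "Standard",
    "r": "Memory Optimized",
    "x": "Memory Optimized",
    "t": "Burstable Performance",
}


def instance_class_family(db_instance_class: str) -> str:
    """Classify a name such as 'db.t3.micro' into its DB instance class category."""
    parts = db_instance_class.split(".")
    if len(parts) != 3 or parts[0] != "db" or not parts[1]:
        raise ValueError(f"unexpected DB instance class: {db_instance_class!r}")
    return _CLASS_FAMILIES.get(parts[1][0], "Unknown")


def build_db_instance_request(identifier: str, engine: str, db_instance_class: str,
                              storage_gib: int, username: str, password: str) -> Dict[str, Any]:
    """Build the keyword arguments for the RDS create_db_instance call."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,                        # the "database engine", e.g. postgres, mysql, mariadb
        "DBInstanceClass": db_instance_class,    # determines compute and memory
        "AllocatedStorage": storage_gib,
        "MasterUsername": username,
        "MasterUserPassword": password,
    }


def create_db_instance(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to AWS. Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the sketch runs without the AWS SDK installed
    return boto3.client("rds").create_db_instance(**request)


# Hypothetical values for illustration only.
request = build_db_instance_request(
    identifier="example-postgres",
    engine="postgres",
    db_instance_class="db.t3.micro",
    storage_gib=20,
    username="exampleadmin",
    password="replace-with-a-secret",
)
print(request["Engine"], instance_class_family(request["DBInstanceClass"]))
```

Keeping the request construction separate from the network call makes it possible to review or test what would be provisioned before any cost is incurred.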
Multitenancy, VPC, and the RDS
A significant advantage of RDS is that it is multitenant. Multitenancy functionality is quite advanced and allows many users’ data to be kept in the same RDS. This is how the RDS gains such high degrees of scale economies and how it reduces maintenance overhead. This is why the cloud is so much more than hosting or “private cloud.” It means creating large-scale economies from managing many customers on one instance of the software. In this case, a database.
RDS has default security. Unless one uses a legacy database, the RDS service is protected with a VPC, or virtual private cloud, which is used to segment the customer’s DB instance from other DB instances, even though the customers are multitenant on the RDS.
- VPC as a Sub-cloud: A VPC is a Virtual Private Cloud, a sub-cloud, a private cloud within a public cloud. AWS and Google Cloud are public clouds.
- Mini Private Clouds: The VPC capability means that they can segment the public cloud into mini-private clouds.
- Data Isolation: While the RDS or Cloud SQL database instance is public, the VPC isolates the customer data.
The VPC creates a virtual network within AWS for each customer. The metaphor of a “network” applies because it is as if it is a network in one’s own on-premises data center. In the graphic above, containers are shown within either a datacenter or a VPC. This implies that the two are logically the same.
In this more detailed graphic, notice again the VPC “contains” two different Availability Zones.
The VPC provides a high degree of control related to selecting IP address ranges, configuring routing, and other security features. As each DB instance has various computing resources allocated to it, with so many customers in the same VPC-protected database, the administration overhead declines significantly.
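The idea of selecting IP address ranges for a VPC that spans two Availability Zones can be illustrated with nothing more than Python's standard `ipaddress` module. This is a sketch under assumed values: the `10.0.0.0/16` range, the `/24` subnet size, and the zone labels are hypothetical, and no AWS API is called.

```python
import ipaddress
from itertools import islice
from typing import Optional

# Hypothetical private address range chosen for the VPC.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve the first two /24 subnets out of the VPC range, one per Availability Zone.
zone_labels = ["az-a", "az-b"]  # placeholder labels, not real zone names
az_subnets = dict(zip(zone_labels, islice(vpc.subnets(new_prefix=24), 2)))


def zone_for(address: str) -> Optional[str]:
    """Return the zone label whose subnet contains the address, or None if no subnet does."""
    ip = ipaddress.ip_address(address)
    for label, subnet in az_subnets.items():
        if ip in subnet:
            return label
    return None


for label, subnet in az_subnets.items():
    print(label, subnet, f"({subnet.num_addresses} addresses)")

print(zone_for("10.0.1.25"))    # inside the second subnet
print(zone_for("192.168.1.1"))  # outside the VPC range entirely
```

An address outside the ranges assigned to the VPC simply does not belong to any of its subnets, which is the sense in which the "network" metaphor applies.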
The combination of lower overhead database software combined with multitenancy and central administration takes RDS into an area where Oracle cannot compete. This is because Oracle is not offering what AWS is offering. Oracle has a cloud DaaS service, but it is nothing like AWS or Google Cloud’s offering.
If customers choose to perform a homogeneous migration from their Oracle on-premises database to the RDS (or bring up a new Oracle RDS), then they still get two of the three benefits listed above.
About RDS and Cloud SQL
The relational database service, or RDS, and the Cloud SQL offering are among AWS’s and Google Cloud’s most popular services. Through these managed services, AWS and Google can concentrate database resources to manage enormous volumes of customer data centrally. This is a new development in the history of databases. In either the on-premises model or in the hosted model (say with an IBM or CSC data center), each customer installs data into a database that they purchase and that only contains their data. Then multiple database administrators employed at the customer manage those single-customer database instances. The inefficiency of single-tenancy is a primary root of the inefficiency of the on-premises database modality. AWS RDS and Google Cloud SQL are not merely offering that same design hosted on their infrastructure. Instead, RDS and Cloud SQL are multitenant. This means that the customers have their data in one database. The data can be protected by a VPC, but it still (through some fantastic technological developments that were previously unthinkable) resides in one database.
Therefore, while it seems like the customer data is all together in the database, the VPC per customer means it is isolated. This is why Oracle and SAP’s pronouncements about “private cloud” are silly. Private cloud is Oracle and SAP’s way of cloud washing hosting, which is a method of moving a server from on-premises to the vendor’s premises. It’s not the cloud. One can obtain privacy from an efficient public cloud by using a VPC.
This design provides large economies of scale in database management. Also, it means that Oracle cannot compete with AWS or Google Cloud on both price and functionality. Oracle offers a sophisticated database, but its ability to manage its database in a cloud environment is vastly inferior to AWS or Google Cloud. This level of detail is rarely explored in public articles that cover this area. The distinctions in this area are generally lost on all but those that work in a hands-on capacity in the area.
Oracle prefers that the entire discussion around infrastructure be focused on the database rather than viewing the database as part of an overall infrastructure capability. Open-source databases, or databases that are closed source but accessible with little license overhead (like those offered by AWS and Google Cloud), are part of what allows for infrastructure flexibility. Increasingly in the AWS and Google Cloud world, Oracle licensing and auditing is a liability.
An Opening for Open Source Databases in the Cloud
This is explained in the following quotation from Enterprise CIO.
“Why is using an open-source database particularly good for new projects? First, relational databases are not as differentiated as database vendors want everyone to believe. There are many databases that are secure, scalable, and well-supported in the industry. Second, because open-source databases are by definition not hampered by arcane licensing rules, companies can put them to work with lower license and support fees — or retire them — with little financial consequence. This speeds decision making, promotes application development, and encourages a nimble environment.”
If a company does not have its data in an open-source database like PostgreSQL or MariaDB, or even Aurora (which is not open source but lacks Oracle’s auditing department), there is an issue. It is at a distinct disadvantage concerning what it can do with both its databases and its applications.
As bad as Oracle’s “advice” to companies has been, Oracle at least has respected, although highly self-centered, knowledge of databases. SAP’s advice to their customers has been far worse and far more self-centered, even branching into technically false assertions. For years SAP has been telling customers that they need to perform multiple database processing types from a single database. That this is incorrect has not stopped either SAP or their partner network from saying it is true. SAP consulting partners waste no time repeating anything SAP says, and on SAP topics, the Internet primarily serves as an echo chamber where SAP, SAP consulting firms, and media entities paid by SAP publish articles that say the same things, namely whatever SAP says. Overall the amount of independent information on SAP is vanishingly small.
We have covered in detail how SAP’s proposals about HANA have ended up being proven incorrect in articles ranging from What is HANA’s Actual Performance?, A Study into HANA’s TCO, to How Accurate Was Bloor Research on Oracle In-Memory?