Is Amazon’s Database Strategy A Glimpse Into The Future?

IBM, through the work of Edgar Codd, invented the ideas behind the relational database back in 1970. And even though IBM Research created the System R database in 1974 and had a few customers for this research effort, and even though its “Pacific” project integrated a relational database into an object-based operating system to create the System/38 back in 1978 – a very advanced machine for its day that still lives on in the Power Systems family running the IBM i operating system – it is Oracle that gets credit for commercializing the relational database.

Perhaps that is justified. IBM did not sell more than a few tens of thousands of System/38s in their decade of availability, and Big Blue did not get a relational database, called SQL/DS, out on mainframes until 1981 and did not get its flagship DB2 database running on mainframes until 1983. Oracle, founded in 1977 and called Software Development Labs at the time, built a database management system, code-named “Oracle,” for the US Central Intelligence Agency using relational concepts. Eventually the company and the product adopted this name, and that RDBMS has evolved over time, scaling up and out and in with all kinds of new technologies, such as in-memory processing and columnar data stores, to name two important recent ones that keep Oracle relevant.

The Oracle database is perhaps the stickiest technology in the datacenter, thanks to triggers and stored procedures that program parts of applications inside the database rather than external to it, and thanks also to the vast experience that database administrators have with it. And that includes the datacenters of the hyperscalers and cloud builders that operate on vast scales. While these companies create all kinds of interesting and scalable platforms, including databases, every single one of them has systems that run its business that are based on third party code, and many of them are using Oracle databases underneath those systems. Amazon and Apple, for instance, both run ERP software from SAP to collect the money and pay the bills, and we have seen Teradata data warehouse equipment with our own eyes at Apple; Google apparently uses Oracle’s E-Business Suite.

Amazon is a bit different from its hyperscale and cloud builder peers in that its vast retail and warehouse operations have long been underpinned by homegrown applications running atop the Oracle database.

In August last year, Amazon vowed to move all of its applications off internal Oracle databases and onto various database services running on the Amazon Web Services public cloud. Such projects are very difficult to do, particularly for companies that have applications and databases that have been in the field for one, two, or more decades. Not surprisingly, in an interview last year, Oracle co-founder and executive chairman Larry Ellison said that Amazon was spending $50 million a year on Oracle database licenses, and made fun of the company when the port of its warehouse operations crashed after those applications were shifted to AWS database services. But at that point, the jib was already up and Amazon was going to do whatever it took to rid its business operations of Oracle databases.

Much is being made this week about Amazon’s announcement that the applications in its consumer-facing businesses have all been ported to run on AWS database services – including relational, document, graph, key-value, in-memory, and data warehouse databases – and that the company has unplugged nearly 7,500 Oracle databases with an aggregate of 75 PB of information stored in them. That is an awful lot of business data, especially when you consider that the database – meaning literally the database tables with the information, not the database management system itself – spans maybe several hundred gigabytes to a few terabytes of data in a typical large enterprise.
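To put those two numbers side by side, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above (treating "nearly 7,500" as 7,500) and assumes decimal units, with 1 PB equal to 1,000 TB; the function name is ours, purely for illustration.

```python
# Figures quoted in the announcement; "nearly 7,500" is rounded to 7,500 here.
DATABASES_RETIRED = 7_500
TOTAL_PETABYTES = 75

# Assumption: decimal units, so 1 PB = 1,000 TB.
TB_PER_PB = 1_000


def average_tb_per_database(total_pb: float, databases: int) -> float:
    """Return the mean database size in terabytes."""
    return total_pb * TB_PER_PB / databases


avg_tb = average_tb_per_database(TOTAL_PETABYTES, DATABASES_RETIRED)
print(f"Average size: {avg_tb:.1f} TB per database")
```

The mean works out to about 10 TB per database, which sits above the "several hundred gigabytes to a few terabytes" range cited for a typical large enterprise database. It is only a mean, of course; a handful of very large data warehouses could account for much of that total.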

Incidentally, if you read the announcement of Oracle getting the boot from Amazon datacenters carefully, Amazon says that the porting of these databases to DynamoDB, Aurora, Relational Database Service, and Redshift “covered 100 percent” of Amazon’s proprietary systems, including those that manage purchasing, catalog management, order fulfillment, accounting, and video streaming workloads. But some third party applications – presumably the SAP suite but maybe including other programs – were “tightly bound to Oracle and were not migrated.”

So, technically, Amazon is still going to be paying for Oracle database licenses.

Enterprise IT shops will definitely pay a premium for a premium product, as long as they are not bumping up hard against performance ceilings and have room to deal with peak production workloads, and Oracle has done a pretty good job maintaining its lock on the enterprise relational database market. It is incredibly difficult to move from one database to another, and the very high prices and rich margins that Oracle gets for its eponymous database have made it a target for decades. Many have tried to topple Oracle, but few have made much of a dent.

But Amazon is an example of a company that did move a lot of its applications off Oracle and is claiming to save vast sums of money, and it was the company’s own turn to poke some fun at Oracle. First off, AWS said that customers who switch from Oracle databases (presumably underneath on-premises applications) to AWS database services for applications moved up to the public cloud see a 90 percent savings. As a very large user of database processing, Amazon was able to get a “heavily discounted rate” for database services from AWS – exactly what percent that discount was has not been divulged – and then it was able to save another 60 percent on top of that, for reasons that were not explained at all.

Significantly, Amazon says that the latency of its consumer-facing applications was cut by 40 percent and that administrative overhead for databases was cut by 70 percent because the database services on AWS are managed. At some level, even with automation, there is presumably still a lot of manual work being done by smart DBAs underneath those data services on AWS, so what is really happening is that Amazon is shifting that work out of its own operating expense budget and loading up the AWS people with some extra work, with automation taking some of the burden.

Over a hundred teams in the Amazon consumer business participated in the move away from Oracle. They include external services and products that we are familiar with, such as Alexa, Amazon Prime, Amazon Prime Video, Amazon Fresh, Kindle, Amazon Music, Audible, Shopbop, Twitch, and Zappos, as well as internal divisions we don’t see from the outside, such as AdTech, Amazon Fulfillment Technology, Consumer Payments, Customer Returns, Catalog Systems, Deliver Experience, Digital Devices, External Payments, Finance, InfoSec, Marketplace, Ordering, and Retail Systems. That’s an average of around 75 databases per division, which sounds about right. Databases proliferate like crazy and applications tend to be siloed, so these databases also tend to run on distinct servers, or on clusters of servers if they have very high transaction throughput requirements. It would be interesting to know how many physical machines were unplugged in the Amazon datacenters, how much compute and storage capacity they represented, and how much compute and storage capacity the same work needed when translated to the AWS database services. A breakdown of which types of applications, using which features of the Oracle database, moved to which AWS services would have been illustrative, too. There is not so much a single thing called the Oracle database as there are many personalities within or grafted onto the Oracle database.
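The per-division average is simple division, sketched below in Python. The 7,500 databases and the "over a hundred" teams come from the announcement; the 120-team figure is a made-up value, included only to show how the average drops if the true team count is higher than a hundred.

```python
# "Nearly 7,500" databases, rounded; "over a hundred" teams, taken as a floor.
DATABASES_RETIRED = 7_500
TEAMS_MINIMUM = 100

# Hypothetical team count, NOT from the announcement, for comparison only.
TEAMS_HYPOTHETICAL = 120


def databases_per_team(databases: int, teams: int) -> float:
    """Return the mean number of databases per team."""
    if teams <= 0:
        raise ValueError("teams must be positive")
    return databases / teams


upper_bound = databases_per_team(DATABASES_RETIRED, TEAMS_MINIMUM)
illustrative = databases_per_team(DATABASES_RETIRED, TEAMS_HYPOTHETICAL)
print(f"At {TEAMS_MINIMUM} teams: {upper_bound:.1f} databases per team")
print(f"At {TEAMS_HYPOTHETICAL} teams: {illustrative:.1f} databases per team")
```

Because the announcement says "over a hundred" teams, 75 is best read as a ceiling on the average rather than an exact figure.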

The interesting thing here for us is to consider how many companies want to follow Amazon’s lead. With Amazon, this is a personal thing: You can’t expect your customers to depend on the AWS database services if you don’t do it yourself. So Ellison poking at AWS for the past few years was not just good fun, it was a legitimate criticism. And now AWS has taken all of the air out of that argument, and as far as we know, there have not been any major outages during the shift in services – that initial physical warehouse example cited by Ellison in that interview from October 2018 excepted, of course.

The lesson learned here, perhaps, is that having a universal database that can do everything but perhaps operate at hyperscale is not as important as picking the right database for the job, each and every time a job comes up. The fact is, databases proliferate like crazy, both in number and in flavor. The operational and volume discount benefits of having one database, like the Oracle database, have to outweigh the lower cost, better scalability, and often higher performance of alternatives for specific kinds of work.

We think the math might be shifting away from the universal database to a collection of highly tuned ones. That is why, despite all of the efforts by Oracle, IBM, and Microsoft to make their databases all things to all people, new database upstarts keep popping up all of the time. And we suspect there will be more in the future, and a greater tendency among enterprises of all shapes and sizes to use them, as Oracle, IBM, and Microsoft continue to charge a high premium for their products.

But make no mistake about it. IBM’s mainframes have been in the field for more than five decades, even though countless alternatives have been introduced into the market that, by some measures at least but certainly not by all measures, are superior in important aspects such as cost, scalability, performance, and resilience. It will be a long time, perhaps a decade or two, before the last IBM mainframe is unplugged – and we are not even sure about that prediction. But what we are sure of is that Oracle databases will outlive the mainframe, whenever the last one of these venerable card-wallopers does go off into the sunset. And perhaps for a very, very long time after that. And the reason is that it is very easy to add databases, but it is very hard to subtract them.


3 Comments

  1. Great article, focused on database management operations, the proliferation of data.. the evolution of technology.. a database for every application.. sounds awesome.. trust that you follow a common vocabulary, if you want to minimise the cost to integrate and interoperate.

  2. It looks like we journey from hierarchical, network, RDBMS, ORDBMS, MPP, (KV{cache, stored, ordered, data structure, tuple, object}, columnar, document, graph{LPG,RDF}), multi-model native or non-native. Categorized as NoSQL, NewSQL, HTAP on a higher level. GPU databases of various types addressing RDBMS and RDF, like Brytlyt, BlazingDB, Blazegraph, kinetica, omniscience, sqream and a few others. There are quite a number of databases pressuring Oracle from all angles, like MariaDB, PostgreSQL variations like enterprisedb, MySQL with varied engines, and many others. So Oracle, though closest to Codd’s 13 rules, can’t compete. Amazon has plenty of databases covering many areas. The Snowflake solution is one example that eclipses the thought of stucky MPP by provisioning a shared-disk architecture but MPP compute.
