It seems like a question a child would ask: “Why are things the way they are?”
It is tempting to answer, “because that’s the way things have always been.” But that would be a mistake. Every tool, system, and practice we encounter was designed at some point in time. They were made in particular ways for particular reasons. And those designs often persist like relics long after the rationale behind them has disappeared. They live on – sometimes for better, sometimes for worse.
A famous example is the QWERTY keyboard, devised by inventor Christopher Latham Sholes in the 1870s. According to the common account, Latham’s intent with the QWERTY layout was not to make typists faster but to slow them down, as the levers in early typewriters were prone to jam. In a way it was an optimization. A slower typist who never jammed would produce more than a faster one who did.
New generations of typewriters soon eliminated the jamming that plagued earlier models. But the old QWERTY layout remained dominant over the years despite the efforts of countless would-be reformers.
It’s a classic example of a network effect at work. Once sufficient numbers of people adopted QWERTY, their habits reenforced themselves. Typists expected QWERTY, and manufacturers made more QWERTY keyboards to fulfill the demand. The more QWERTY keyboards manufacturers created, the more people learned to type on a QWERTY keyboard and the stronger the network effect became.
Psychology also played a role. We’re primed to like familiar things. Sayings like “better the devil you know” and “If it ain’t broke, don’t fix it,” reflect a principle called the Mere Exposure effect, which states that we tend to gravitate to things we’ve experienced before simply because we’ve experienced them. Researchers have found this principle extends to all aspects of life: the shapes we find attractive, the speech we find pleasant, the geography we find comfortable. The keyboard we like to type on.
To that list I would add the software designs we use to build applications. Software is flexible. It ought to evolve with the times. But it doesn’t always. We are still designing infrastructure for the hardware that existed decades ago, and in some places the strain is starting to show.
The Rise And Fall Of Hadoop
Hadoop offers a good example of how this process plays out. Hadoop, you may recall, is an open-source framework for distributed computing based on white papers published by Google in the early 2000s. At the time, RAM was relatively expensive, magnetic disks were the main storage medium, network bandwidth was limited, files and datasets were large and it was more efficient to bring compute to the data than the other way around. On top of that, Hadoop expected servers to live in a certain place – in a particular rack or data center.
A key innovation of Hadoop was the use of commodity hardware rather than specialized, enterprise-grade servers. That remains the rule today. But between the time Hadoop was designed and the time it was deployed in real-world applications, other ‘facts on the ground’ changed. Spinning disks gave way to SSD flash memory. The price of RAM decreased and RAM capacity increased exponentially. Dedicated servers were replaced with virtualized instances. Network throughput expanded. Software began moving to the cloud.
To give some idea of the pace of change, in 2003 a typical server would have boasted 2 GB of RAM and a 50 GB hard drive operating at 100 MB/sec, and the network connection could transfer 1Gb/sec. By 2013, when Hadoop came to market, the server would have 32 GB RAM, a 2 TB hard drive transferring data at 150 MB/sec, and a network that could move 10 Gb/sec.
Hadoop was built for a world that no longer existed, and its architecture was already deprecated by the time it came to market. Developers quickly left it behind and moved to Spark (2009), Impala (2013), Presto (2013) instead. In that short time, Hadoop spawned several public companies and received breathless press. It made a substantial –albeit brief – impact on the tech industry even though by the time it was most famous, it was already obsolete.
Hadoop was conceived, developed, and abandoned within a decade as hardware evolved out from under it. So it might seem incredible that software could last fifty years without significant change, and that a design conceived in the era of mainframes and green-screen monitors could still be with us today. Yet that’s exactly what we see with relational databases.
The Uncanny Persistence Of RDMBS
In particular, the persistence is with the Relational Database Management System, or RDBMS for short. By technological standards, RDBMS design is quite old, much older than Hadoop, originating in the 1970s and 1980s. The relational database predates the Internet. It comes from a time before widespread networking, before cheap storage, before the ability to spread workloads across multiple machines, before widespread use of virtual machines, and before the cloud.
To put the age of RDBMS in perspective, the popular open source Postgres is older than the CD-ROM, originally released in 1995. And Postgres is built on top of a project that started in 1986, roughly. So this design is really old. The ideas behind it made sense at the time, but many things have changed since then, including the hardware, the use cases and the very topology of the network,
Here again, the core design of RDBMS assumes that throughput is low, RAM is expensive, and large disks are cost-prohibitive and slow.
Given those factors, RDBMs designers came to certain conclusions. They decided storage and compute should be concentrated in one place with specialized hardware and a great deal of RAM. They also realized it would be more efficient for the client to communicate with a remote server than to store and process results locally.
RDBMS architectures today still embody these old assumptions about the underlying hardware. The trouble is those assumptions aren’t true anymore. RAM is cheaper than anyone in the 1960s could have imagined. Flash SSDs are inexpensive and incredibly responsive, with latency of around 50 microseconds, compared with roughly 10 milliseonds for the old spinning disks. Network latency hasn’t changed as much – still around 1 millisecond – but bandwidth is 100 times greater.
The result is that even now, in the age of containers, microservices, and the cloud, most RDBMS architectures treat the cloud as a virtual datacenter. And that’s not just a charming reminder of the past. It has serious implications for database cost and performance. Both are much worse than they need to be because they are subject to design decisions made 50 years ago in the mainframe era.
Obsolete Assumption: Databases Need Reliable Storage
One of the reasons relational databases are slower than their NoSQL counterparts is that they invest heavily in keeping data safe. For instance, they avoid caching on the disk layer and employ ACID semantics, writing to disk immediately and holding other requests until the current request has finished. The underlying assumption is that with these precautions in place, if problems crop up, the administrator can always take the disk to forensics and recover the missing data.
But there’s little need for that now – at least with databases operating in the cloud. Take Amazon Web Services as an example. Its standard Elastic Block Storage system makes backups automatically and replicates freely. Traditional RDBMS architectures assume they are running on a single server with a single point of storage failure, so they go to great lengths to ensure data is stored correctly. But when you’re running multiple servers in the cloud – as you do – if there’s a problem with one you just fail over to one of the healthy servers.
RDBMSs go to great lengths to support data durability. But with the modern preference for instant failover, all that effort is wasted. These days you’ll failover to a replicated server instead of waiting a day to bring the one that crashed back online. Yet RDBMS persists in putting redundancy on top of redundancy. Business and technical requirements often demand this capability even though it’s no longer needed – a good example of how practices and expectations can reinforce obsolete design patterns.
Obsolete Assumption: Your Storage Is Slower Than Your Network
The client/server model made a lot of sense in the pre-cloud era. If your network was relatively fast (which it was) and your disk was relatively slow (which it also was), it was better to run hot data on a tricked-out, specialized server that received queries from remote clients.
For that reason, relational databases originally assumed they had reliable physical disks attached. But once this equation changed, and local SSDs could find data faster than it could be moved over the network, it made more sense for applications to read data locally. But at the moment we can’t do this because it’s not how databases work.
This makes it very difficult to scale RDBMS, even with relatively small datasets, and makes performance with large data sets much worse than it would be with local drives. This in turn makes solutions more complex and expensive, for instance by requiring a caching layer to deliver the speed that could be obtained cheaper and easier with fast local storage.
Obsolete Assumption: RAM Is Scarce
RAM used to be very expensive. Only specialized servers had lots of it, so that is what databases ran on. Much of classic RDBMS design revolved around moving data between disk and RAM.
But here again, the cloud makes that a moot point. AWS gives you tremendous amounts of RAM for a pittance. But most people running traditional databases can’t actually use it. It’s not uncommon to see application servers with 8 GB of RAM, while the software running on them can only access 1 GB, which means roughly 90 percent of the capacity is wasted.
That matters because there’s a lot you can do with RAM. Databases don’t only store data. They also do processing jobs. If you have a lot of RAM on the client, you can use it for caching, or you can use it to hold replicas, which can do a lot of the processing normally done on the server side. But you don’t do any of that right now because it violates the design of RDBMS.
How (And Why) To Smash A Relic
Saving energy takes energy. But software developers often choose not to spend it. After all, as the inventor of Perl liked to say, laziness is one of the great programmer’s virtues. We’d rather build on top of existing knowledge than invent new systems from scratch.
But there is a cost to taking design principles for granted, even if it is not a technology as foundational as RDBMS. We like to think that technology always advances. RDBMS reminds us some patterns persist because of inertia. They become so familiar we don’t see them anymore. They are relics hiding in plain sight.
Once you do spot them, the question is what to do about them. Some things persist for a reason. Maturity does matter. You need to put on your accountant’s hat and do a hard-headed ROI analysis. If your design is based on outdated assumptions, is it holding you back? Is it costing you more money than it would take to modernize? Could you actually achieve a positive return?
It’s a real possibility. Amazon created a whole new product – the Aurora database – by rethinking the core assumptions behind RDBMS storage abstraction.
You might not go that far. But where there’s at least a prospect of positive ROI, it’s a good sign that change is strategic. And that’s your best sign that tearing down your own design is worth the cost of building something new in its place.
Avishai Ish-Shalom is developer advocate at ScyllaDB.