Why Google’s Spanner Database Won’t Do As Well As Its Clone
February 15, 2017 Timothy Prickett Morgan
Google has proven time and again it is on the extreme bleeding edge of invention when it comes to scale out architectures that make supercomputers look like toys. But what would the world look like if the search engine giant had started selling capacity on its vast infrastructure back in 2005, before Amazon Web Services launched, and then shortly thereafter started selling capacity on its high level platform services? And what if it had open sourced these technologies, as it has done with the Kubernetes container controller?
The world would be surely different, and the reason it is not is because there is a lesson to be learned, one that applies equally well to traditional HPC systems for simulation and modeling as well as the Web application, analytics, and transactional systems forged by hyperscalers and cloud builders. And the lesson is this: Making a tool or system that was created for a specific task more general purpose and enterprise-grade – meaning mere mortals, not just Site Reliability Engineers at hyperscalers can make it work and keep it working – is very, very hard. And just because something scales up and out does not mean that it scales down, as it needs to do be appropriate for enterprises.
That, in a nutshell, is why it has taken nearly ten years since Google first started development of Spanner and five years from when Google released its paper on this globe-spanning, distributed SQL database to when it is available as a service on Google’s Cloud Platform public cloud, aimed at more generic workloads than its own AdWords and Google Play.
If this was easy, Google would have long since done it or someone else cloning Google’s ideas would have, and thus relational databases that provide high availability, horizontal scalability, and transactional consistency on a vast scale would be normal. They are not, and that is why the availability of Spanner on Cloud Platform is a big deal.
It would have been bigger news if Google had open sourced Spanner or some tool derived from Spanner, much as it has done with the guts of its Borg cluster and container controller through the Kubernetes project, and that may yet happen as Cockroach Labs, the New York City startup that is cloning Spanner much as Yahoo did with Google’s MapReduce to create Hadoop or the HBase and Cassandra NoSQL databases that were derived from ideas in Google’s BigTable NoSQL database.
To put it bluntly, it would have been more interesting to see Google endorse CockroachDB and support it on Cloud Platform, creating an open source community as well as a cloud service for its Cloud Platform customers. But, as far as we know, it did not do that. (We will catch up with the Cockroach Labs folks, who all came from Google, to see what they think about all this.) And we think that the groundswell of support for Kubernetes, which Google open sourced and let go, is a great example of how to build a community with momentum very fast.
For all we know, Google will eventually embrace CockroachDB as a service on Cloud Platform not just for external customers but for “internal” Google workloads as well, much as is starting to happen with Kubernetes jobs running on Cloud Platform through the Container Engine service among Googlers.
Spanning The Globe
Back in 2007, Google was frustrated by the limitations of its Megastore NoSQL and BigTable NoSQL databases, which were fine in that they provided horizontal scalability and reasonably fast performance, but Google wanted to also have these data services be more like traditional relational databases and also have them be geographically distributed for high availability and for maximum throughput on a set of global applications that also ran geographically. And so it embarked on a means to take BigTable, which had been created back in 2004 to house data stored for Google’s eponymous search engines as well as Gmail and other servers, and allow it to span global distances and still be usable as a single database for Google’s developers, who could care less about how a database or datastore is architected and implemented so long as it gets closer and closer to the SQL-based relational database that is the foundation of enterprise computing.
And, by the way, a pairing of relational data models and database schemas with the SQL query language that was invented by IBM nearly forty years ago and cloned by Oracle, Sybase, Informix, and anyone else you can think of including Postgres and MySQL. Moreover, IBM has been running clustered databases on its mainframes for as long as we can remember – they are called Parallel Sysplexes – and they can be locally clustered as well as geographically distributed and run a cluster of DB2 database instances as if there were one giant, logical database. Just like Spanner. Google databases like Spanner may dwarf those that can be implemented on IBM mainframes, but Google was not the first company to come up with this stuff. Contrary to what Silicon Valley may believe.
With any relational database, the big problem when many users (be they people or applications) is deciding who has access to the data and who can change that data as they are sharing the database. There are very sophisticated timestamping and locking mechanisms for deciding who has the right to change data and what that data is – these are the so-called ACID properties of databases. Google luminary Eric Brewer, who is vice president of infrastructure at Google and who helped create many of the data services at the search engine giant, coined the CAP Theorem back in 1998 and the ideas were developed by the database community in the following years. The gist of CAP Theorem is that all distributed databases have to worry about three things – consistency, availability, and partition tolerance – and no matter what you do, you can only have no more than two of these properties being fully implemented at any time in the datastore or database. Brewer explained this theorem in some detail in a blog post related to the Cloud Spanner service Google has just launched, and also explained that the theory is about having 100 percent of two of these properties, and that in the real world, as with NoSQL and NewSQL databases, the real issue is how you can get close enough to 100 percent on all three to have a workable, usable database that is reliable enough to run enterprise applications.
With Spanner, after a decade of work, Google has been able to achieve this. (You can read all about Spanner in the paper that Google release back in October 2012.) Part of the reason why Google can do this is because it has developed a sophisticated timestamping scheme for the globally distributed parts of Spanner that creates a kind of universal and internally consistent time that is synchronized by Google’s own network and is not dependent on outside services like the Network Time Protocol (NTP) that is used by servers to keep relatively in synch. Google needed a finer-grained control of timestamping with Spanner, so it came up with a scheme based on atomic clocks and GPS receivers in its datacenters that could provide a kind of superclock that spanned all of its datacenters, ordering transactions across the distributed systems. This feature, called TrueTime by Google, is neat, but the real thing that makes Google’s Spanner work at the speed and scale that it does is the internal Google network that lashes those datacenters to the same heartbeat of time as it passes.
Brewer said as much in a white paper that was published about Spanner and TrueTime in conjunction with the Cloud Spanner service this week.
“Many assume that Spanner somehow gets around CAP via its use of TrueTime, which is a service that enables the use of globally synchronized clocks. Although remarkable, TrueTime does not significantly help achieve CA; its actual value is covered below. To the extent there is anything special, it is really Google’s wide-area network, plus many years of operational improvements, that greatly limit partitions in practice, and thus enable high availability.”
The CA here refers to Consistency and Availability, and these are possible because Google has a very high throughput, global fiber optic network linking its datacenters with at least three links between the datacenters and the network backbone, called B1. This means that Spanner partitions that are being written to and that are trying to replicate data to other Spanner partitions running around the Google facilities have many paths to reach each other and eventually get all of the data synchronized – eventually being a matter of tens of milliseconds, not tens of nanoseconds like a port to port hop on a very fast switch and not hundreds of milliseconds, which is the time it takes for a human being to see an application moving too slow.
The important thing about Spanner is that it is a database with SQL semantics that allows reads without any locking of the database and massive scalability on local Spanner slices to thousands (and we would guess tens of thousands) of server nodes, with very fast replication on a global scale to many difference Spanner slices. When we pressed Google about the local and global scalability limits on the Cloud Spanner service, a Google spokesperson said: “Technically speaking, there are no limits to Cloud Spanner’s scale.”
Ahem. If we had a dollar for every time someone told us that. . . . What we know from the original paper is that Spanner was designed to, in theory, scale across millions of machines across hundreds of datacenters and juggle trillions of rows of data in its database. What Google has done in practice, that is another thing.
We also asked how many geographically distributed copies of the database are provided through the Cloud Spanner service, and this was the reply: “Everything is handled automatically, but customers have full view into where their data is located via our UI/menu.”
We will seek to get better answers to these and other questions.
The other neat thing about the paper that Brewer released this week is that it provided some availability data for Spanner as it is running inside of Google, and this chart counts incidents – unexpected things that happened – rather than failures – times when Spanner was unavailable itself. Incidents can cause failures, but not always, and Google claims that Spanner is available more than 99.999 percent (so called 5 9s) of the time.
As you can see from the chart above, the most frequent cause of incidents relating to Spanner running internally were user errors, such as overloading the system or not configuring something correctly; in this case, only that user is affected and everyone else using Spanner is woefully unaware of the issue. (Not my circus, not my monkeys. . . .) The cluster incidents, which made up 12.1 percent of Spanner incidents, were when servers or datacenter power or other components crashed, and often a Site Reliability Engineer is needed to fix something here. The operator incidents are when SREs do something wrong, and yes, that happens. The bugs, which are true software errors, presumably in Spanner code as well as applications, and Brewer said that the two biggest outages (meaning the time and impact) were related to such software errors. Networking errors for Spanner are when the network goes kaplooey, and it usually caused datacenters or regions with Spanner nodes to be cut off from the rest of the Spanner cluster. To be a CA system in the CAP Theorem categorization, the A has to be pretty good and not caused by the network partitions being an issue.
With under 8 percent of Spanner failures being due to network and partition issues and with north of 5 9s availability, you can make a pretty good argument that Spanner – and therefore the Cloud Spanner service – being a pretty good fuzzy CAP database, not just hewing to the CP definition that both Spanner and CockroachDB technically fall under.
Spanning The Google Cloud
The inside Spanner at Google underlies hundreds of its applications and petabytes of capacity and churns through tens of millions of queries per second, and it is obviously battle tested enough for Google to trust other applications on it besides its own.
At the moment, Cloud Spanner is only available as a beta service to Cloud Platform customers, and Google is not talking about a timeline for when it will be generally available, but we expect a lot more detail at the Next ’17 conference in early March that Google is hosting. What we know for sure is that Google is aiming Cloud Spanner at customers who are, like it was a decade ago, frustrated by MySQL databases that are chopped up into shards, as Google was using at the time as its relational datastore, as well as those who have chosen the Postgres path once Oracle bought MySQL. The important thing is that Spanner and now Cloud Spanner support distributed transactions, schemas, and DDL statements as well as SQL queries and JDBC drivers that are commonly used in the enterprise to tickle databases. Cloud Spanner has libraries for the popular languages out there, including Java, Node.js, Go, and Python.
As is the case with any cloud data service out there, putting data in is free, but moving it around different regions is not and neither would be downloading it off Cloud Platform to another service or a private datacenter, should that be necessary.