Search engine and cloud computing juggernaut Google is hosting its Google Cloud Next ’21 conference this week, and one of the more interesting things that the company unveiled is several layers of software that make its Spanner globally distributed relational database look and feel like the popular open source PostgreSQL relational database.
Making the Cloud Spanner implementation of Spanner – meaning not the version that Google uses itself for its internal workloads like ad serving and data analytics, but the one that is exposed through the Google Cloud and sold as a service – look and feel like PostgreSQL immediately opens up Cloud Spanner to a much wider variety of customers, particularly in the enterprise. It is unclear if the internal Spanner version used by Google for its own work also supports PostgreSQL in the same fashion, but it would make sense, because PostgreSQL is fast becoming the interface of choice for developers who choose open source relational databases to sit behind the applications that they create.
The days when MySQL was the default choice – and the interface that a lot of early NoSQL and NewSQL databases coded to when they were created a decade or more ago – are waning, and that very much has to do with Oracle’s $7.4 billion acquisition of Sun Microsystems in early 2010. Two years prior to that, Sun paid $1 billion to buy MySQL, which was pretty much the leading open source relational database and something of a threat to all providers of relational databases, and the key vendors threatened were Oracle, Microsoft, and IBM. Once Oracle had control of MySQL and the base forked with MariaDB, attention turned to the PostgreSQL database as developers looked for a new Switzerland on which to create their applications. And thus the rise of PostgreSQL was pretty much guaranteed.
There is still, mind you, a lot of MySQL in the world, which is why Google launched its Cloud SQL implementation of MySQL on the Google Cloud back in 2011. Google has subsequently launched managed services implementations of PostgreSQL and Microsoft SQL Server under the Cloud SQL brand, and in the case of the PostgreSQL variant, it is real PostgreSQL under the hood, which means the service has the same scalability limits that PostgreSQL customers often wrestle with but it has the virtue of providing 100 percent compatibility with the open source variant of the PostgreSQL database.
This is not so with the PostgreSQL layer on Cloud Spanner, and is probably not true of the PostgreSQL layer that Google supports internally on the real Spanner database – if it does indeed support PostgreSQL at all.
In a blog post, Justin Makeig, product manager for Cloud Spanner, said that when it comes to the PostgreSQL layer on top of Cloud Spanner, don’t expect universal compatibility for PostgreSQL features.
“This preview release is the first in a much larger, long-term investment in making Spanner more open and accessible,” Makeig explained. “Initially, the Spanner PostgreSQL interface supports a core subset of the capabilities that PostgreSQL offers. By design, these align with the current features of Spanner that power a wide variety of mission-critical applications in production today. Queries and schemas that use the PostgreSQL interface will have the same semantics as other PostgreSQL environments. 100 percent PostgreSQL compatibility is not the goal. We’ve focused on familiarity and portability, providing easier access to Spanner’s consistency and availability at scale without reducing deployment flexibility.”
That is an important distinction, but to be fair, Google is doing more than adding support for the PostgreSQL wire protocol to its database, which several databases, including the Spanner-inspired CockroachDB, have supported from Day One.
The integration of PostgreSQL into Cloud Spanner is deep; it is not just some conversion overlay. At the database schema level, the PostgreSQL interface for Cloud Spanner supports native PostgreSQL data types and its data definition language (DDL), which is a syntax for creating users, tables, and indexes for databases. The upshot is that a schema written for the PostgreSQL interface for Cloud Spanner will port to and run on any real PostgreSQL database, which means customers are not trapped on the Google Cloud if they use this service in production and want to switch. But customers do have to be careful. Spanner functions, like table interleaving, have been added to the PostgreSQL layer because they are important features in Spanner. You can get stuck because of these. But Makeig says that Google has tried to minimize these exceptions so customers don’t feel locked in. (They will decide how locked in they feel themselves, and possibly after it is too late if they are not careful.)
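To make the portability caveat concrete, here is a minimal sketch of how a team might scan its DDL for Spanner-only extensions before assuming a schema will run on stock PostgreSQL. The table and column names are invented for illustration, the `non_portable` helper is hypothetical, and we are assuming that the interleaving clause is spelled `INTERLEAVE IN PARENT` and is the only extension worth flagging, which will not hold for every schema.

```python
import re

# Hypothetical schema for illustration; the table and column names are invented.
SCHEMA = """
CREATE TABLE singers (
    singer_id bigint PRIMARY KEY,
    name      varchar(100)
);

CREATE TABLE albums (
    singer_id bigint,
    album_id  bigint,
    title     varchar(200),
    PRIMARY KEY (singer_id, album_id)
) INTERLEAVE IN PARENT singers ON DELETE CASCADE;
"""

# Assumption: table interleaving is the only Spanner-specific clause we look for.
SPANNER_ONLY = re.compile(r"\bINTERLEAVE\s+IN\s+PARENT\b", re.IGNORECASE)


def split_statements(ddl):
    """Naively split DDL on semicolons (good enough for this sketch, not for general SQL)."""
    return [stmt.strip() for stmt in ddl.split(";") if stmt.strip()]


def non_portable(ddl):
    """Return the statements that contain a Spanner-only clause."""
    return [stmt for stmt in split_statements(ddl) if SPANNER_ONLY.search(stmt)]


statements = split_statements(SCHEMA)
flagged = non_portable(SCHEMA)
print(f"{len(flagged)} of {len(statements)} statements would need changes for stock PostgreSQL")
```

In this sketch only the `albums` table is flagged; the `singers` table uses nothing beyond ordinary PostgreSQL types and constraints.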
The PostgreSQL interface for Cloud Spanner compiles PostgreSQL queries down to Spanner’s native distributed query processing and storage primitives and does not just support the PostgreSQL wire protocol, which allows for clients and myriad third-party analytics tools to interact with the PostgreSQL database. That latter bit is the important thing for the enterprise customers that Google is trying to attract to its cloud. Companies have applications built on PostgreSQL itself or third-party data analytics and dashboard applications that rely on the PostgreSQL protocol, and by supporting this layer the PostgreSQL databases and their applications and adjunct programs can all be moved to the Google Cloud. Which is what this is mostly about.
That said, Google also wants to attract new developers who are coding new applications to its eponymous cloud and the cloudy variant of its Spanner database, and the best way to do that is to support the PostgreSQL interface. This way, these applications can start on the Cloud SQL for PostgreSQL service and, when they go beyond its fairly limited scale, move to Cloud Spanner. (The limit is not one Google is artificially imposing; it is one inherent to open source databases like MySQL and PostgreSQL.) For big database jobs, it is not a coincidence that companies stick with Oracle, DB2, or SQL Server and plunk them down, rather heavily, onto big fat NUMA servers. Horizontal scale for a true relational database that can do OLTP and OLAP at the same time is not trivial, which is why Google created Spanner to do a better job than its Megastore and Bigtable NoSQL databases in the first place.
Google is putting the PostgreSQL interface into preview now, and says that it adds very little overhead compared to the native Cloud Spanner interface and offers the same 99.999 percent availability guarantee as well as the same data consistency and security.
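For a sense of what a 99.999 percent guarantee allows, the arithmetic is simple enough to sketch. The function name is ours, and we are assuming a 365.25-day year measured as one window; actual service level agreements are typically measured over shorter periods and carry their own definitions of downtime.

```python
# Assumption: a 365.25-day year, treated as a single measurement window.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent):
    """Minutes of downtime per year permitted at a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


five_nines = downtime_minutes_per_year(99.999)
three_nines = downtime_minutes_per_year(99.9)
print(f"99.999 percent: {five_nines:.2f} minutes per year")
print(f"99.9 percent:   {three_nines:.2f} minutes per year")
```

Five nines works out to a little over five minutes of downtime per year, against nearly nine hours for three nines.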
We still think that Google should go all the way and open source Spanner, like it did Kubernetes, but it will be a very cold day before that happens. Spanner is the very data glue that holds Google together, and is much more important today than the MapReduce data analytics method and the Google File System underpinning it were when MapReduce was unveiled in 2004, inspiring Hadoop to come into being.