Sometimes the old ideas are the best ones, but they get ahead of their time. So it is with software containers, a kind of sandbox that runs on physical servers, more ephemeral than a full-on virtual machine and yet allowing a similar kind of flexibility when it comes to deploying and managing applications running on large-scale clusters.
Google has been working behind the scenes for the past decade developing software containers based on Linux cgroups and namespaces, features that it helped perfect and add to the Linux kernel. On top of them it created a cluster and container management system, called Borg, that could take these abstracted compute elements, package them together as necessary, and deploy them across the vast fleet of systems that Google uses to run its search engine, ad serving engines, and myriad other applications.
Rather than keep this technology to itself, or just write a few papers about containers and Borg that show what can be done and to what effect, Google last year first started talking about its use of containers, explaining that it was firing up 2 billion software containers a week on its infrastructure, and soon thereafter it announced that it was open sourcing a more generic version of the container scheduling portions of Borg called Kubernetes.
This week, Kubernetes 1.0, the first production-grade implementation of the container management tool that is largely used with Docker containers today, is available, and at the same time Google is handing over control of the Kubernetes project to a new organization called the Cloud Native Computing Foundation, which will be administered by the Linux Foundation and which will allow Kubernetes to be steered by a large number of IT vendors and corporate users outside of Google.
The formation of the CNCF by Google and other Kubernetes enthusiasts follows fast on the heels of the formation of the Open Container Project, which was created a little more than a month ago to keep the basic format of a software container from forking and thereby undoing two of the key benefits from containers – standardization and portability. CoreOS, which makes a Linux distribution that only exposes services in containers, was unhappy with the security model and complexity that Docker, the provider of the leading container format at this point, was adding to its eponymous container stack. And so in late 2014 it went off on its own and created what it viewed as a simpler and more secure container format called AppC and implemented it as a rkt container (pronounced “rocket”) in its CoreOS Linux and Tectonic container management stack. (Tectonic, launched earlier this year, mixes rkt containers with the open source Kubernetes container manager and scheduler created by Google and a number of other tools that CoreOS created to run containers at scale.)
With the container format issue working towards resolution – it will take months to reconcile the differences in the container formats and the management hooks for their runtimes – and both CoreOS and Docker standing behind a future unified software container standard, the timing for the launch of commercial-grade Kubernetes could not be more auspicious. The creation of the CNCF will make enterprises, cloud builders, and maybe even Google’s hyperscale rivals take a hard look at the evolving commercial container stack and throw their weight behind the open source tools rather than the proprietary ones already deployed by Amazon Web Services or being developed by Microsoft and VMware, both for their public clouds and for the private infrastructure run by their many hundreds of thousands of enterprise customers in their own datacenters.
Why Containers Will Transform Datacenters
If you want to see the future of data processing – the term information technology always seemed a lot less precise – it is helpful to look at what the hyperscalers like Google and the supercomputer centers of the world are doing. (This is one of the founding premises of The Next Platform, after all.) Google has been doing containers in production longer than anyone, but strictly speaking, such an idea has been around for a long time in the Unix world, and arguably subsystems in IBM’s mainframe and minicomputer platforms from the dawn of time had resource isolation and pegging that is similar, in concept, to a software container. It took Linux a while to get containers, and the magical thing that Google learned – and is now teaching others by opening up Kubernetes – is that the orchestration of containers matters as much as the containers themselves when it comes to squeezing more efficiencies out of infrastructure and gaining efficiencies in application development and deployment.
At the OSCON 2015 conference in Portland, Oregon today, some of the top brass at Google explained the benefits that the search engine giant derived from its move away from running applications on physical hardware and towards software containers based on cgroups, which have inspired the Docker and rkt containers to a large extent. The first one is scale.
“There are multiple dimensions to scalable,” Greg DeMichillie, director of product management for Google Cloud Platform, explained in his keynote. “When I say scalable, most of you probably think hardware scalability, which means that as I double the number of servers, the system just works. And that is absolutely true. One of the aspects of a production container system is scalability. But the other thing is that it allows us to scale our fleet much faster than we scale our team. At Google, we measure fleet sizes in terms of megawatts, and if you look at the period where we switched over to container technology, we were able to grow our fleet size roughly an order of magnitude faster than we had to grow the operational people that took care of the fleet. And the reason is that they were freed from much of the boring work of allocating resources to groups or jobs, or restarting jobs when hardware failed – that was all taken care of by our container management system.”
Such a container management system also, says DeMichillie, provides a standard foundation for checkpointing and monitoring. The other big benefit, he explained, is portability, which may seem like an odd thing to worry about at a company that has fairly homogeneous hardware and software infrastructure that it absolutely controls down to the screws and bits. But Google, DeMichillie said, had four different environments – development, test, staging, and production – and knowing that if something worked in one environment it would work in the others radically sped up the total application development and deployment cycle. But here is the kicker and the feedback loop part: Once Google standardized on containers, it no longer had to allocate specific clusters to the dev, test, staging, and production aspects of its software stack for each of the myriad applications that comprise Google. Once containers were implemented, all iron became essentially compatible and all workloads could be scaled up and down as needed.
“This allows us to run many more workloads on fewer machines, because we can take advantage of slack in the environment, and we don’t have to worry that this software was built for that generation of system,” DeMichillie said. “If your developers are spending time thinking about individual machines, you are operating at too low level of an abstraction. You want to operate at the level of applications and let the system take care of the scheduling of the applications. That means your developers can move faster, you can ship faster, you can iterate faster, your business grows faster – speed improves everything.”
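The idea of operating at the level of applications rather than machines can be sketched with a toy bin-packing scheduler. To be clear, this is an illustration of the concept only, not the actual placement algorithm used by Borg or Kubernetes, and the machine and container names are invented:

```python
# Toy illustration of application-level scheduling: the scheduler, not the
# developer, decides which machine runs each container. This is a greedy
# best-fit-decreasing sketch, NOT the real Borg or Kubernetes algorithm.

def schedule(containers, machines):
    """Assign each (name, cpu_demand) container to the machine with the
    least remaining CPU capacity that still fits it (tightest fit)."""
    free = dict(machines)  # machine name -> free CPU capacity
    placement = {}
    # Place the biggest containers first to reduce fragmentation.
    for name, cpu in sorted(containers, key=lambda c: -c[1]):
        candidates = [m for m, cap in free.items() if cap >= cpu]
        if not candidates:
            raise RuntimeError(f"no machine can fit {name}")
        target = min(candidates, key=lambda m: free[m])  # tightest fit
        placement[name] = target
        free[target] -= cpu
    return placement

machines = {"node-a": 4.0, "node-b": 2.0}
containers = [("web", 1.5), ("cache", 2.0), ("batch", 1.0)]
print(schedule(containers, machines))
# -> {'cache': 'node-b', 'web': 'node-a', 'batch': 'node-a'}
```

The point of the sketch is the division of labor: developers declare what each application needs, and the system chooses where it lands and re-places work when hardware fails.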
Interestingly, DeMichillie says that along the way towards developing an automated and virtualized stack for running its applications – a stack that now includes Compute Engine and App Engine, both of which are containerized, with Compute Engine running VMs on top of those containers – Google has made just about every mistake you could make. “We have stubbed our toes, and along the way, we have learned the architecture and the patterns that work. And that is what Kubernetes is all about.”
The initial Kubernetes that was open sourced last summer was built by the same Google engineers who created and maintain the internal Borg system, and in many respects, creating Borg was an easier task. Again, Google’s infrastructure is fairly homogeneous and its software stack is its own, right down to the Linux distribution. But in moving to the public cloud, the workloads running inside of those VMs (which in turn can be running Docker containers) are different and more diverse, and as Kubernetes is deployed on other public and private clouds with a plethora of different systems and running untold different applications, Google clearly needed some help.
That was why it open sourced Kubernetes and that is why it has worked with its peers – including AT&T, Box, Cisco Systems, Cloud Foundry, CoreOS, Cycle Computing, Docker, eBay, Goldman Sachs, Huawei Technology, IBM, Intel, Joyent, Kismatic, Mesosphere, Red Hat, Switch SuperNAP, Twitter, Univa, VMware, and Weaveworks – to create the CNCF that is now tucked up under the Linux Foundation. Google can’t scale Kubernetes as well as a community of enlightened self-interested parties can.
But what does Google get out of Kubernetes? The company has so rarely open sourced software technologies, and Linux containers and Kubernetes are the big exceptions in the enterprise. Why? For one thing, Google wants to look like the open alternative on the public cloud, and Microsoft and AWS may be supporting Docker containers (and in the future the open container format that is created), but their respective management tools will not be.
Another thing that Google gets is some leverage among bleeding edge Docker container adherents that want to run the same management tool on their internal private clouds as they use on their public cloud provider. Google’s own Container Engine (abbreviated GKE so as to not be confused with Google Compute Engine, commonly shortened to GCE) entered beta just last month and has a freebie version for managing containers on up to five virtual machines and a priced version that costs 15 cents per hour for up to 100 virtual machines. Presumably there will be a management console extension that allows companies to manage container images across Compute Engine and private clouds, much as the Eucalyptus cloud controller (now owned by Hewlett-Packard) provides API compatibility with the management console for the EC2 compute cloud at AWS.
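Taking the cited 15 cents per hour at face value, the priced GKE tier works out to roughly a hundred dollars a month. This is a back-of-the-envelope sketch assuming a 30-day month; actual Google Cloud billing terms may differ:

```python
# Back-of-the-envelope monthly cost of the priced Container Engine tier,
# using the flat $0.15/hour rate cited above (covers up to 100 VMs).
# Assumes a 30-day month; real billing terms may differ.
RATE_PER_HOUR = 0.15        # dollars per hour
HOURS_PER_MONTH = 24 * 30   # 720 hours in a 30-day month

monthly_cost = RATE_PER_HOUR * HOURS_PER_MONTH
print(f"${monthly_cost:.2f} per month")  # $108.00 per month
```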
By opening up Kubernetes and letting it go, Google also gets a bevy of system and container management experts outside of its own four walls who think and do container management like it does, and more importantly, an even broader base of application developers who understand the way that Kubernetes plus Docker or rkt containers automates the deployment of applications. Not only that, but a slew of potential competitors, such as CoreOS or Mesosphere, who are also trying to bring the Google Way of container management to their respective Linux and bare metal cluster management tools, are now allies instead of enemies.
It will be interesting to see what the forthcoming open container format will look like – we expect it to be fairly faithful to Docker, given its overwhelming popularity – and what happens with the Kubernetes community that is forming. The container management toolchain will require more than Kubernetes as delivered today, and Google knew that, and so does Red Hat along with CoreOS and Docker, who all have commercial-grade container management systems now in various stages of development.
The Tectonic tool from CoreOS is in preview starting today, with a cost of $1,500 per month per site for tech support during the preview. CoreOS tells The Next Platform that after the preview is over and Tectonic becomes generally available, CoreOS will shift to a pricing model that scales based on the amount of main memory supporting containerized applications. Red Hat’s Atomic Host, which integrates Docker containers and Kubernetes with Enterprise Linux 7, was launched back in March, and Atomic Enterprise Platform, the full stack that has a variant of the OpenShift platform cloud embedded, comes out later this year. Red Hat has not supplied pricing. Mesosphere has also integrated Kubernetes into its Data Center Operating System, which was inspired by Borg as well. A support contract for the Enterprise Edition of DCOS costs on the order of thousands of dollars per node per year.