A few weeks ago, I put together a list of the elements of what I consider a production software stack, or platform, for running modern applications. The list reflects how I learned to break down these problems during ten years at Google.
Much of the feedback I got was that the list is very opinionated, in that it bases everything on top of containers. It is definitely not a one-size-fits-all list, but it does point the way to the things that you need, in some shape or form, to build a scalable, modern production platform.
A modern platform is more than buying a VM, throwing Apache on there, and calling it a day. That works up to a certain level, but when you are talking about scalable, first order applications, you start to need a set of higher order tools.
I have a few key prerequisites for this production software stack, and I realize fully that there are many different ways to build such stacks. This is one of the benefits of the open source community and modular software design. This list is meant to illustrate the elements of the stack, and is not necessarily an endorsement of the software itself; it certainly does not mean that I have necessarily tested all of these pieces, or that they all work harmoniously together.
Generally speaking, the modern production stack needs to meet a few requirements. It has to be self healing and self managing, which means that, if a machine fails, operations staff do not have to think about it. The stack should also support microservices, which allows for the software engineering organization to scale because each piece can be supported by a “two-pizza” team. The stack needs to be efficient in terms of the amount of resources it needs – both human and computer – and it has to be debuggable, which means having application-specific monitoring, log collection, and log aggregation. Here is the list of the components, off the top of my head and certainly not meant to be exhaustive:
Production Host OS: CoreOS, Red Hat Project Atomic, Ubuntu Snappy, Rancher OS.
Bootstrapping System: Puppet, Chef, Ansible, and Salt can serve this role; Cloud Foundry BOSH was created to do this for Cloud Foundry, but is seeing new life as an independent product. CoreOS Fleet is a lightweight clustering system that can also be used to bootstrap more comprehensive solutions.
Container Engine: Docker Engine, CoreOS rkt, LXC, and systemd-nspawn. Some of these systems are more amenable to being directly controlled remotely than others. The Open Container Initiative is working to standardize the input into these systems.
Container Image Packaging & Distribution: Both Docker and CoreOS rkt solve this problem. It is built into the Docker Engine but is broken out for rkt as a separate tool set called acbuild. Inside of Google this was done slightly differently with a file package distribution system called MPM. Hopefully we can define a widely adopted spec for this, ideally as part of the OCI.
Container Image Registry/Repository: Hosted versions of this include the Docker Hub, Quay.io (owned by CoreOS), and Google Container Registry. Docker also has an open source registry called Docker Distribution. Hopefully, the state of the art will evolve past centralized solutions with specialized APIs to solutions that are simpler, working over regular HTTP, and more transport agnostic, so that protocols like BitTorrent can be used to distribute images.
Container Distribution: Many people don’t talk about this as a separate thing but instead reuse OS distributions such as Ubuntu, Debian, or CentOS. Many folks are working to build minimal container distributions, either by using distributions from the embedded world (BusyBox or Alpine) or by building static binaries that need nothing else.
Container Orchestration: Open source deployable examples include Kubernetes, Docker Swarm, and Apache Mesos. Hosted systems include Google Container Engine (based on Kubernetes), Mesosphere DCOS, Microsoft Azure Container Service, and Amazon EC2 Container Service (ECS).
Orchestration Config: AWS CloudFormation and Google Cloud Deployment Manager play this role for their respective cloud ecosystems (only). Hashicorp Terraform and Flabbergast look like they could be applied to container orchestration systems but haven’t yet. Docker Compose is a start to a more comprehensive config system. The Kubernetes team has lots of ideas and plans for this area.
Network Virtualization: CoreOS Flannel, Weave, Project Calico, Docker libnetwork (not ready for production yet), and OpenContrail.
Container Storage: ClusterHQ Flocker, Blockbridge
Discovery Service: DNS is often used, and many systems build on top of highly consistent stores (lock servers). Examples include Apache Zookeeper, CoreOS etcd, and Hashicorp Consul. Kubernetes supports service definition and discovery (with a stable virtual IP and a load balanced proxy). Weave has a built in DNS server.
Production Identity & Authentication: Microservice authentication, or “authentity.” This is not a well understood component of the stack; conjur.net is a commercial offering.
Monitoring: Prometheus looks very interesting. Grafana frontend plus InfluxDB or OpenTSDB. Heapster is a container specific monitoring system that surfaces data collected by cAdvisor. Hosted systems such as Google Cloud Monitoring and SignalFx.
Logging: Systems like fluentd and logstash are agents that collect and upload logs; elasticsearch or databases (MySQL, Mongo, etc.) store them. Hosted systems include Google Cloud Logging. Apache Flume can collect logs for processing in Hadoop. Google BigQuery and Google Cloud Dataflow, too.
Deep Inspection & Tracing: Dapper is used inside Google. Appdash and Zipkin are open source and inspired by Dapper. Sysdig is another tool.
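To make the self healing requirement stated before the list a bit more concrete, here is a minimal sketch of a reconciliation loop: the operator declares a desired replica count, and the system repairs the difference after a machine fails. Everything here (`Cluster`, `fail`, `reconcile`, the machine names) is hypothetical and invented for illustration; it is not how Kubernetes, Borg, or any tool in the list is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory cluster model: healthy machine name -> replica count."""
    machines: dict[str, int] = field(default_factory=dict)

    def running(self) -> int:
        # Total replicas currently running on healthy machines.
        return sum(self.machines.values())

    def fail(self, machine: str) -> None:
        # Simulate a machine failure: the machine and its replicas disappear.
        self.machines.pop(machine, None)


def reconcile(cluster: Cluster, desired: int) -> list[str]:
    """Drive the running replica count toward `desired`; return the actions taken."""
    actions: list[str] = []
    while cluster.running() < desired:
        if not cluster.machines:
            raise RuntimeError("no healthy machines left to schedule onto")
        # Place the new replica on the least-loaded machine (ties broken by name).
        target = min(cluster.machines, key=lambda m: (cluster.machines[m], m))
        cluster.machines[target] += 1
        actions.append(f"start replica on {target}")
    while cluster.running() > desired:
        # Remove a replica from the most-loaded machine.
        target = max(cluster.machines, key=lambda m: (cluster.machines[m], m))
        cluster.machines[target] -= 1
        actions.append(f"stop replica on {target}")
    return actions
```

In a real system a function like `reconcile` would run continuously against observed state, so a failed machine is handled without anyone on the operations staff having to notice it.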
As for where the modern stack is going, I think what folks have long considered platform as a service (PaaS) is being reinvented as a set of composable systems and services that combines ease of use with depth to enable new architectures. Pretty much all of the pieces listed in that post are included as part of a modern PaaS, whether it be App Engine or Heroku or Cloud Foundry. Moving forward, more PaaS offerings will leverage this toolset to provide a large range of levels that developers can program to. The end result insulates people from the infrastructure and reduces the operational work, that is, the amount of time that people need to devote to the care and feeding of the applications. You achieve a reduced operational burden by automating everything and providing insight and debuggability such that you are no longer dealing with machines but instead with higher level logical constructs.
So the story up until now is that you either build at the infrastructure level, which gives you ultimate flexibility but also means you have a gun pointed at your foot, or you do it at the platform level with a relatively fixed architecture and an opinionated programming model. You trade in some of that flexibility at the infrastructure level in order to have a smoother developer experience and a reduced operational burden. IaaS versus PaaS has been seen by many as a binary choice. However, my experience at Google has been that you can build higher-order platforms that are just as flexible – if not more flexible – than the underlying infrastructure layers yet still have the ease of use and operations users expect today with a modern PaaS.
Over time we will probably see companies that will take all of these pieces – and I think OpenShift is a good example of this – and package them up for you, but with the understanding that if one of the pieces of that stack is not really working well for you or somebody innovates and does something interesting in one of those layers of the stack, you can adapt and, say, swap in a different solution for monitoring. What that means is that the bar in terms of the amount of code you need to write to introduce a new PaaS platform is going to be reduced dramatically. While constructing, proving, and supporting an integrated set of components isn’t for everyone, there are opportunities for new providers to create specialized PaaS-like systems for specific vertical markets. Eventually, companies can share the base of their stack across multiple projects and applications with different frameworks and higher level tools used as appropriate.
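The idea of swapping out one layer of the stack without disturbing the rest can be sketched as application code written against a small interface instead of a specific product. The names below (`MetricsSink`, `InMemorySink`, `TextLineSink`, `handle_request`) are hypothetical and made up for this example; they do not correspond to the API of any monitoring system mentioned in this article.

```python
from typing import Protocol


class MetricsSink(Protocol):
    """Hypothetical interface that the rest of the platform programs against."""

    def record(self, name: str, value: float) -> None: ...


class InMemorySink:
    """One interchangeable backend: keeps samples as (name, value) tuples."""

    def __init__(self) -> None:
        self.samples: list[tuple[str, float]] = []

    def record(self, name: str, value: float) -> None:
        self.samples.append((name, value))


class TextLineSink:
    """Another backend: renders each sample as a 'name value' text line."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def record(self, name: str, value: float) -> None:
        self.lines.append(f"{name} {value}")


def handle_request(sink: MetricsSink, latency_ms: float) -> None:
    # Application code depends only on the interface, so the monitoring
    # layer can be replaced without touching this function.
    sink.record("request_latency_ms", latency_ms)
```

Passing either sink to `handle_request` works without changing the function, which is the property that lets a packaged stack adapt when somebody does something interesting in one layer.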
When we were building Google Compute Engine, I viewed virtual machines on GCE as a transitional technology. VMs will always be around, but there is a better world out there if we get past the local maximum that we are at with the current infrastructure offerings. Google has found one way to move past VMs with systems such as Borg and related technologies such as Chubby, GFS/Colossus, and Stubby/gRPC. When developing GCE, one of my bosses brought up a VM for the first time and was presented with a prompt. His reaction was one of “Now what?” He, and many others at Google, are so used to using higher level deployment systems that a prompt like this in a production environment felt very, very primitive.
Kubernetes was released as a way to get these ideas out of Google in a more practical way than an academic paper. But Kubernetes is just the start – it is a piece of a larger platform. We really struggled with finding the right piece to carve off and release without shipping the entire platform. So we started with a Borg equivalent. But once you have Borg, you need an identity system so you can identify different parts of it, then you need monitoring for the stuff you run on top of it, and then you need a config language so you can talk about more complex configurations. We are just at the beginning of pulling together all of these pieces in the open. Kubernetes is a key part of this platform, but it is definitely not the end of the story.
Lots of folks from inside of Google and others on the outside are working hard on Kubernetes. While many of the initial concepts were based on Google’s internal experience, there is a realization that Google does not have all of the answers here. Things that apply inside of Google do not always apply outside of Google. Building the Kubernetes community is a really big part of what we are trying to do.
While Kubernetes is still a relatively young project, it has reached v1 and is usable in its current form. Our initial thrust for Kubernetes is to make sure that we are capturing workloads using the right vocabulary and concepts. Once we have the right interfaces in place, then the rest of it is just writing code.
Kubernetes was recently donated to the Cloud Native Computing Foundation. As such, it is literally owned by the community. I’m hoping over time this community (and perhaps the CNCF) will expand to help define the rest of this platform. If you would like to get involved, please start by running Kubernetes or attending the upcoming community-led KubeCon in San Francisco from November 9 to 11.
Joe Beda is an independent software engineer in Seattle. Over his career at Google and Microsoft, he has built browsers, designed graphics APIs, connected to the telephone system, and optimized ads. Over the past five years, he started and launched Google Compute Engine and Kubernetes. Beda holds a B.S. from Harvey Mudd College in Claremont, California. He is @jbeda on Twitter.