Inside eBay’s Shift To Kubernetes And Containers Atop OpenStack
November 12, 2015 Timothy Prickett Morgan
Online auctioneer eBay was founded a year after Amazon and is older than Google, Facebook, and a number of the other hyperscalers. As with its peers, the large number of users on its service (it has 159 million active buyers in 190 countries and over 800 million product listings) and the infrastructure scale required to support them force eBay to embrace new technologies ahead of smaller and more conservative enterprises.
Three years ago, eBay was an early and enthusiastic user of OpenStack for managing its cloudy infrastructure, after having shifted its fleet of virtualized machines from VMware’s ESXi hypervisor to KVM running on OpenStack. (eBay uses its own implementation of OpenStack, with lots of customization.) The company started out with around 300 servers running the “Essex” release of OpenStack back then. Fast forward three years, and eBay and PayPal combined were running on the “Havana” release with over 300,000 cores supporting more than 12,000 KVM hypervisors with Open vSwitch virtual switches, spread across more than ten availability zones and configured as more than 15 virtual private clouds for the company’s various business units.
Now, eBay is getting out on the front end of the Kubernetes container scheduling system that has been open sourced by Google and that eBay plans to interleave with its OpenStack clouds to manage containerized applications.
While eBay is one of the top retailer brands in the world, its scale does not quite compare to that of the truly humungous hyperscalers like Google, Amazon, Microsoft, and Facebook. But that said, its infrastructure is quite large, and the company takes infrastructure very seriously, including the use of containerized datacenters, powered by fuel cells, that employ hyperscale-class machines from Hewlett-Packard and Dell that often have customized Intel Xeon processors. (eBay likes to run its processors a little hot to get more performance per system, we have heard, but the company is a bit secretive about precisely what it does.)
Ashwin Raveendran, senior member of the cloud technical services team at eBay, spoke at the KubeCon 2015 conference this week about how the company was looking to augment its OpenStack cloud with the Kubernetes container scheduler, one of the first public examples of a hyperscaler committing to the mix of OpenStack and Kubernetes. Raveendran said that eBay, which split apart from the PayPal payment service business in July, still has grown its infrastructure and now has more than 500,000 cores spread across more than 150,000 servers. That gives eBay a server fleet that is about the same size as those of the Rackspace Hosting and SoftLayer public clouds, just to give you a sense of perspective.
A typical availability zone at eBay has between 5,000 and 20,000 servers, and servers tend to be podded up in 500-node chunks. Interestingly, and according to some data that eBay has shown in the past, its core count has been growing linearly but the number of VMs and projects deployed on its infrastructure (before it spun off PayPal) went exponential in 2013. As for storage, eBay has over 200 PB of capacity, which runs on a sizeable chunk of that server farm, we presume. As of the spring, about 1.6 PB of that was Cinder block storage. About 120 PB of that storage is used to support Hadoop, making it one of the largest analytics setups in the world.
The data and infrastructure services unit that Raveendran works for at eBay pulls 2 million metrics per second out of its application and infrastructure monitoring system and generates more than 300 TB of log files per day as it processes billions of queries and serves out more than 20 billion images per day for the auctioning and retail applications that essentially are eBay. It is clearly a big and complex operation, and we are oversimplifying a bit.
The platform stack at eBay will look familiar, conceptually, to that of many modern corporations:
OpenStack sits between the servers and storage and abstracts the compute, storage, and networking services on which the eBay platform services all depend. At the moment, eBay’s system mandates that every application instance run in its own dedicated virtual machine. The adoption of Kubernetes at eBay is not just about moving to containers to deploy applications, but about changing the application lifecycle at the company, which is centered around the infrastructure cloud layer (with provisioning, deploying, monitoring, and remediating issues being the key functions for developers and system administrators to perform). eBay plans to go to a more flexible deployment model using containers as its runtime and Kubernetes on top of OpenStack to manage those containers. The way eBay uses Kubernetes will be inspired by its current homegrown techniques for deploying applications inside of virtual machines.
Here is what the current setup looks like:
At the moment, most of eBay’s applications are coded in Java. And like many commercial clusters that are based on virtual machines, the ones at eBay are scheduled in a static fashion. What this means, practically speaking, is that server nodes in the infrastructure cloud are allocated for each kind of workload. Like this:
Static scheduling of nodes on the clusters is obviously not an ideal situation, but lots of companies that have big spikes in usage for portions of their applications statically allocate capacity on their private clouds just the same. “This is a huge problem in a private cloud where the services are pointing, and we cannot burst out to a public cloud when we need capacity,” explains Raveendran. “We have to maximize the resources that we have.”
(Unless you create your own public cloud like Amazon has, of course, but we digress.)
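To make the cost of static allocation concrete, here is a minimal sketch in Python that compares statically partitioned nodes with a single shared pool when one workload spikes. The workload names, node counts, and demand figures are invented for illustration and are not eBay data, and the pooled allocator is a deliberately naive first-come, first-served loop rather than anything resembling the Kubernetes scheduler.

```python
# Hypothetical illustration: workload names, node counts, and demand
# figures are invented for this example and are not eBay data.

def unmet_static(partitions, demand):
    """Each workload may only use the nodes statically reserved for it."""
    return {w: max(0, demand[w] - partitions[w]) for w in demand}

def unmet_pooled(total_nodes, demand):
    """All workloads draw from one shared pool, served in dict order."""
    free = total_nodes
    unmet = {}
    for workload, need in demand.items():
        granted = min(need, free)
        free -= granted
        unmet[workload] = need - granted
    return unmet

# 100 nodes in total, carved into three fixed partitions.
partitions = {"search": 40, "checkout": 30, "images": 30}
# "search" is spiking past its reservation while the others sit partly idle.
demand = {"search": 55, "checkout": 20, "images": 20}

static_result = unmet_static(partitions, demand)
pooled_result = unmet_pooled(sum(partitions.values()), demand)

print("static:", static_result)
print("pooled:", pooled_result)
```

In this toy case the static layout leaves the spiking workload 15 nodes short even though 20 nodes elsewhere are idle, while the shared pool covers all of the demand with the same 100 machines.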
But one of the reasons why technologies such as Kubernetes and Mesosphere are getting a lot of interest is that they impose a lot less overhead on servers when multiple workloads are shared on a single machine and, perhaps more importantly, containers with scale out workloads can be fired up and retired at a much faster pace than traditional server virtualization. What is old – software containers – is new again, and a lot more palatable now that enterprises have spent a decade virtualizing their servers with fairly heavy technologies. They are now learning when the much more rugged isolation of virtual machines is not necessary and when containers will suffice.
Given its embracing of containers – and its need to also continue to support virtual machines in its infrastructure clouds – eBay plans to make use of the Magnum plug-in for OpenStack, which The Next Platform detailed back in May and which is seeing a bit of a resurgence as it is reconfigured to hook Docker Swarm and Kubernetes container schedulers into OpenStack.
As you can see, eBay will be replacing its own homegrown host agent with a kubelet host agent from Kubernetes woven into a Docker container runtime. Applications will be downloaded from a container repository and deployed onto servers in a Kubernetes pod where they can be managed collectively if necessary. The setup will also get rid of the statically configured load balancers and firewalls in the current eBay infrastructure, which Raveendran says was a big bottleneck in the infrastructure operations, too. When it is all done, this is what the dynamic scheduling will look like:
The thing that makes Kubernetes appealing to eBay, explains Raveendran, is that it is open source software that many people are collaborating on and that eBay can contribute to and benefit from. Perhaps more importantly, Kubernetes is available as a controller layer on several public clouds running Docker containers, which would give eBay the capability to burst out to a public cloud should the need arise and the workload be shiftable from eBay’s distributed private cloud to a public cloud.
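For readers who have not seen one, the unit of deployment described above, a container image pulled from a repository and run inside a pod, is declared in a small manifest. The Python sketch below builds a minimal pod manifest as a dictionary and prints it as JSON, a format the Kubernetes API accepts. The service name, registry URL, and port are hypothetical placeholders, not anything eBay has disclosed; only the field names (`apiVersion`, `kind`, `metadata`, `spec.containers`) come from the Kubernetes v1 API.

```python
import json

def pod_manifest(name, image, port):
    """Build a minimal Kubernetes v1 Pod manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,  # pulled from a container repository
                    "ports": [{"containerPort": port}],
                }
            ]
        },
    }

# Hypothetical values: the service name, registry host, and port are
# placeholders chosen for the example.
manifest = pod_manifest(
    "listing-service",
    "registry.example.com/listing-service:1.0",
    8080,
)

print(json.dumps(manifest, indent=2))
```

Because the manifest describes the application rather than the machine it lands on, the same declaration could in principle be submitted to a Kubernetes cluster in a private datacenter or on a public cloud, which is the portability argument Raveendran is making.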
The move to Docker containers and Kubernetes to control them is not without its challenges, says Raveendran. As many people have been grousing for years, networking is a challenge for OpenStack, and in particular the fact that OpenStack does not have a virtual router like Amazon Web Services and Google Compute Engine is a problem. But here is how eBay is thinking about solving it:
The idea is to use the Neutron networking plug-in for OpenStack to create a Kubernetes router. Over the long haul, Raveendran says that eBay may just take the Border Gateway Protocol (BGP) that is used for Layer 3 networking all the way down to the server host and not just stop it at the top of rack (TOR) switch in the infrastructure. This is something that eBay’s hyperscale peers already do to simplify their own networks, which have to span as much as 100,000 machines (sometimes more) in a single network.
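Raveendran did not spell out how such a router would be built, so what follows is only a sketch of the underlying routing problem, not eBay's design. Kubernetes gives every pod its own IP address, and one common way to make those addresses reachable is to hand each host a slice of a cluster-wide address range and install a route that sends traffic for that slice to the host. The address ranges, host IPs, and the `destination`/`nexthop` field names below are assumptions chosen for illustration.

```python
import ipaddress

def pod_routes(cluster_cidr, node_ips, prefix_len=24):
    """Give each node one pod subnet and emit a route pointing at that node."""
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=prefix_len)
    routes = []
    # zip stops at the shorter input, so unused subnets stay unallocated.
    for node_ip, subnet in zip(node_ips, subnets):
        routes.append({"destination": str(subnet), "nexthop": node_ip})
    return routes

# Hypothetical addressing: a /16 for all pods, split into one /24 per host.
routes = pod_routes(
    "10.244.0.0/16",
    ["192.168.1.10", "192.168.1.11", "192.168.1.12"],
)

for route in routes:
    print(route["destination"], "via", route["nexthop"])
```

A table like this has to be kept up to date as hosts come and go. Running BGP all the way down to the host, as Raveendran suggests eBay may do over the long haul, would let each server announce its own pod subnet instead of relying on a central component to program the routes.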
The other issue that eBay is worried about is scale, of course, and it is working with the Kubernetes community to scale out the performance of Kubernetes, which had some self-imposed limitations in its early release as Google and its community partners sought to get the semantics and the foundation right for mixed workload and hardware environments that are not the norm inside of Google. eBay is also participating in the Ubernetes effort to federate Kubernetes clusters and scale them across datacenter regions or across a mix of public clouds and private clusters that use Kubernetes to abstract the container level. eBay is also looking at ways of integrating its Cinder block storage with its Kubernetes container pods.