Bringing Hyperscale SDN Lessons Down To Enterprises
June 23, 2015 Timothy Prickett Morgan
Everything seems to be easier for the hyperscalers than it is for large enterprises, but that is just an illusion created by the confluence of some of the best minds in IT with a whole lot of money. With those two as fuel, the wonder is not that hyperscalers are so advanced, but rather that they are not even further ahead of enterprises.
Arista Networks is the upstart switch vendor that has serial entrepreneur Andy Bechtolsheim as its CTO and that six years ago started pushing the envelope on merchant silicon to commercialize 10 Gb/sec Ethernet in the datacenter and to bring network function virtualization to its Linux-based EOS network operating system. It has sold plenty of switches into hyperscalers and service providers as well as a bunch of enterprises and more than a few HPC sites. The company is familiar with the homemade stacks of software that many of them have cobbled together to create what has become known as software-defined networking – we just got a peek at the insides of Google’s homemade network and its software last week. And Arista knows that most of its customers will never have the engineering talent, time, or money to create such SDN stuff on their own.
“There are many companies that have Google envy,” Jeff Raymond, vice president of EOS product management and services at Arista, tells The Next Platform. The reason is obvious: it takes far fewer people to run the server, storage, and switching infrastructure at Google, as gauged on a per-device basis, than it does at your average large enterprise, government agency, public institution, or supercomputing center.
The typical hyperscale company – what Arista has started calling a “cloud titan” – has a do-it-yourself approach to hardware and software design and saves money by producing equipment that precisely fits its workload and by embedding as much automation and predictive analytics into its device control systems as possible. The net result for networking is that a single engineer at one of these hyperscalers can manage around 10,000 network elements. (That might be an entire datacenter, just so you get the sense of it.) By contrast, high-tech companies, service providers, and enterprises that operate at large scale – but well below hyperscale – have started to move to DevOps practices, automating configuration and management with tools such as Chef and Puppet and doing a fair amount of integration and customization by writing their own scripts. That allows their engineers to support maybe somewhere around 1,000 devices each, says Raymond. What that means is that such a company might have an order of magnitude fewer devices than a hyperscaler, but it needs the same level of human support for the network. If you move down into traditional enterprises and service providers, a lot of the network management function is done by hand with lots of scripting, and an engineer may be handling only 100 devices, which means the ratio of devices per engineer is 100X lower than at the hyperscalers.
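To make the staffing math concrete, here is a minimal sketch in Python that turns the three devices-per-engineer figures Raymond cites into headcounts. The tier names and function names are our own labels for illustration, not Arista terminology.

```python
import math

# Rough devices-per-engineer figures quoted in the article.
# The tier names are illustrative labels, not Arista's terms.
DEVICES_PER_ENGINEER = {
    "hyperscale": 10_000,
    "devops_enterprise": 1_000,
    "traditional_enterprise": 100,
}


def engineers_needed(device_count: int, tier: str) -> int:
    """Engineers required to cover device_count devices at a tier's ratio."""
    return math.ceil(device_count / DEVICES_PER_ENGINEER[tier])


def ratio_gap(tier: str, baseline: str = "hyperscale") -> float:
    """How many times fewer devices each engineer handles versus the baseline."""
    return DEVICES_PER_ENGINEER[baseline] / DEVICES_PER_ENGINEER[tier]
```

By these figures, a 10,000-device network that one hyperscale engineer could run would need 10 engineers at the DevOps tier and 100 at a traditional shop.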
With the CloudVision tool that Arista has put together, the company has taken some of the lessons it has learned from its hyperscale customers and created its own tool to span all generations of Arista switches and provide a single point of integration between SDN controllers and other orchestration services that need to interface with those switches. “We run in those hyperscale environments, and one of the reasons why customers up there choose us is because EOS on our switches is so programmable,” says Raymond.
Here’s how CloudVision works. Inside every Arista switch is a copy of the EOS operating system and a database, called SysDB, that knows everything about the switch – its state, in the parlance of system engineering – including the forwarding tables, the temperature sensor readings, and the drivers that link services to the internal ASICs in the switch. Normally, to get such information out of a switch, you have to do SNMP polling of each device, and you could have an SDN or overlay controller do that. This gets cumbersome very fast.
So Arista has essentially parallelized the SysDB database and abstracted it up so that it holds information about an entire network of switches, all at the same time. To do this, Arista has put SysDB in a virtual machine running EOS and provided it with a Hadoop backend that maintains all information on the current state of all Arista switches in the network. By having all of this information about the network in one place, says Raymond, companies can do a better job of automating the provisioning of their networks and balancing the workload across those networks.
Importantly, those SDN controllers and infrastructure orchestration tools talk down to the network through a single line of communication, through CloudVision, instead of having to talk to hundreds or thousands of devices at the same time, which ends up making the network run more smoothly and putting less stress on those orchestration and controller tools. Rather than polling using SNMP for each device, the parallel SysDB tool at the heart of CloudVision can stream data to cloud controllers like OpenStack or network overlays like those available from VMware (NSX), Nuage Networks (Virtualized Services Platform) and soon Microsoft with the embedded network controller that is coming with Windows Server 2016.
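The difference between per-device polling and a single aggregation point that streams changes can be sketched in a few lines of Python. This is a toy model under our own assumptions, not Arista's implementation: the class name, method names, and state keys are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

# (switch name, state key, new value); all names here are hypothetical.
Update = Tuple[str, str, Any]

_MISSING = object()  # sentinel so a stored None still counts as a value


class NetworkStateStore:
    """Toy network-wide state store: switches push, controllers subscribe."""

    def __init__(self) -> None:
        self._state: Dict[str, Dict[str, Any]] = defaultdict(dict)
        self._subscribers: List[Callable[[Update], None]] = []

    def subscribe(self, callback: Callable[[Update], None]) -> None:
        """Register a controller callback that receives every state change."""
        self._subscribers.append(callback)

    def push(self, switch: str, key: str, value: Any) -> None:
        """Called on behalf of a switch; only real changes are streamed out."""
        if self._state[switch].get(key, _MISSING) == value:
            return
        self._state[switch][key] = value
        for callback in self._subscribers:
            callback((switch, key, value))

    def snapshot(self) -> Dict[str, Dict[str, Any]]:
        """Return a copy of the current state of every known switch."""
        return {switch: dict(keys) for switch, keys in self._state.items()}
```

In this model a controller registers one callback with the store and hears about every switch, rather than opening a session to each device and asking repeatedly whether anything has changed.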
All of these tools, and many others, support the Open vSwitch Database (OVSDB) management protocol, which exchanges information about the state of devices in JSON format. (It is interesting to note that the Application Centric Infrastructure SDN hardware and software from Cisco Systems does not use OVSDB, but that is a story for another day.) If you move a virtual machine across the VXLAN fabric, the VM has to move from one virtual switch to another on the hypervisors on the servers, and the MAC address for the virtual network interface has to move, too. This information has to be populated in the forwarding tables and passed up to the SDN and cloud controllers.
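For a sense of what such an update looks like on the wire, here is a hedged Python sketch that builds a JSON-RPC request in the general shape of an OVSDB "transact" call as described in RFC 7047. The database, table, and column names are hypothetical placeholders; real controllers and switches use their own schemas, and nothing here is taken from Arista's software.

```python
import json
from typing import Any, Dict


def build_mac_move_request(
    request_id: int,
    mac: str,
    new_locator: str,
    database: str = "example_db",      # hypothetical database name
    table: str = "Example_Mac_Table",  # hypothetical table name
) -> Dict[str, Any]:
    """Build a transact-style request that points a MAC at a new location."""
    operation = {
        "op": "update",
        "table": table,
        # Match the row whose (hypothetical) MAC column equals this address.
        "where": [["MAC", "==", mac]],
        # Rewrite the (hypothetical) locator column to the new tunnel endpoint.
        "row": {"locator": new_locator},
    }
    return {"method": "transact", "params": [database, operation], "id": request_id}


def encode(request: Dict[str, Any]) -> str:
    """Serialize the request to the JSON text that would be sent."""
    return json.dumps(request, sort_keys=True)
```

A VM move in this sketch is a single small JSON message per MAC address, which is the kind of update that has to reach both the forwarding tables and the controllers.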
By having a single control point, OVSDB information can be passed from switches to controllers about an order of magnitude faster, which stands to reason because it is being streamed after being pushed to CloudVision from the switches rather than polled individually from switches. In the example above, Arista was measuring the rate at which MAC addresses for virtual machines can be moved across the network. This means you can do more flitting of VMs around the datacenter within a given time. This isn’t so much about speeding up the controllers as it is about simplifying the integration between the switches and controllers, which gives the benefit of speed.
The CloudVision software is useful in other ways, too. For one thing, the network overlays mentioned above, which use the VXLAN or NVGRE protocols now embedded in switch ASICs to stretch Layer 2 networks over datacenter-scale Layer 3 fabrics, can now be mapped to the underlying physical network (often called the network underlay) and fabric controllers in the network.
Arista is not providing a lot of detail on the Hadoop backend for CloudVision, but it seems to have a real-time component (perhaps based on one or another of the in-memory add-ons to Hadoop) as well as historical data about the state of the networks. What we can tell you is that CloudVision can not only control the rollout of an upgrade across an entire network of Arista switches, but because it keeps a time series archive of the state of the network, it can also automate the rollback of an entire network if something goes wrong with a network upgrade.
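The rollback idea is easy to sketch: if network state is archived as a time series, undoing an upgrade amounts to finding the last snapshot taken before the upgrade began. The Python below is a minimal illustration under that assumption. The class, its methods, and the snapshot contents are hypothetical, and it says nothing about how CloudVision actually stores or applies state.

```python
import bisect
from typing import Any, Dict, List, Optional


class StateArchive:
    """Toy time series archive of network-wide state snapshots."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._snapshots: List[Dict[str, Any]] = []

    def record(self, timestamp: float, snapshot: Dict[str, Any]) -> None:
        """Store a copy of the snapshot, keeping the archive ordered by time."""
        index = bisect.bisect_right(self._times, timestamp)
        self._times.insert(index, timestamp)
        self._snapshots.insert(index, dict(snapshot))

    def state_before(self, timestamp: float) -> Optional[Dict[str, Any]]:
        """Return the latest snapshot taken strictly before timestamp."""
        index = bisect.bisect_left(self._times, timestamp)
        if index == 0:
            return None  # nothing was archived before that moment
        return dict(self._snapshots[index - 1])
```

A rollback routine would call `state_before` with the time the upgrade started and then push the returned state back to every switch. That second step is the hard part in practice and is left out of this sketch.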
While CloudVision is designed to manage thousands of switches in a single instance, Raymond says that the typical enterprise, service provider, or high-tech company that Arista is targeting with its turnkey SDN software tends to install networks and their servers and storage in pods of thousands of machines, which means they typically have hundreds of switches in the racks. Not that Arista doesn’t have some large customers. Its largest network has over 200,000 servers in it, and among its more than 3,000 customers, seven of the eight largest hyperscalers are using Arista products. (Google is, of course, the outlier there that does not use Arista switches, and Microsoft and Facebook have said publicly that they use Arista gear, although Facebook is in the process of building its own switches and network operating system.)
The CloudVision software has been in development for a while, and has several dozen customers who are in various stages of early deployment or proof-of-concept rollouts. CloudVision installs locally and runs inside of the corporate firewall; it costs $295 per month per network device. Volume site licenses and enterprise-wide licenses that span multiple datacenters are available to bring that cost down.
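As a back-of-the-envelope check on that list price, the short Python sketch below multiplies it out. It ignores the volume and enterprise-wide discounts, whose terms Arista has not disclosed, and the function name is our own.

```python
# List price from the article, in US dollars per device per month.
LIST_PRICE_PER_DEVICE_PER_MONTH = 295


def list_price(devices: int, months: int = 12) -> int:
    """Undiscounted cost of covering a number of devices for a number of months."""
    return LIST_PRICE_PER_DEVICE_PER_MONTH * devices * months
```

At list price, a pod with 100 switches would run $354,000 per year before any site or enterprise license discount.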