There are two competing trends in platform designs that architects always have to contend with. They can build a platform that performs a specific function and does it well, or create a more generic platform that sacrifices some efficiency but does a lot of jobs well. Sometimes you try to shoot the gap between these two poles.
That is precisely what Arista Networks, the networking upstart that has serial entrepreneur Andy Bechtolsheim as its chief development officer, is doing with a new line of what it is calling “universal leaf” switches. The leaf switches (does one say “leafs” or “leaves” or just write around it and always use it as an adjective?) complement a line of “universal spine” devices, introduced by Arista last fall, that marry high-end switching and routing together.
Fixed function switches, and even more modern leaf switches, are typically designed for very precise roles in the network. This is good when your network is predictable and the roles of the different switches are fairly static, or when the switch has a very precise role, as is the case with top-of-rack switches, which primarily act as a go-between linking servers to each other within a rack and as a hop in linking servers to machinery in other racks. Such leaf switches are built with minimalist functions – only what you need – and low cost, a consequence of the limited function. Still, some leaf switches have other roles added into them, such as tapping into links for monitoring or doing routing; others are designed for very low latency and employed in networks where speed is more important than bandwidth.
“The net result,” says Jeff Raymond, vice president of EOS products and services at Arista, “is that traditional leaf switches are designed to address one primary function and usually not with a lot of broader features and certainly not with a more robust architecture.”
The diversity in the network might look something like this:
The upshot is something that has been driving open networking in general and network function virtualization specifically, and that is companies want more flexible network gear that is more of a generic platform than a device tuned to do one or a few things well – that is more versatile and programmable like an X86 server is. This way they can validate a single platform once and put it into the network for various roles concurrently or over time.
“Merchant silicon has become more capable over time, and with that, we are able to build products that have the capabilities of running across these use cases,” Raymond tells The Next Platform. “They have a superset of functionality, and they can be one leaf platform that can span all of these workloads. This reduces the operational cost of networking because there are fewer product qualifications and fewer code scrubs.”
Such universal leaf switches, as Arista is calling them, are not for all use cases. High frequency traders that need the absolute lowest latency are not going to use this gear, and neither are HPC centers that have switches at the heart of their clusters. The question is how much of a premium will customers pay for a universal leaf to get those operational benefits?
“Up front, you may – and I am not necessarily saying that you will – but you may end up paying more for a universal leaf than for the specific fixed function switches, but the benefit is the consistency of operations and broader applicability,” says Martin Hull, senior director of product management at Arista. “Also, if you are an enterprise that is not able to completely control all of your applications, you have general purpose workloads, and you need to make sure you have the most robust capability to accommodate whatever might come at you in the upcoming years. So there is an argument that you might want to have something that is universal rather than compromising on capabilities. We will continue to offer our fixed function products, whether they are based on Broadcom ‘Trident’ or ‘Tomahawk’ chips or Intel ‘Alta’ chips, but this is an example of stretching across many use cases for many customers.”
To make the universal leaf, which is known as the 7280R series, Arista is tapping Broadcom’s “Jericho” ASIC, which is the switch chip from Broadcom announced in March 2015 that is part of the “Dune” family and that has deep packet buffers, large routing tables, and other high-end features. The Jericho ASIC is also used in Arista’s “universal spine” 7500R series modular switch, which debuted in March this year and which merges big switching and big routing capabilities into a single box. (In a sense, the 7280R is just a scaled down, fixed format version of the 7500R modular box.) The 7280R, thanks to the ability to store over 1 million routes in its routing tables, can be used as an edge router linking out to the full Internet, just like the 7500R universal spine can hold all of the Internet in its routing tables. (The Internet has about 600,000 routes at the moment, and growing, according to Hull.)
An edge router is a very pricey box indeed, often costing anywhere from $100,000 to $200,000 per 100 Gb/sec port, depending on features in the router and not including optical cables that are also terribly expensive. Moreover, these routers might only be able to cram 80 ports into a half rack or full rack of space. The 7500R universal spine and 7280R universal leaf switches cost on the order of $3,000 per 100 Gb/sec port, and they are considerably denser and less expensive.
As we pointed out last September, a 100 Gb/sec port on a fixed function, fixed port 7060CX switch from Arista based on Broadcom’s “Tomahawk” ASIC costs on the order of $1,000 per 100 Gb/sec port in a system with relatively modest packet buffers and not much in the way of routing capability. A double-decker 7060CX switch using the Tomahawk chips costs around $2,000 per port. So at $3,000 per port, a universal spine or universal leaf costs anywhere from 50 percent more (measured against the double-decker) to 3X as much (measured against the single 7060CX), depending on how you want to think about it.
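The per-port arithmetic behind that comparison can be sketched in a few lines of Python. This is a rough illustration only: the dollar figures are the order-of-magnitude numbers quoted above, and the dictionary keys and function names are made up for this example, not anything Arista publishes.

```python
# Rough per-port prices quoted in the article (USD per 100 Gb/sec port).
# These are order-of-magnitude figures, not list prices.
PORT_PRICE = {
    "7060CX": 1_000,                 # fixed function, Tomahawk-based
    "7060CX double-decker": 2_000,   # two-ASIC Tomahawk configuration
    "universal (Jericho)": 3_000,    # 7280R leaf / 7500R spine
}

# Edge router range quoted in the article, per 100 Gb/sec port.
EDGE_ROUTER_RANGE = (100_000, 200_000)


def price_multiple(price, baseline):
    """How many times the baseline price the given price is."""
    return price / baseline


def premium_percent(price, baseline):
    """Premium over the baseline, expressed as a percentage."""
    return (price - baseline) / baseline * 100.0


universal = PORT_PRICE["universal (Jericho)"]

for name in ("7060CX", "7060CX double-decker"):
    base = PORT_PRICE[name]
    print(f"vs {name}: {price_multiple(universal, base):.1f}X, "
          f"{premium_percent(universal, base):.0f} percent premium")

# The same comparison in the other direction: how much cheaper a
# universal port is than an edge router port.
low, high = (price_multiple(p, universal) for p in EDGE_ROUTER_RANGE)
print(f"edge router port costs roughly {low:.0f}X to {high:.0f}X more")
```

Measured this way, the same $3,000 port is a 200 percent premium over the cheapest fixed function option and still a small fraction of the cost of a traditional edge router port.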
As far as Arista is concerned, the Jericho chips are best suited not only for converged switching and routing, but also for distributed storage that is using Ethernet as a backbone and as a means of linking storage to servers (as opposed to Fibre Channel switches for storage area networks that have their own interconnects or that rely on InfiniBand). Such cloudy object storage is an order of magnitude cheaper than traditional enterprise SANs (call it $5 per GB for SANs compared to around 36 cents per GB for cloud storage with some network usage), but if you just use any old Ethernet switch to build the distributed storage, you will run into bottlenecks, says Hull. Add in flash, which is maybe 20X faster than spinning disk, and you can create some data storms on the Ethernet backbone for storage clusters, and deep buffering is what can help handle the congestion. You have to pay more for that because Jericho is a more expensive ASIC made at a lower volume than either the Trident-II+ or Tomahawk ASICs.
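To put the “order of magnitude” claim in concrete terms, here is a small Python sketch using the two per-GB figures quoted above. The 1 PB example capacity is a hypothetical chosen purely for illustration, and the function names are invented for this sketch.

```python
# Per-GB storage costs quoted in the article (USD).
SAN_COST_PER_GB = 5.00
CLOUD_COST_PER_GB = 0.36  # cloud object storage with some network usage

# Hypothetical capacity for illustration: 1 PB in decimal units.
EXAMPLE_CAPACITY_GB = 1_000_000


def cost_ratio(san=SAN_COST_PER_GB, cloud=CLOUD_COST_PER_GB):
    """How many times more expensive SAN capacity is per GB."""
    return san / cloud


def capacity_cost(capacity_gb, cost_per_gb):
    """Total cost of a given capacity at a flat per-GB rate."""
    return capacity_gb * cost_per_gb


san_total = capacity_cost(EXAMPLE_CAPACITY_GB, SAN_COST_PER_GB)
cloud_total = capacity_cost(EXAMPLE_CAPACITY_GB, CLOUD_COST_PER_GB)

print(f"SAN is about {cost_ratio():.1f}X the cost per GB")
print(f"1 PB on SAN:   ${san_total:,.0f}")
print(f"1 PB on cloud: ${cloud_total:,.0f}")
print(f"difference:    ${san_total - cloud_total:,.0f}")
```

The ratio works out to a bit under 14X, which is the gap that leaves room to spend more on a deep-buffered switch and still come out well ahead of a traditional SAN.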
And customers will pay more to get higher performance – and more predictable performance. Hull says that one financial services customer has created a block storage cluster based on EMC’s ScaleIO storage software that spans 10 PB of capacity and uses Dune-based 7280E switches to link the nodes together, and it will be upgrading to 7280R switches in the future to get more bandwidth and deeper packet buffers. Online music streaming service Spotify was also a user of the 7280E switches and is moving to the 7280R switches to get rid of some of its routers as well as to boost the performance of its network to 100 Gb/sec.
There are four models of the 7280R switches, as you can see above, with varying port counts for downlinks and uplinks running at varying speeds as well as buffers of different sizes. The port speeds and counts are designed to accommodate the storage and routing workloads that the 7280R switches are aimed at, which is why there is what might look like a peculiar number of ports running at different speeds. You will note that there is not a switch in the 7280R family that just offers 100 Gb/sec uplinks and downlinks, and the top-end machine, with only eight 40 Gb/sec ports and 48 100 Gb/sec ports, looks like its “uplinks” are slower than its “downlinks,” given the usual practice of having fewer ports running at higher speeds and calling them uplinks. The three other switches look more typical of fixed port switches.
The 7280R switches are available now. No word on when Broadcom will upgrade the Dune family with faster and deeper ASICs, but that is probably on the horizon, particularly if the universal spine and leaf devices made by Arista take off in the market. It will be interesting to see if the hyperscalers start using the Dune chips, too.