Arista Gives Tomahawk 25G Ethernet Some XPliant Competition
December 13, 2016 Timothy Prickett Morgan
Server processing grew steadily more general purpose over the past two decades, and built-for-purpose chips are now seeing a resurgence. Network equipment makers have made their own specialized chips and have also bought merchant chips of varying kinds to meet very specific switching and routing needs.
Of the chip upstarts that are competing against industry juggernaut Cisco Systems, Arista Networks stands out as the company that decided from its founding in 2009 to rely only on merchant silicon for switches and to differentiate on speed to market and software functionality and commonality across many different switch ASICs with its Linux-based Extensible Operating System, or EOS. And that approach has made Arista, which has Sun Microsystems co-founder Andy Bechtolsheim as its chief technology officer, a very successful company and a staunch competitor in the high-end datacenter market where it has chosen to participate.
Arista is the first company to publicly announce support in its switch products for the XP80 family of switch chips, which was created by XPliant; XPliant was acquired by ARM server chip upstart Cavium soon after it came out of stealth two years ago. Back in September 2015, Arista launched its first switches to chase the 100 Gb/sec arena, the 7060 and 7260, based on the “Tomahawk” line of ASICs from Cavium’s larger rival, Broadcom. The company started out using ASICs from Fulcrum Microsystems, which Intel bought back in 2011, as well as both the StrataDNX and StrataXGS lines of chips from Broadcom, which represent the bulk of the chips by count that Arista supports.
The support of the Cavium XP80 chips in the new 7160 line is important for a few reasons. First, Arista is embroiled in a patent lawsuit with Cisco Systems regarding functionality it has in its Broadcom-based switches, and the XPliant chips allow Arista to sell a switch that adheres to the 25G Ethernet standard without possibly infringing on any patents that are in dispute. It is not clear if Arista intended to use XPliant ASICs from the get-go – as far as we know, only one other switch maker has – but its history suggests that Arista will roll out switches based on a variety of merchant silicon. Indeed, if Intel had its 25G Ethernet act together (which it does not), you could rest assured that Arista would support the kickers to the Bali and Alta chipsets, too.
The 7160 line of switches from Arista has three variants, all of which start shipping in the first quarter of next year. The 7160-32CQ has 32 ports running at 100 Gb/sec speeds that support the IEEE variant of the 25G Ethernet standard; you can, of course, use cable splitters to make links running at 50 Gb/sec or 25 Gb/sec with 64 ports or 128 ports, respectively. The 7160-48YC6 has, as the name suggests, 48 downlink ports running at 25 Gb/sec and six uplink ports running at 100 Gb/sec. If you want, you could put splitters on the 100 Gb/sec ports and create a switch with a total of 72 ports running at 25 Gb/sec. The 7160-48TC6 is also based on the XP80 chip and has 48 ports running at 10 Gb/sec (10GBase-T ports) and six ports running at 100 Gb/sec.
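The port counts above fall out of simple lane arithmetic, since a 100 Gb/sec port in the 25G Ethernet family is built from four 25 Gb/sec lanes. Here is a minimal sketch of that math in Python; the function and constant names are ours, made up for illustration, and have nothing to do with Arista's EOS software.

```python
# Lane arithmetic for breakout cables. Names here are hypothetical,
# chosen for illustration only.
LANE_GBPS = 25            # one lane in the 25G Ethernet family
LANES_PER_100G_PORT = 4   # a 100 Gb/sec port carries four such lanes


def breakout_ports(ports_100g: int, target_gbps: int) -> int:
    """Logical ports obtained by splitting 100 Gb/sec ports down to target_gbps."""
    if target_gbps not in (25, 50, 100):
        raise ValueError("a 100 Gb/sec port splits into 25, 50 or 100 Gb/sec links")
    lanes_per_logical_port = target_gbps // LANE_GBPS
    return ports_100g * LANES_PER_100G_PORT // lanes_per_logical_port


# 7160-32CQ: 32 ports at 100 Gb/sec
print(breakout_ports(32, 50))        # 64 ports at 50 Gb/sec
print(breakout_ports(32, 25))        # 128 ports at 25 Gb/sec

# 7160-48YC6: 48 native 25 Gb/sec ports plus six split 100 Gb/sec uplinks
print(48 + breakout_ports(6, 25))    # 72 ports at 25 Gb/sec
```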
The XP80 ASIC can handle 48,000 access control list profiles, which is four to six times as many as available on other switch ASICs that provide 25G Ethernet support. It has support for VXLAN tunneling for Layer 2 and Layer 3 networks, which is important for megascale and hyperscale datacenters that use this protocol (championed by VMware) for virtual networking across datacenter-scale infrastructure.
The top-end XPliant ASIC used in the 7160-32CQ can deliver up to 6.4 Tb/sec of aggregate switching bandwidth and process 1.2 billion packets per second. The one used in the 7160-48YC6 can deliver 3.6 Tb/sec of bandwidth, and the one in the 7160-48TC6 does 2.16 Tb/sec. At 2 microseconds to 3 microseconds, the latency of a port-to-port hop in the 7160 series of switches is not InfiniBand class, but it is sufficiently low for a lot of workloads. (100 Gb/sec InfiniBand is more like 100 nanoseconds, just to give you an idea.) The switches have 24 MB of SRAM buffer capacity and a feature unique to the Cavium chips called AlgoMatch, which replaces the very expensive TCAM memory that has typically been used in switches for access control lists; this approach is somewhere around 50 percent more power efficient and delivers twice as many rules for IPv4 networks and four times the capacity for IPv6 networks.
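The aggregate bandwidth figures line up with the port configurations if you assume, as is the usual convention for switch specs, that traffic is counted in both directions on every port. A quick sketch of that arithmetic follows; the helper name and the port tables are ours, built from the port counts given earlier.

```python
def aggregate_tbps(port_groups):
    """Full-duplex aggregate bandwidth in Tb/sec.

    port_groups is a list of (port_count, gbps_per_port) pairs. The factor of
    two assumes the quoted figure counts both directions, which is our reading
    of the spec, not something Arista has spelled out here.
    """
    one_way_gbps = sum(count * gbps for count, gbps in port_groups)
    return 2 * one_way_gbps / 1000


models = {
    "7160-32CQ": [(32, 100)],
    "7160-48YC6": [(48, 25), (6, 100)],
    "7160-48TC6": [(48, 10), (6, 100)],
}

for name, groups in models.items():
    print(f"{name}: {aggregate_tbps(groups):.2f} Tb/sec")
```

Run as written, this yields 6.40, 3.60 and 2.16 Tb/sec for the three models, matching the quoted numbers.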
The most important thing about the XPliant chips is that they have a programmable pipeline, allowing for new protocols to be added to the switch in microcode rather than etched in transistors. This means two things. First, new protocols can be put on new switches almost instantly, and even tested before standards are adopted, and second, machines in the field can be upgraded to add new protocols. So, for instance, the support for VXLAN and its analog from Microsoft, called NVGRE, for virtual network overlays was done using this programmable pipeline. It might take a few months to add such support to an XP80 ASIC, compared to the two years or so it takes to etch a new or updated ASIC to add protocols. Perhaps equally importantly, protocols that are not needed can be removed from the ASIC.
“Within the chip itself, there is a degree of flexibility, and until we try to do some of these things, it will be difficult to characterize what the limits are to the programmable pipeline,” Martin Hull, director of product management at Arista, tells The Next Platform. “But the chip is designed to have multiple lookup stages and we have control over the operations that happen at those stages and depending on how we choose to implement that chip it could be more than half dozen different choices. I don’t want to give out too many specifics because it gives competitors clues about our implementation.”
Ultimately, what matters more is that companies will be able to move from 10 Gb/sec ports on their servers and downlinks to 25 Gb/sec ports with the same copper or fiber cables, which are expensive to upgrade and a big hassle to change, too. (It is a wonder that more companies, not just hyperscalers and cloud builders, are not moving to 25G ports, and maybe they will with the future “Skylake” Xeon processors coming from Intel next summer.)
As you can see, a 25 Gb/sec port on an Arista switch does not cost much more than a 10 Gb/sec port these days – about $350 per port if you do it on the switch that uses splitters on the 100 Gb/sec uplinks to make 72 ports running at 25 Gb/sec. And the cost of a 100 Gb/sec port is under $1,000 on the switches using the XPliant ASICs, or about the same as a 100 Gb/sec port based on the Broadcom Tomahawks, according to Hull. But on the 25 Gb/sec setup, the cost per unit of bandwidth sure has gone down – by a little less than half, if you do the math, with these XP80 chips. It doesn’t look like Arista is pricing the 7060X series switches based on the Broadcom Tomahawk chips much differently than the 7160 series based on the Cavium XP80 chips, by the way. There is a chance, however, that there is more wiggle room for discounting on one chip or another; we simply have no way of knowing.
What Arista wants is to get customers on the 100 Gb/sec bandwagon, even as some customers are looking for deals on 40 Gb/sec gear. Link aggregation, or LAG, is a technology that was invented as a kind of clustering for switch ports that can make two 40 Gb/sec ports look like one giant 80 Gb/sec port to machines talking to the network and to the switch operating system itself. But the hashing algorithms that are used to aggregate those two ports are not efficient, so the effective bandwidth is a lot lower than 80 Gb/sec and certainly a lot lower than an actual 100 Gb/sec link. (We might argue to do LAG on a pair of 100 Gb/sec ports, but that is just getting greedy.) In any event, the 100 Gb/sec transition is beginning in earnest.
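To see why a two-port LAG rarely delivers the full 80 Gb/sec, consider this toy model in Python. It assumes the switch picks a member link by hashing each flow's five-tuple, so every packet in a flow rides the same link. We use CRC32 purely as a stand-in hash; real switch ASICs use their own hash functions, and the traffic mix below is invented for illustration.

```python
import zlib

LINK_GBPS = 40.0   # capacity of each member link
NUM_LINKS = 2      # two 40 Gb/sec ports in the LAG


def pick_link(flow, num_links=NUM_LINKS):
    """Pin a flow to one member link by hashing its five-tuple (CRC32 stand-in)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_links


def offered_load(flows, num_links=NUM_LINKS):
    """Sum the offered Gb/sec that lands on each member link."""
    load = [0.0] * num_links
    for flow, gbps in flows.items():
        load[pick_link(flow, num_links)] += gbps
    return load


def delivered_gbps(load, link_gbps=LINK_GBPS):
    """Traffic beyond a link's capacity is dropped or delayed, not moved to the other link."""
    return sum(min(link, link_gbps) for link in load)


# Hypothetical flows: (src IP, dst IP, protocol, src port, dst port) -> offered Gb/sec
flows = {
    ("10.0.0.1", "10.0.1.1", "tcp", 49152, 443): 30.0,
    ("10.0.0.2", "10.0.1.1", "tcp", 49153, 443): 25.0,
    ("10.0.0.3", "10.0.1.2", "tcp", 49154, 80): 15.0,
    ("10.0.0.4", "10.0.1.3", "udp", 49155, 4789): 10.0,
}

load = offered_load(flows)
print("offered per link:", load)
print("delivered:", delivered_gbps(load), "of", sum(flows.values()), "Gb/sec offered")

# A single 60 Gb/sec flow can never use more than one 40 Gb/sec member link.
elephant = {("10.0.0.9", "10.0.1.9", "tcp", 50000, 443): 60.0}
print("single big flow delivered:", delivered_gbps(offered_load(elephant)))
```

The split across the two links depends entirely on how the hash happens to fall, which is the point: unless the flows divide evenly, one link saturates while the other has room to spare, and a single large flow is capped at 40 Gb/sec no matter what.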
“I can happily and enthusiastically talk to anyone about the transition that is happening from 40G to 100G, but it is not free,” says Hull. “So there is still a large deployment of 40G that is not going to go away in a hurry, so we continue to add new features and functions. We see that 40G is still a large slice of the market, and the various market forecasters have agreed that 40G has reached its peak, but it does not go to zero instantaneously. 40G is going to transfer to 100G in most areas, and in some it might fall back to 25G, and in other areas 10G is going to move up to 25G.”
From Gigabits To Gigabucks
Arista was one of the innovators to bring corporations, particularly hyperscalers and cloud builders, from Gigabit Ethernet to 10 Gb/sec and then 40 Gb/sec switching, and it is doing its best to get even more share of the market for 25 Gb/sec and 100 Gb/sec devices.
If the final quarter of the year finishes off as we expect, then Arista will break through $1.1 billion in sales and probably bring in something on the order of $180 million to its bottom line. Arista has been found not to have infringed four of the six patents that Cisco has sued it over, but Cisco has prevailed on two of the patents with the US International Trade Commission and is working on securing an import ban. None of this seems to be slowing down Arista’s sales much, and the company has $800 million in cash and securities to tide it over if things get tight. Wall Street seems mostly unfazed by all of this, with Arista now having a $6.5 billion market capitalization.
The datacenter switching market that was $6 billion in 2013 is now expected to double to $12 billion in 2017, growing at a compound annual growth rate of 19 percent over that term. Arista had carved out a 3.4 percent share of the ports running at 10 Gb/sec or faster in 2011, 4.9 percent in 2012, 6.7 percent in 2013, 9.3 percent in 2014, 12 percent in 2015 and is projected to hit about 15 percent in 2016. Cisco’s share of the high end of datacenter switching port counts has dropped from 73.3 percent to 53.1 percent over the same time.
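For those who want to check the growth math, the 19 percent figure is what you get from the standard compound annual growth rate formula applied to a market doubling over four years. A minimal sketch, with a function name of our own choosing:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Datacenter switching market: $6 billion in 2013 to a projected $12 billion in 2017
rate = cagr(6.0, 12.0, 2017 - 2013)
print(f"{rate:.1%}")   # roughly 19 percent
```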
Cisco doesn’t play in markets where it doesn’t have 65 percent share, or at least that was the old saw before reality sank in and competition came to its networking space as it entered the server space in 2009, just as Arista was getting rolling.