It looks like a networking price war is getting set to break out in the largest datacenters of the world, and this is precisely what Google and Microsoft had in mind when they formed the 25G Ethernet consortium back in July 2014.
Unhappy with the bandwidth, cost per bit, and high power consumption and heat dissipation of existing 40 Gb/sec and 100 Gb/sec switches, the two companies teamed up with switch chip makers Broadcom and Mellanox Technologies and switch maker Arista Networks to create a new standard that would offer 25 Gb/sec and 50 Gb/sec switching inside racks and feeding up to 100 Gb/sec in the aggregation and spine layers of the network. The IEEE originally rejected this idea, but came around to endorsing it once it became clear that these companies were happy to create their own standard with or without the IEEE’s blessing.
Now, after much work, the first 25G products are getting ready to come to market, and shake things up they will for sure. And, we believe, not just for the Googles and Microsofts of the world.
Dell, which sells a slew of custom servers to hyperscalers and cloud builders and is seeking to expand that business to smaller service providers and telcos, was the first to preview the switches it was creating using Broadcom’s “Tomahawk” switch ASICs, which support the 25G protocols and which were previewed last summer. HP has hinted at its plans for the Tomahawk ASICs but has yet to launch products. Mellanox, which makes switch chips as well as switches, adapters, and cables, unveiled its Spectrum switches and ConnectX-4 adapters back in June; both support 25 Gb/sec, 50 Gb/sec, and 100 Gb/sec speeds. Earlier this year, Dell was talking about Force10 S series switches based on Tomahawk offering 100 Gb/sec ports for under $2,000 per port. The top-end Mellanox Spectrum SN2700 switch had a list price of $49,000 at the June launch, which works out to $1,531 per port. (Other Mellanox switches had not been priced at that time, so we are unsure of how to compare them.) And in anticipation of the price war, Mellanox is now selling a bundle of the SN2700 with four dual-port 100 Gb/sec ConnectX-4 server adapters and eight copper cables for $18,895. That is like paying $590 per port and getting the adapters and cables for free.
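The per-port figures for the Mellanox switch are simple division, and a minimal sketch makes the arithmetic easy to rerun against other price quotes. The 32-port count is our assumption, inferred from the $49,000 list price and the $1,531 per-port figure rather than stated outright above, and the function name is purely illustrative.

```python
def price_per_port(total_price_usd: float, ports: int) -> float:
    """Return the price per switch port in US dollars."""
    if ports <= 0:
        raise ValueError("ports must be positive")
    return total_price_usd / ports


# Assumption: 32 ports, inferred from $49,000 / $1,531 per port.
SN2700_PORTS = 32

list_per_port = price_per_port(49_000, SN2700_PORTS)
bundle_per_port = price_per_port(18_895, SN2700_PORTS)

print(f"list price: ${list_per_port:,.0f} per port")
print(f"bundle:     ${bundle_per_port:,.0f} per port")
```

The bundle figure charges the whole $18,895 to the switch ports, which is what treating the adapters and cables as free amounts to.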
Now Arista Networks, another founding member of the 25G consortium, is gearing up to push its own products based on the Tomahawk ASICs, and Martin Hull, director of product management at Arista, tells The Next Platform that for its 32-port, 1U fixed-port switch, it will charge under $1,000 per port for the device. (However, larger 2U fixed-port switches and modular switches deploying 100 Gb/sec ports will cost around $2,000 per port.)
This aggressive pricing, as well as the fact that Andy Bechtolsheim was its chief technology officer, is what made Arista an immediate player in datacenter networking when it launched in 2008, right in the belly of the Great Recession, particularly among cloud builders and hyperscalers who do not want to pay a Cisco Systems premium for their networking and who also want a Linux-derived network operating system on their switches that they can hack a bit.
Arista started out selling 10 Gb/sec switches based on Intel’s Fulcrum family and Broadcom’s Trident and Dune families of ASICs, and Hull says that the company is keeping an open mind on what ASICs to use in various kinds of devices going forward. Cavium Networks has its own XPliant ASICs, which support the 25G standards, but Arista did not use them in the new products. No one is quite sure what Intel plans to do in this arena just yet, but presumably the company has something in the works to appeal to cloud builders and hyperscalers, most of whom are not going to deploy its InfiniBand follow-on, Omni-Path, except in niche cases.
It is easy to understand why the cloud builders and hyperscalers want server ports and switches based on 25 Gb/sec lanes – they consume less power per bit than a 10 Gb/sec lane does, and the cost of that incremental bandwidth does not go up by a factor of 2.5X, but something around 1.5X or lower. Perhaps, over time, a lot lower, particularly with volume discounts. And switch makers and server adapter makers are going to make it up in volume. It takes hundreds of switches to lash together tens of thousands of servers, and the cloud (which includes hyperscalers in the lingo most people use) is growing much faster than the traditional enterprise datacenters.
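To put a number on the cost argument, here is a rough sketch of relative cost per bit when moving from a 10 Gb/sec lane to a 25 Gb/sec lane. The 1.5X price ratio is the approximate figure cited above; the 2.5X case is included only as the break-even point. The function name is our own and nothing here reflects actual vendor pricing.

```python
def relative_cost_per_bit(price_ratio: float, bandwidth_ratio: float) -> float:
    """Cost per bit of the new lane relative to the old one (1.0 = parity)."""
    if bandwidth_ratio <= 0:
        raise ValueError("bandwidth_ratio must be positive")
    return price_ratio / bandwidth_ratio


bandwidth_ratio = 25 / 10  # 25 Gb/sec lane versus 10 Gb/sec lane

# 2.5X is the break-even price ratio; 1.5X is the rough estimate from the text.
for price_ratio in (2.5, 1.5):
    rel = relative_cost_per_bit(price_ratio, bandwidth_ratio)
    print(f"price {price_ratio}X -> cost per bit {rel:.2f}X of 10G")
```

At a 1.5X price for 2.5X the bandwidth, each bit costs about 60 percent of what it did on a 10 Gb/sec lane, before any volume discounts.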
If you look at workloads instead of ports, as the analysts at Credit Suisse did, then cloud datacenters are going to see a 30 percent compound annual growth rate in workloads between 2012 and 2017, rising from 34 million workloads at the start of the forecast period to 120 million by the end of the period. By contrast, traditional datacenters will see more modest growth, from around 50 million workloads in 2012 to 70 million five years later. It is no wonder that so many server, storage, and switch vendors are chasing the cloud market. (We presume that this data includes public clouds, hyperscalers, and private clouds.)
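The growth rates can be checked against the endpoints with the standard compound annual growth rate formula. This sketch assumes five compounding periods between 2012 and 2017, which is our reading of the forecast window and not something the Credit Suisse data spells out here.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.30 means 30 percent)."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


YEARS = 5  # assumption: 2012 through 2017 treated as five compounding periods

cloud = cagr(34e6, 120e6, YEARS)
traditional = cagr(50e6, 70e6, YEARS)

print(f"cloud workloads:       {cloud:.1%} per year")
print(f"traditional workloads: {traditional:.1%} per year")
```

Under that assumption the cloud figure comes out just under 30 percent a year, in line with the rate quoted, while traditional datacenters grow at roughly 7 percent a year.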
To date, Arista has shipped 5 million ports of networking capacity into cloud builders and hyperscalers, says Hull. It took six years to do the first 2 million of those ports, and only 18 months to do the next 3 million ports, so its cloud business is definitely accelerating if port count is a measure.
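A quick way to see the acceleration is to annualize the two shipment periods. The sketch below uses only the figures Hull gave (2 million ports over six years, then 3 million over 18 months) and assumes shipments were spread evenly within each period, which is a simplification.

```python
def ports_per_year(ports: int, months: int) -> float:
    """Average annualized port shipments over a period given in months."""
    if months <= 0:
        raise ValueError("months must be positive")
    return ports / (months / 12)


early_rate = ports_per_year(2_000_000, 72)   # first six years
recent_rate = ports_per_year(3_000_000, 18)  # following 18 months
speedup = recent_rate / early_rate

print(f"early:  {early_rate:,.0f} ports per year")
print(f"recent: {recent_rate:,.0f} ports per year")
print(f"recent pace is about {speedup:.0f}X the early pace")
```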
While Arista is nowhere near the size of Cisco Systems, the juggernaut of the switching arena, and its revenues are not growing as fast as port count, there is every reason to believe that, if current trends persist, Arista will kiss $900 million in revenues this year and certainly go well over $1 billion in 2016, particularly if the hyperscalers and cloud builders adopt 25G products with the kind of enthusiasm we expect. The company currently has $588 million in cash and equivalents, some of which came from its initial public offering in June 2014, and that gives it maneuvering room despite a patent lawsuit with Cisco and another suit brought by one of the company’s founders just before it went public. Arista has a market capitalization of $4.2 billion as we go to press.
Roughly speaking, Hull says Arista generates something on the order of a third of its business from cloud builders and hyperscalers. Another third comes from service providers and telcos, and the remaining third comes from financial services firms, large enterprises, and HPC centers. The company’s revenue pattern very strongly resembles that of Intel’s Data Center Group, rising through the year and then dipping in the first quarter from the prior year’s fourth quarter to grow from a new floor.
The Cloud Is Driving Network Innovation
The thing that works in favor of Arista in the hyperscale, cloud, and service provider market is that these companies like to dual source their switching, just like they typically do with server OEMs or ODMs to mitigate risks. (This also works in favor of any staunch competitor to Cisco.) Arista switches are currently used in six of the seven largest hyperscalers and clouds, says Hull. So long as these companies are buying switches rather than designing them, as Facebook is doing and as Google and Amazon have done for years, there is no reason to believe they will abandon their dual vendor strategies.
While the 25G effort was focused predominantly on the networking needs for hyperscale datacenter operators and cloud builders, there will be a trickle-down effect as there always will be with any hot new technology that pushes the boundaries of scale. Just like InfiniBand has trickled down into the enterprise, storage array, and cloud markets from the HPC world, we think the new 25G Ethernet products aimed at clouds and hyperscalers will eventually see adoption in the enterprise and HPC space.
“The primary driver for 25G is the large cloud customers,” says Hull, adding that it “absolutely has applicability to other workloads and that there are benefits that other people will want to take advantage of. But the cloud is driving this transition and other people will eventually feel the benefits.” These include the oil and gas industry, often an early adopter, and other niche portions of the HPC industry that are sensitive to power consumption, bandwidth, and cost. Companies running grids for software development and chip development with the same constraints could find the 25G products appealing, too, says Hull, and in financial services, those doing Monte Carlo simulations and risk analysis on clusters could also be attracted to the 25G products.
Still, there are workloads where the move from Trident-II or Trident-II+ won’t make a lot of sense. While the Tomahawk chips have lower latency (around 450 nanoseconds for a port-to-port hop compared to around 550 nanoseconds for the Trident-IIs), 100 nanoseconds is not a lot when there is a lot more latency in the I/O, storage subsystems, and application stacks in clusters. People will take it, but it won’t drive the sale. Those who are latency sensitive will go for InfiniBand or Omni-Path.
Further to Hull’s point, plenty of enterprise customers are still using Gigabit Ethernet and have only begun their transition to 10 Gb/sec Ethernet, which continues to fall in price. This is why Broadcom launched a gussied up, VXLAN-friendly Trident-II+ ASIC back in April, to move that transition along. So if 25G is the new 10G, and 50G is the new 40G, then 10G is the new 1G, it looks like. (Pity that 200G is not the new 100G, but let’s not get ahead of ourselves.)
The Feeds And Speeds
Having laid out the market need and the forces compelling the 25G movement, let’s get into the new iron. There are three fixed port switches and a set of new modular line cards based on the Tomahawks, plus some updates to existing switches based on Trident-IIs.
The 7060CX-32 is aimed at top-of-rack use cases and has 32 ports running at 100 Gb/sec using QSFP100 ports and two SFP+ ports for management. With cable splitters, each of the 100 Gb/sec ports can be broken down into two 50 Gb/sec ports or four 25 Gb/sec ports. This switch has one Tomahawk ASIC to deliver its 6.4 Tb/sec of aggregate switching bandwidth (that’s the bi-directional bandwidth); its predecessor, the 7050X, had one Trident-II ASIC to deliver its 2.56 Tb/sec of switching bandwidth across its 32 ports running at 40 Gb/sec (again, bi-directional). The ports in the 7060CX-32 can automatically downshift to 40 Gb/sec speeds, and with splitters support 10 Gb/sec ports on servers or other switches. This new Arista switch comes in a 1U form factor and provides the 450 nanosecond port hop latency mentioned above; it has a 16 MB buffer. This is the one that will sell for under $1,000 per port, according to Hull. This switch is shipping now.
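The bandwidth and breakout figures follow directly from the port counts. As a minimal sketch, aggregate bandwidth is ports times port speed, doubled for both directions, and each 100 Gb/sec port is treated as four 25 Gb/sec lanes that a splitter can regroup. The constant and function names here are illustrative, not anything from Arista.

```python
LANE_GBPS = 25       # one 25 Gb/sec lane
LANES_PER_PORT = 4   # a 100 Gb/sec port is built from four such lanes


def aggregate_bandwidth_tbps(ports: int, port_speed_gbps: int) -> float:
    """Bi-directional aggregate switching bandwidth in Tb/sec."""
    return ports * port_speed_gbps * 2 / 1000


def breakout_count(ports: int, breakout_gbps: int) -> int:
    """Logical ports available when every 100G port is split to breakout_gbps."""
    lanes_needed, remainder = divmod(breakout_gbps, LANE_GBPS)
    if remainder or lanes_needed < 1 or LANES_PER_PORT % lanes_needed:
        raise ValueError("breakout speed must be 25, 50, or 100 Gb/sec")
    return ports * (LANES_PER_PORT // lanes_needed)


print(aggregate_bandwidth_tbps(32, 100))  # 7060CX-32: 32 x 100 Gb/sec
print(aggregate_bandwidth_tbps(32, 40))   # 7050X: 32 x 40 Gb/sec
print(breakout_count(32, 50))             # 50 Gb/sec ports on a 7060CX-32
print(breakout_count(32, 25))             # 25 Gb/sec ports on a 7060CX-32
```

For the 7060CX-32 this gives 64 ports at 50 Gb/sec or 128 ports at 25 Gb/sec from a single 1U box.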
The 7260CX-64 is a double-decker version of a Tomahawk switch that has four ASICs in a 2U chassis, providing 64 ports running at 100 Gb/sec speeds, with backstepping to 40 Gb/sec if needed. This switch has 12.8 Tb/sec of aggregate switching bandwidth, can chew through 9.5 billion packets per second, has a 550 nanosecond port hop latency, and 64 MB of buffer capacity. Expect this one to cost around $2,000 per port at list price. One of the primary use cases for this configuration is for linking nodes in clustered storage together because of the high rate of switching and the large number of 50 Gb/sec (128 ports) and 25 Gb/sec (256 ports) it will support in its chassis all under the control of a single instance of Arista’s EOS network operating system. The 7260CX-64 will be available in the fourth quarter.
There is a variant of this box called the 7260QX-64 that uses the Tomahawk ASIC but only supports 40 Gb/sec speeds (using QSFP+ ports), only 5.12 Tb/sec of switching bandwidth, and only 16 MB of buffer capacity. The aim of this machine is to be a lower cost spine switch for 40 Gb/sec networks that also burns less power than the older 40 Gb/sec switches based on 10 Gb/sec lanes and the Trident-II chips. This switch will cost around $1,000 per 40 Gb/sec port at list price, says Hull. Port-to-port latency is 550 nanoseconds. The 7260QX-64 will be available in the fourth quarter.
All of the switches above have an unspecified dual-core X86 processor plus 4 GB of main memory to run applications alongside the EOS network operating system.
One Giant Wonking Switch
Also coming in the final quarter of the year is the 7320CX-32 line card for the 7320X modular switches. The 7320X modular switches come in two flavors, like their predecessors, with the 7324X chassis fitting in an 8U space in the rack and housing four line cards and the 7328X chassis taking up 13U of space and holding eight line cards. The top-end model can process 38 billion packets per second across its Tomahawk ASICs and delivers over 51.2 Tb/sec of aggregate switching bandwidth across its line cards. (Each line card is functionally equivalent to a 32-port 100 Gb/sec switch.) The ports run at 100 Gb/sec, but can be broken down to 50 Gb/sec and 25 Gb/sec speeds with splitter cables; a 100 Gb/sec port will burn about 17 watts, which is not great but not bad for a modular switch. Each supervisor module in the chassis has a four-core X86 processor (make and model unknown) on it with 16 GB of memory and 4 GB of flash, which is used to run applications atop EOS inside of virtual machines.
Arista has a larger modular box that can support up to 512 ports using the Trident-IIs, but it did not roll out such a beast using the Tomahawks. (It may never do so, depending on the needs of clouds and hyperscalers.)
As for pricing, expect the modular switches to cost a little more than fixed port switches for every 100 Gb/sec raw port because of the inherent expansion in the chassis. (This is analogous to the price difference between a big NUMA server and a workhorse single-socket or two-socket server. That expansion is not free.) The 7320X line card will start shipping in the fourth quarter.
In two-tier networks mixing fixed and modular switches, the new 25G switches from Arista can scale to over 27,000 10 Gb/sec or 25 Gb/sec ports in a single network; boosting the server port bandwidth to 40 Gb/sec or 100 Gb/sec means cutting the port count down to 9,216 ports. In simple and more common leaf/spine setups, the fixed port 25G boxes can be used to scale to 6,144 ports running at 10 Gb/sec or 25 Gb/sec speeds or 1,536 ports running at 40 Gb/sec or 100 Gb/sec speeds. All of these comparisons above assume a 3:1 oversubscription on the network.
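The leaf/spine figures can be reproduced with a back-of-the-envelope sketch, under assumptions of our own since the topology is not spelled out: 32-port leaf switches split 24 ports down and 8 ports up to hit 3:1 oversubscription, 64-port spine switches with every leaf taking one port on each spine, and splitter cables yielding four server ports per leaf port at 10 Gb/sec or 25 Gb/sec. Other designs could yield the same totals, and we have not tried to reconstruct the larger two-tier numbers.

```python
def leaf_spine_server_ports(leaf_ports: int, spine_ports: int,
                            oversubscription: int, breakout: int) -> int:
    """Server-facing ports in a two-layer leaf/spine fabric.

    Assumes all leaf ports run at the same native speed, each leaf has one
    uplink to every spine, and each spine port connects exactly one leaf.
    """
    # Split leaf ports so that downlinks : uplinks equals the oversubscription.
    uplinks, remainder = divmod(leaf_ports, oversubscription + 1)
    if remainder:
        raise ValueError("leaf ports do not divide evenly at this ratio")
    downlinks = leaf_ports - uplinks
    leaves = spine_ports  # one spine port per leaf limits the leaf count
    return leaves * downlinks * breakout


# Assumed hardware: 32-port leaves, 64-port spines, 3:1 oversubscription.
split_ports = leaf_spine_server_ports(32, 64, 3, breakout=4)   # 10G or 25G
native_ports = leaf_spine_server_ports(32, 64, 3, breakout=1)  # 40G or 100G

print(split_ports, native_ports)
```

With those assumptions the sketch lands on 6,144 split ports and 1,536 native ports, matching the leaf/spine figures above.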