If Andy Bechtolsheim, the chief technology officer at datacenter switching upstart Arista Networks, wanted to design ASICs to try to take a bigger piece of the switch pie – or more precisely, thought that this was a good idea at all – rest assured, Arista would be spending money engineering its own chips and fighting for capacity at the four remaining foundries that have advanced processes.
But when Arista was founded back in 2008, one of its founding principles was to work on a virtualized and extensible network operating system based on Linux and let the merchant silicon providers – initially Fulcrum Microsystems (which was bought by Intel five years ago) and Broadcom, and more recently the XPliant division of Cavium (soon to be part of Marvell, we expect) – grind against each other, offering a mix of features and, as it turns out, different levels of switch programmability to suit the various needs of enterprise, hyperscale, cloud builder, and even a few HPC customers.
This is a strategy that has worked well. Bechtolsheim was one of the co-founders of Sun Microsystems and also a co-founder of Granite Systems, a pioneer in the 1 Gb/sec Ethernet era in the mid-1990s that was sold to Cisco Systems only a year after it was established, giving the dominant router supplier its entrance into the switching business it still dominates. At Arista, he wields a kind of soft power, encouraging the industry to adopt the 25G Ethernet standard that the IEEE initially rejected (until Google and Microsoft leaned in) and that has transformed datacenter switching for the better (lower power and lower cost per bit), and also driving the architecture of future 400 Gb/sec Ethernet standards. Arista is also, of course, growing market share in datacenter switching and making a healthy amount of money, and the wonder is why someone has not snapped it up. The company’s whopping $20.2 billion market capitalization, which is nearly an order of magnitude larger than its annual revenue run rate, is definitely an inhibitor. With $54.4 billion in the bank, Cisco could afford it, but it already has its own pair of operating systems and uses merchant silicon for some of its switches as well as homegrown chips. Cisco doesn’t need Arista so much as it needs Arista to disappear.
No matter. We are just as happy for Arista to be expensive to acquire and therefore independent. Because of this, the company can do things such as embrace the “Tofino” programmable switch chips from upstart merchant silicon supplier Barefoot Networks, which the company has just done. One could say finally done, with a certain amount of exasperation, but switch buyers are very conservative and do a lot of heavy testing, and therefore switch ASIC makers have to be patient as switch builders put the chips through their paces. This can take years. This conservatism means the Internet still works. And the competition between vendors means there are different network processing options to support the EOS substrate from Arista, much as there are different processor architectures for servers that support Linux and, in the past and now in the future, Windows Server.
“We don’t attempt to pick winners and losers in the chip industry, we take the best of breed from multiple silicon suppliers,” explains Martin Hull, vice president of cloud and platform product management at Arista. “We do believe that each of these chips from Broadcom, XPliant, and now Barefoot have different characteristics, although there is a certain amount of overlap. They all have high performance packet processors, and when customers put them side-by-side and consider their particular use cases, they tend to find one becomes more relevant than the other. Even if you could use any one of them for any particular customer, some are better.”
Over the past year and a half, customers have been asking for more programmability, and the switch ASIC makers have been rolling it out. The XPliant XP80 that goes into the Arista 7160 switch was the first programmable processor that the company used. This chip enabled Arista to add a feature called AlgoMatch, which replaces very expensive TCAM memory that has been typically used in switches for access control lists. This approach is somewhere around 50 percent more power efficient and delivers twice as many rules for IPv4 networks and four times the capacity for IPv6 networks. The XP80 also allowed for network admins to dynamically adjust table sizes for important resources such as MAC address tables, host tables, routing tables, and so forth, and to add new protocols and encapsulations as they come into the market. In the past, this would have required a rev on the chip, which can take 18 months to 24 months.
The company more recently added the “Trident-3” ASICs from Broadcom. The Trident line has had an increasing level of programmability, and it has been added to Arista’s 7050X line of modular switches, the very first products that the company sold a decade ago that have been updated several times with different ASICs. Hull says that the Broadcom chips have roughly the same level of programmability as the XPliant chips, although the manner in which it is implemented in a combination of microcode and firmware is different.
The Tofino chips from Barefoot take programmability up yet another notch, thanks in large part to the P4 programming language that is being championed by many, notably Google.
“The Tofino chip is unique in that it has a defined programming language that can be used to develop a policy or a profile that gets pushed into the chip to define a pipeline,” Hull explains. “What is also fundamentally different with Tofino is that it does not have a defined set of operations that will happen day one. It has a series of stages, lookups, and processing engines, and each one is not only programmable but has to be programmed in order for it to perform an operation. You can choose what operations happen, and the order in which they happen, and the outcome of a hit or a miss on a particular lookup action. It is a lot more flexible, and a company like Arista that has a large engineering organization can take advantage of Tofino to leverage it as a high performance packet processor but also overlay on top of it other functionality that is not necessarily on other packet processors.”
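To make Hull's description a little more concrete, here is a minimal sketch in Python of a pipeline built from stages that do nothing until they are programmed. This is not P4 and it is not how Tofino is actually programmed. The Stage class, the action helpers, and the field names are all hypothetical, chosen only to show the three things Hull calls out: which operations happen, the order in which they happen, and what a hit or a miss on a lookup does.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Packet = Dict[str, object]           # header fields and metadata for one packet
Action = Callable[[Packet], None]    # an action mutates the packet in place


@dataclass
class Stage:
    """One hypothetical match-action stage: an exact-match lookup on one field."""
    name: str
    key_field: str
    table: Dict[object, Action]      # action to run on a hit, keyed by field value
    on_miss: Action                  # action to run when the lookup misses

    def apply(self, pkt: Packet) -> None:
        action = self.table.get(pkt.get(self.key_field), self.on_miss)
        action(pkt)


def set_port(port: int) -> Action:
    """Build an action that sets the egress port (illustrative only)."""
    def act(pkt: Packet) -> None:
        pkt["egress_port"] = port
    return act


def drop(pkt: Packet) -> None:
    pkt["drop"] = True


def no_op(pkt: Packet) -> None:
    pass


def run_pipeline(stages: List[Stage], pkt: Packet) -> Packet:
    """Run the stages in the order given, stopping once a packet is dropped."""
    for stage in stages:
        if pkt.get("drop"):
            break
        stage.apply(pkt)
    return pkt


# "Programming" the pipeline: choose the stages, their order, and hit/miss outcomes.
acl = Stage("acl", "src_ip", {"10.0.0.66": drop}, on_miss=no_op)
l2 = Stage("l2_forward", "dst_mac", {"aa:bb:cc:00:00:01": set_port(7)},
           on_miss=set_port(0))

pipeline = [acl, l2]
print(run_pipeline(pipeline, {"src_ip": "10.0.0.1", "dst_mac": "aa:bb:cc:00:00:01"}))
print(run_pipeline(pipeline, {"src_ip": "10.0.0.66", "dst_mac": "aa:bb:cc:00:00:01"}))
```

Swapping the list to [l2, acl] changes the behavior without touching either stage, which is the point of the exercise: in this toy model, the order of operations is data rather than something fixed in the design.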
In this regard, it is reasonable to speculate that the Tofino chips will give the switches that use them a wider possible range of use cases and a longer life in the field. They can change their personalities many times, and find different uses in different parts of the network over time. But that is not the real benefit. Getting new functionality faster is the real goal of these programmable chips, breaking the ASIC development cycle and doing something in two months that, before programmability, would take two years.
You might be thinking that this programmability carries a high premium to end users, but it doesn’t. Hull estimates that there is something on the order of a 10 percent to 20 percent premium for programmable switches, with the premium being highest on the Tofino chips. In some cases, Arista will provide pre-programmed profiles for the switches based on the Tofino chips, and charge a recurring subscription for them, and in other cases such as with the hyperscalers and financial services organizations that have sophisticated network teams and programmers, they will do it themselves and keep the intellectual property under wraps.
By the way, Hull doesn’t expect that the market will suddenly and abruptly adopt the Tofino-based products, but rather a small number of large customers with intense programmability needs will buy a relatively large number of these switches; enterprises and other service providers will take more time to see the benefits and learn how to deploy them.
There are four SKUs in the Tofino lineup, and here is how they can be chopped up to provide different port counts and speeds in a switch:
At the moment, Arista is offering two different 7170 series switches based on the Tofino chips.
The 7170-32C comes in a 1U chassis and has a single 3.3 Tb/sec ASIC that, as the name suggests, can drive 32 ports at 100 Gb/sec speeds and a proportionately larger number of ports at lower speeds. This switch has 22 MB of system buffer memory, can forward packets at 2.5 billion packets per second, has a port-to-port hop latency of 800 nanoseconds (which is not all that great for Ethernet, with some chips down in the 450 nanosecond range), and a typical power consumption of 7 watts per port.
The 7170-64C has one 6.5 Tb/sec Tofino ASIC in a 2U box, and can drive 64 ports at 100 Gb/sec; the average port burns 5 watts because its power supply and cooling are more balanced, we presume. (The spec sheets quote the full duplex bandwidth of the device, which is 12.8 Tb/sec for all packets in and all packets out, taking out overhead from peak. That does not mean it has two Tofino chips in it.) The system buffer size and port hop latency are the same, but the bandwidth of this switch is 6.5 Tb/sec and the forwarding rate is 5.08 billion packets per second.
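As a quick sanity check on the numbers for the two boxes, the short Python sketch below multiplies out the port counts, port speeds, and per-port wattages quoted above. The whole-box power totals are our own derived figures, not something from the spec sheets. The raw port sums come out to 3.2 Tb/sec and 6.4 Tb/sec, a touch under the ASIC ratings cited, and doubling the larger figure gives the 12.8 Tb/sec full duplex number.

```python
# Figures quoted in the article for the two Arista 7170 models.
SWITCHES = {
    "7170-32C": {"ports": 32, "port_gbps": 100, "watts_per_port": 7},
    "7170-64C": {"ports": 64, "port_gbps": 100, "watts_per_port": 5},
}


def port_bandwidth_tbps(ports: int, port_gbps: int) -> float:
    """Aggregate one-way bandwidth across all ports, in Tb/sec."""
    return ports * port_gbps / 1000


def summarize(spec: dict) -> dict:
    """Derive aggregate bandwidth and a rough whole-box power figure."""
    one_way = port_bandwidth_tbps(spec["ports"], spec["port_gbps"])
    return {
        "one_way_tbps": one_way,
        "full_duplex_tbps": 2 * one_way,   # all packets in plus all packets out
        "typical_watts": spec["ports"] * spec["watts_per_port"],  # derived, not quoted
    }


for name, spec in SWITCHES.items():
    print(name, summarize(spec))
```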
The larger 64 port switch based on the Tofino chip is shipping now, with a cloud profile shipping and a security and telemetry profile coming in August. The smaller switch will start shipping in the third quarter. The switches have a list price of $1,200 per port in the bigger box. This is significantly higher than the $516 per port charge that the industry averaged in the first quarter, according to data from IDC. But this is not precisely an apples-to-apples comparison, one being list price and the other being an average street price across a wide range of feature sets in the chips. Our point is that you can probably get these switches for anywhere from 20 percent to 40 percent off list, in the right volumes. Like hundreds to thousands of units.
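For those who want to see what that discount range does to the comparison, here is the arithmetic as a small Python sketch. The list price and the IDC average are the figures quoted above; the 20 percent and 40 percent discounts are our own guess, as stated, not a quote from Arista. At those discounts the per-port price lands at $960 and $720, still above the $516 industry average.

```python
LIST_PRICE_PER_PORT = 1200    # dollars, list price quoted for the bigger box
INDUSTRY_AVG_PER_PORT = 516   # dollars, IDC first-quarter average quoted above


def street_price(list_price: int, discount_pct: int) -> float:
    """Apply a whole-number percentage discount to a list price."""
    return list_price * (100 - discount_pct) / 100


for pct in (20, 40):          # assumed discount range, not vendor pricing
    price = street_price(LIST_PRICE_PER_PORT, pct)
    ratio = price / INDUSTRY_AVG_PER_PORT
    print(f"{pct}% off list: ${price:,.0f} per port, "
          f"{ratio:.2f}x the industry average")
```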
Incidentally, we asked Hull why not put two chips in the box to increase the radix of the switch. Here is what he had to say: “If we were attempting to make a 7170 based system with more than one chip it would actually require six chips, in a Clos configuration, and the power would be substantially greater. It’s not possible to connect two devices back to back with non-blocking bandwidth and achieve more capacity than the bandwidth of a single chip. This is because you need to dedicate 50 percent of the chip I/O just to connect to the other chip, so you are left with 50 percent x 2 = 100 percent, right back where you started.”
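Hull did not walk through the math behind the six-chip figure, but one way to arrive at it is sketched below in Python. The assumptions are ours: a two-tier folded Clos built from identical chips with 64 ports of 100 Gb/sec each, with every leaf chip splitting its ports evenly between external ports and uplinks to keep the fabric non-blocking. The function names are hypothetical. Under those assumptions, doubling the external port count of a single chip takes four leaf chips and two spine chips.

```python
def back_to_back_ports(radix: int) -> int:
    """External ports left when two chips are joined with non-blocking bandwidth.

    Each chip gives up half its ports to talk to the other chip.
    """
    return 2 * (radix // 2)


def two_tier_clos(radix: int, external_ports: int) -> dict:
    """Chip count for a non-blocking two-tier folded Clos (our assumption)."""
    down_per_leaf = radix // 2                       # other half are uplinks
    leaves = -(-external_ports // down_per_leaf)     # ceiling division
    uplinks = leaves * down_per_leaf                 # one uplink per external port
    spines = -(-uplinks // radix)                    # spine ports all face leaves
    return {"leaves": leaves, "spines": spines, "chips": leaves + spines}


RADIX = 64  # 100 Gb/sec ports on one chip, per the 7170-64C figures

print("two chips back to back:", back_to_back_ports(RADIX), "external ports")
print("doubling to 128 ports:", two_tier_clos(RADIX, 2 * RADIX))
```

The first line of output shows the two-chip case landing back at 64 ports, which is Hull's "right back where you started," and the second shows the six chips needed to actually double the port count.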