Innovium Hitches Its Switches To SONiC’s Boom

Breaking into the datacenter compute or networking ASIC business is about as easy as trying to start up a new global car company. It can be done, as Elon Musk so aptly demonstrates in practice with cars and as many CPU and switch/router ASIC suppliers are trying to prove. But either way, it takes a lot of effort and money to squeeze into the market next to established and fierce incumbents.

Luckily, there is no shortage of money flowing into the high tech arena, so a company such as Innovium, which provides high-end switch ASICs aimed at hyperscalers and cloud builders, has been able to design several generations of ASICs and bring them to the field. Xsight Labs and Nephos are taking a stab at it, too, but are relatively secretive about what they are up to and they face the same enormous pressures that Innovium does in trying to get a toehold among the Super 8 and the next level down of clouds and service providers.

There are several different things going on at the same time that make it an interesting time for Innovium to try to go from being a startup with a few big (and mostly secret) design wins at the Super 8 to being a credible alternative supplier of switch ASICs alongside the likes of Broadcom, Cisco Systems, Nvidia, Intel, and Marvell. There is a bit of a chicken and egg problem here, but it is not too bad. To sell switch ASICs, you need to get someone – the hyperscalers, the cloud builders, the switch OEMs, or the open source or closed source network operating system suppliers – to port their software to your switch ASIC and software development kit. But to get any of these to do the port, you need to sell a lot of switch ASICs first.

It was easier for Broadcom with the Trident family of chips, which arguably started the merchant silicon game in earnest in switching. Pricing on switching gear from the OEM incumbents, particularly Cisco, was so bad in the mid-2000s when the hyperscalers and cloud builders started to rise that the appetite for a more disaggregated approach to datacenter switching was huge, even if it was for just a relative handful of customers. Now, a decade and a half later, there are a number of ASIC suppliers, and they are getting traction here and there among the largest switch buyers.

Mellanox, now part of Nvidia, as is Cumulus Networks, has installations at Microsoft (so we hear) and Innovium has installations at AWS (so we hear) and at LinkedIn (which Innovium confirms), which probably means Microsoft is also playing with switches based on Innovium silicon. There is no question that proofs of concept are always underway with each generation of device from all of the ASIC suppliers across a representative sample of the top several dozen companies that run datacenters at scale. Barefoot Networks, now part of Intel, was getting some traction with its Tofino chips, and both Innovium and Barefoot got the nod from Cisco when it ported its own NX-OS operating system to these two families of chips and rolled out Nexus switches based on them a few years back. (We suspect that going forward Cisco will use a mix of its own Silicon One switch and router ASICs as well as third party ASICs as customers and conditions dictate.)

Given all of this, the easiest way for Innovium to broaden its appeal to the top 200 companies that need 400 Gb/sec and 800 Gb/sec switching in the datacenter in the near and middle term, or that want to bust those down to 100 Gb/sec or 200 Gb/sec ports and have a very high radix switch indeed, is to support the open source SONiC network operating system. SONiC was created by Microsoft and then released into the wild way back in 2016. Supporting SONiC – literally providing tech support for it for switch customers – would also tend to get Innovium on Microsoft’s good side.

And so that is precisely what Innovium is now doing through a program called TeraCertified. In essence, Innovium is being pushed by the market into providing a kind of vertically integrated stack where it supplies its Teralynx switch ASICs as well as technical support services for the SONiC network operating system and its Switch Abstraction Interface (SAI) underpinnings, which was launched six years ago based on a Linux kernel by Microsoft and which, as we talked about last May, was getting lots of traction. While Dell has a supported version of SONiC, oddly enough Microsoft does not. Innovium knows its own ASICs far better than any other company, so it is logical for it to offer such support and hardware validation services.

“If you look beyond the top clouds and hyperscalers, the next tier of customers would like to adopt this disaggregated model and get the benefits,” Amit Sanyal, vice president of product marketing and management at Innovium, tells The Next Platform. “But there are hurdles. These companies do not have the engineers to do hardware and software integration and validation. And the ODMs lack experience with the ASICs and the NOSes, so they are not much help. As a result, they are forced to go back to the switch OEMs and buy the proprietary NOS and get locked into that vendor. Some companies try to buy whiteboxes that have the hardware already integrated and validated, but they have to do a lot of work with a NOS to ensure its robustness. And so, as a result, 95 percent or more of the customers do not use network disaggregation, whether they are in the enterprise or among the big clouds and hyperscalers. Customers want one throat to choke, they want the appliance feel without the appliance pain.”

What we think they really want is choice. They want multiple ASICs that support multiple NOSes – and then they want to pick something that averages out to one and a half. They want to keep the heat on their hardware and software suppliers, but they don’t really want to have too many different kinds of switch ASICs or operating systems on their networks. It’s more like they want an option and the competitive pressure that threatening to take it provides.

According to Sanyal, the NOSes under development by the hyperscalers, cloud builders, and switch OEMs have hundreds of software engineers creating, extending, and supporting that code. Arrcus, the company behind the closed source ArcOS operating system that does switching and routing, from edge to core, has a sizeable team and a growing base of customers, but does not yet support Innovium ASICs. (As a startup, Arrcus has to play it conservative and stick to the Broadcom families of switch and router ASICs, but we would not be surprised to see the company eventually support Cisco’s Silicon One and Innovium’s Teralynx families.) Of the 220 people that Innovium has on staff, 80 software engineers are working on various aspects of SONiC and related telemetry software, so this is a significant effort.

There is no reason, by the way, that Innovium could not work with Arrcus to create a similar certification program for ArcOS – and perhaps that is exactly what they should do. Innovium certainly has the cash after a $170 million Series F funding round last summer, giving it a unicorn valuation of $1.3 billion.

 

Innovium was founded in 2014 and dropped out of stealth mode in March 2017 with its Teralynx 7 ASIC, which was followed up by a lower-end Teralynx 5 chip in 2019. Its Teralynx 8 chip, which launched last year and which reaches 25.6 Tb/sec of aggregate bandwidth, is sampling now and should appear in products before year end.
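To put that 25.6 Tb/sec figure in terms of switch radix, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that all of the aggregate bandwidth is exposed as uniform ports of a single speed. The function name `port_count` is our own and is not part of any Innovium or SONiC tooling, and real switch designs may carve up the bandwidth differently.

```python
# Aggregate bandwidth the article cites for Teralynx 8, expressed in Gb/sec.
AGGREGATE_GBPS = 25_600


def port_count(port_speed_gbps: int, aggregate_gbps: int = AGGREGATE_GBPS) -> int:
    """Return how many uniform ports of the given speed fit in the aggregate.

    Assumption (ours): every bit of aggregate bandwidth is exposed as
    front-panel ports of one speed, with nothing held in reserve.
    """
    return aggregate_gbps // port_speed_gbps


# Radix at each of the port speeds mentioned in the article.
radix_by_speed = {speed: port_count(speed) for speed in (800, 400, 200, 100)}

for speed, ports in radix_by_speed.items():
    print(f"{speed} Gb/sec ports: {ports}")
```

Under that simplifying assumption, the same chip that drives 32 ports at 800 Gb/sec could drive 256 ports at 100 Gb/sec, which is the "very high radix" trade-off described earlier.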

What Innovium is not doing, by the way, is getting directly into the switch business. The switches based on Teralynx chips are sold by ODMs or other channel partners and Innovium is working with them to get certification for cables and transceivers so a complete sale can be done, like Cisco or Nvidia or Intel would do.

Here is the other thing that probably is motivating Innovium to provide tight integration and support for SONiC: Amazon Web Services. If the rumors are right, then AWS is working on its own set of switch ASICs and it will not need Broadcom or Innovium or anyone else down the line. And if Microsoft is willing to make its own CPUs for servers, as is rumored, then it also might be willing to make its own switch and router ASICs, too. Ditto for Google, but the search engine and advertising behemoth seems less interested these days in getting into the silicon business – TPU and security chip excepted. All of these companies, as well as Facebook, are rich enough to do chip design on their own. They have their own NOSes and they want to create vertically integrated platforms. Why not?

What we want to know is how SONiC stands up in terms of performance compared to the proprietary operating systems controlled by the switch OEMs – IOS and NX-OS from Cisco, EOS from Arista Networks, and JunOS from Juniper Networks are probably the important ones – as well as ArcOS from Arrcus. While SONiC is open and has over 50 companies working on it with thousands of engineers, its performance might not be up to the same level as these other NOSes, just as was the case when Linux first debuted and could barely run well on a single-socket server in the late 1990s. But, over time, Linux became a peer to the best Unix platforms on Earth as many tens of thousands of engineers worked on it – some of them the very same people who used to work on Unixes. And just like Windows Server has been able to keep technical pace with Linux on the server, we expect that even if SONiC can reach parity with those proprietary OSes, ArcOS will be in the market, keeping SONiC honest and the community working hard to expand features – perhaps even extending up into routing and down into the edge as ArcOS has done.

Innovium and its partners will have Teralynx switches with 100 Gb/sec, 200 Gb/sec, and 400 Gb/sec ports running SONiC/SAI by the middle of this year; 800 Gb/sec switches running SONiC are being tested now and will be available towards the end of 2021. Sanyal says that Innovium is fighting a very aggressive price/performance campaign to get into more datacenters, and that right now an average 400 Gb/sec Ethernet switch costs between $1,800 and $2,000 per port; switches based on Teralynx, he says, will cut that cost in half, and then the operating costs will also be a third lower on top of that.
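Taking Sanyal's numbers at face value, the arithmetic works out as in the short Python sketch below. The function names and the exact-fraction treatment are our own illustration; the only inputs taken from the article are the $1,800 to $2,000 per-port range, the claimed halving of that price, and the claimed one-third reduction in operating costs. The article gives no absolute operating cost, so that part is expressed only as a ratio.

```python
from fractions import Fraction

# Per-port price range the article cites for an average 400 Gb/sec switch (USD).
INCUMBENT_PORT_PRICE_RANGE = (1800, 2000)

# Claims attributed to Innovium in the article, not independently verified.
CLAIMED_PRICE_CUT = Fraction(1, 2)   # "cut that cost in half"
CLAIMED_OPEX_CUT = Fraction(1, 3)    # operating costs "a third lower"


def claimed_port_price(incumbent_price: int) -> Fraction:
    """Per-port price after applying the claimed 50 percent cut."""
    return incumbent_price * (1 - CLAIMED_PRICE_CUT)


def relative_operating_cost() -> Fraction:
    """Operating cost as a fraction of the incumbent's, per the claim."""
    return 1 - CLAIMED_OPEX_CUT


low, high = (claimed_port_price(p) for p in INCUMBENT_PORT_PRICE_RANGE)
print(f"Claimed Teralynx price per port: ${low} to ${high}")
print(f"Claimed operating cost relative to incumbent: {relative_operating_cost()}")
```

In other words, the claim implies roughly $900 to $1,000 per 400 Gb/sec port, with operating costs at about two-thirds of what they are today.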


