Feeding The Insatiable Bandwidth Beast
April 30, 2018 Timothy Prickett Morgan
Breaking into any part of the IT stack against incumbents with vast installed bases is not an easy task. Cutting edge technology is table stakes, and targeting precise customers with specific needs is the only way to get a toehold. It also takes money. Lots of money. Innovium, the upstart Ethernet switch chip maker, has all three and is set to make some inroads among the hyperscalers and cloud builders.
We told you all about Innovium back in March last year, when the company, founded by former networking executives and engineers from Intel and Broadcom, dropped out of stealth and started talking about its streamlined approach to high bandwidth, low latency Ethernet networking. A year ago, Innovium was the first of the switch ASIC makers to be talking about pushing a single ASIC up to 12.8 Tb/sec of aggregate switching bandwidth, which was capable of driving up port bandwidth as high as 400 Gb/sec using a minimalist approach to the Ethernet protocol, only supporting those features that datacenter customers need.
By streamlining the number of Ethernet features supported and by cranking the SerDes communications circuits on its Teralynx 7 chips up to 50 Gb/sec using PAM-4 signaling, Innovium was the first to blaze the trail to 200 Gb/sec and 400 Gb/sec Ethernet at a time when the market was, quite frankly, having a hard time ramping from 10 Gb/sec or 40 Gb/sec switching up to 100 Gb/sec. To be even more precise, we think switch ASIC vendors were quite happy to have the ramps to 100 Gb/sec and then 200 Gb/sec and on to 400 Gb/sec take a longer, measured approach, but the need for higher bandwidth pipes as well as higher radix switches by hyperscalers, cloud builders, and some service providers and telecommunications firms is forcing the networking business to pick up the pace of innovation.
As we have pointed out before, this is necessary because the rate of change in the aggregate throughput of compute and the increase in the amount of storage deployed in datacenters has grown considerably faster than the rate of change of network bandwidth. Things are out of whack and that has meant that the vast Clos networks that these companies deploy, often linking together 100,000 servers into a single fabric within a datacenter, take too many switches. By increasing bandwidth per switch ASIC, the port count per switch can be boosted – this is the high radix part – and it takes fewer switches to build out the Clos fabrics. Moreover, the same ASIC can be used to make very fat pipes, perhaps to link multiple datacenters together. As has been the case for many years now, the hyperscalers and cloud builders are driving technology, more than any other force in the IT sector. This stands to reason, since they have the most intensive demands on the planet – outside of HPC and machine learning, which have their own unique needs that also drive technology ahead.
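To put rough numbers on the radix argument, here is a minimal sketch that assumes a textbook three-tier k-ary fat tree with no oversubscription, which is a simplification and not necessarily the topology any particular hyperscaler deploys. The function names are ours, chosen for illustration only.

```python
def fat_tree_hosts(k: int) -> int:
    """Servers supported by a three-tier k-ary fat tree: k**3 / 4."""
    return k ** 3 // 4


def fat_tree_switches(k: int) -> int:
    """Switches in that fabric: k**2 / 2 edge, k**2 / 2 aggregation, k**2 / 4 core."""
    return 5 * k ** 2 // 4


# Compare switch radixes of 32, 64, and 128 ports (assumed values for illustration).
for radix in (32, 64, 128):
    hosts = fat_tree_hosts(radix)
    switches = fat_tree_switches(radix)
    print(f"radix {radix:>3}: {hosts:>7} servers, {switches:>6} switches, "
          f"{switches / hosts:.3f} switches per server")
```

Under these assumptions, the number of switches needed per server works out to 5/k, so every doubling of the radix halves the switch count for a given number of servers.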
Enterprises, of course, eventually benefit from all of this innovation. The largest enterprises can get on the front end and try to get competitive advantage over their rivals if they run with the upper echelon. Some do.
When we talked to Innovium last year, the company was the only one talking about pushing up to 400 Gb/sec. But in that time, Mellanox Technologies has unveiled its Spectrum-2 chips, which added PAM-4 modulation (which uses four signal levels to carry two bits per symbol) to its existing 25 Gb/sec SerDes in the initial Spectrum ASICs, yielding Ethernet lanes that run at an effective 50 Gb/sec speed and, with four of them, top out at 200 Gb/sec. The company has a path to add lanes in a subsequent Spectrum ASIC, which will allow it to reach 400 Gb/sec ports, and can add more layers of PAM modulation and boost the SerDes to a real 50 GHz or 100 GHz to push the bandwidth up even higher over time, perhaps as high as 3.2 Tb/sec per port at some time in the future. Networking industry luminary Andy Bechtolsheim, of switch maker Arista Networks, says that pushing SerDes speeds beyond 100 GHz will be very difficult, and this is something that is a rising consensus among the ASIC makers. (Arista is a consumer of, not maker of, switch ASICs.) Barefoot Networks has not talked about successors to its original “Tofino” ASICs, which range in speed from 1.8 Tb/sec to 6.4 Tb/sec and deliver ports that run at 10 Gb/sec to 100 Gb/sec, but it no doubt has a roadmap to push the chips up to 200 Gb/sec and 400 Gb/sec or hyperscalers and cloud builders would not consider its technology. Cisco Systems showed off its own 400 Gb/sec Ethernet chip used in routers last year at Hot Chips, more of a stunt than competition in this switching sector, and earlier this year, datacenter switching juggernaut Broadcom previewed its “Tomahawk-3” ASICs, which will also deliver 12.8 Tb/sec of aggregate bandwidth and sport PAM-4 modulation. With this, Broadcom is drawing level with Innovium, supplying up to 32 ports running at 400 Gb/sec, and started sampling the Tomahawk-3 chips in January.
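The lane arithmetic behind those PAM-4 figures can be sketched in a few lines. This is a simplified illustration using nominal rates: it ignores forward error correction and line coding overhead, and the function names are hypothetical rather than anything from a vendor toolkit.

```python
import math


def lane_rate_gbps(symbol_rate_gbaud: float, levels: int) -> float:
    """Effective lane rate: symbols per second times bits per symbol."""
    bits_per_symbol = math.log2(levels)  # NRZ has 2 levels (1 bit), PAM-4 has 4 (2 bits)
    return symbol_rate_gbaud * bits_per_symbol


def port_rate_gbps(symbol_rate_gbaud: float, levels: int, lanes: int) -> float:
    """Port speed when several identical lanes are ganged together."""
    return lane_rate_gbps(symbol_rate_gbaud, levels) * lanes


print(lane_rate_gbps(25, levels=2))           # NRZ lane at a nominal 25 GBaud
print(lane_rate_gbps(25, levels=4))           # PAM-4 lane at the same symbol rate
print(port_rate_gbps(25, levels=4, lanes=4))  # four PAM-4 lanes
print(port_rate_gbps(25, levels=4, lanes=8))  # eight PAM-4 lanes
```

With these nominal inputs, the same 25 GBaud lane goes from 25 Gb/sec to 50 Gb/sec by moving from NRZ to PAM-4, and four or eight such lanes give the 200 Gb/sec and 400 Gb/sec port speeds discussed above.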
Innovium started sampling all four variants of its Teralynx 7 ASICs back in March, Amit Sanyal, vice president of marketing at Innovium, tells The Next Platform. As we explained last spring, the Teralynx ASICs come in three bandwidth ranges – 3.2 Tb/sec, 6.4 Tb/sec, and 12.8 Tb/sec – and support both non-return-to-zero (NRZ) modulation at 25 Gb/sec per lane and PAM-4 modulation at 50 Gb/sec (effective) per lane. The chips are etched using the 16 nanometer processes from Taiwan Semiconductor Manufacturing Co, and the expectation is that Teralynx 7 will go into production towards the end of the year, says Sanyal.
While Innovium has three different speeds of Teralynx 7 chips, there are actually going to be four variants, as shown below:
There is a lot of excitement about 400 Gb/sec Ethernet, and importantly, the development of the related optics and cables to support the use of 400 Gb/sec ports on switches, but this is not the entire business that Innovium is shooting for. “In a way, 400 Gb/sec helps us, but we are not completely reliant on 400 Gb/sec optics because our switch chip can use 100 Gb/sec or 200 Gb/sec optics as well,” says Sanyal. “Certainly, some customers are looking for 400 Gb/sec optics, but other customers are going to be deploying the high radix 100 Gb/sec variant as well, such as in 128 ports of 100 Gb/sec. You can run these in various modes, using optics that are shipping today. You can run it in 64 ports of 200 Gb/sec, and some optics vendors already have this available and there are some customers that are interested in this. You can obviously do 32 ports of 400 Gb/sec, which is what a lot of the hyperscalers want to go. We have two different ASICs at 6.4 Tb/sec, which allow 256 SerDes using NRZ and 128 SerDes using PAM4, and this is important because we will be the only player that can support a top of rack switch with 50 Gb/sec server ports. This is important as 50 Gb/sec servers start being deployed in the coming quarters. Our 3.2 Tb/sec switch is like everyone else’s in that it supports 128 SerDes using NRZ. The key thing we are enabling is a lower switch count on Clos networks. Today, these hyperscalers use twelve 3.2 Tb/sec switch ASICs to build high radix switches with 128 ports at 100 Gb/sec, and moving to more modern 6.4 Tb/sec ASICs, they need to use six chips. With us, they will be able to do it with one chip. This means a lot less power, and a lot less cost.”
It also means a lot fewer hops across the fabric, which lowers latency across the network, too.
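The port modes and chip counts that Sanyal cites can be checked with some simple arithmetic. The sketch below assumes the multi-chip switches are built as a two-tier, non-blocking folded Clos inside the box, with each leaf chip splitting its ports evenly between the front panel and the spine chips. Innovium did not spell out that topology, so treat it as our assumption, and the function names as illustrative.

```python
def port_modes(aggregate_gbps: int, port_speeds=(100, 200, 400)) -> dict:
    """Number of ports an ASIC can drive at each port speed."""
    return {speed: aggregate_gbps // speed for speed in port_speeds}


def chips_for_switch(front_ports: int, chip_radix: int) -> int:
    """Chips needed for a non-blocking switch with front_ports ports.

    Assumes a two-tier folded Clos: leaf chips use half their ports for
    the front panel and half for uplinks to spine chips.
    """
    if chip_radix >= front_ports:
        return 1  # a single chip covers every front-panel port
    down_per_leaf = chip_radix // 2
    leaves = -(-front_ports // down_per_leaf)  # ceiling division
    uplinks = leaves * down_per_leaf
    spines = -(-uplinks // chip_radix)
    return leaves + spines


print(port_modes(12800))  # 12.8 Tb/sec ASIC at 100, 200, and 400 Gb/sec per port

# 128 ports at 100 Gb/sec built from 3.2, 6.4, and 12.8 Tb/sec ASICs.
for radix in (32, 64, 128):
    print(radix, chips_for_switch(128, radix))
```

Under that assumption, the arithmetic reproduces the twelve-chip, six-chip, and one-chip figures in the quote, and the single-chip case is also where the hop count inside the switch drops from three to one.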
Here are the ways that the Teralynx 7 chips could be deployed in various switches:
Last month, Innovium demonstrated the top-end 12.8 Tb/sec part by hooking it up to a traffic generator and a gearbox system that fed data into a prototype switch configured with 128 ports running at 100 Gb/sec. The test showed the Teralynx ASIC processing 1 billion packets, with a mix of packets ranging in size from 64 bytes to 1,518 bytes, with the majority being 64 bytes, and the switch ran at line rate with no packet drops or errors. (You can take a look here at the report.)
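For a sense of what line rate means at these speeds, here is a back-of-the-envelope sketch of packets per second on an Ethernet link. It assumes the standard 20 bytes of per-frame overhead on the wire (preamble, start frame delimiter, and interframe gap); the report's exact packet mix and accounting are not given here, so the numbers are illustrative only.

```python
# 7 bytes preamble + 1 byte start frame delimiter + 12 bytes interframe gap
ETHERNET_OVERHEAD_BYTES = 20


def line_rate_pps(link_gbps: float, frame_bytes: int) -> float:
    """Maximum frames per second a link can carry at a given frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire


# Smallest and largest frame sizes from the test, on one 100 Gb/sec port
# and across all 128 ports of the prototype switch.
for frame in (64, 1518):
    per_port = line_rate_pps(100, frame)
    print(f"{frame:>4} bytes: {per_port / 1e6:8.2f} Mpps per port, "
          f"{128 * per_port / 1e9:6.2f} Gpps across 128 ports")
```

Small packets are the hard case: at 64 bytes, each 100 Gb/sec port has to handle roughly 149 million packets per second, which is why a traffic mix dominated by 64 byte packets is a meaningful stress test.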
Having the technology is just table stakes, of course. Technology vendors also have to amass big bags of money to peddle their wares against the competition, particularly if they are not one of the big incumbents, and to continue the research and development that proves to customers they are not just a one-trick pony.
To that end, Innovium has raised $77 million in its Series D funding round, with Greylock Partners, Walden Everbright, Walden Riverwood Ventures, Paxion Capital, Capricorn Investment Group, Redline Capital, S-Cubed Capital, and Qualcomm Ventures all kicking in dough this time around. That brings the total investment that Innovium has brought in since its founding back in 2015 to $165.3 million.
As to what Innovium has up its sleeve for future products, the company is not saying. “Our first goal is to get these chips to production,” says Sanyal. “Beyond that, we have a unique technology that allows us to pack more silicon in less die area, and this is where a lot of our 50 patents come from. We will be ahead, and it will be very hard for others to catch up. Our main competitor has had to scale down features to get to 12.8 Tb/sec. We are looking at the next process node, but we don’t want to talk too much about that. Our plan is certainly to build products for different performance and price points, with different scale up and scale down options. We have started work on that, and as process nodes get proven, we will move to adopt them.”
As with other vendors, Innovium will no doubt crank up the raw speed of the underlying SerDes, widen the lanes per port, and boost PAM modulation to chase ever-increasing bandwidth needs. At a certain point, the silicon and the optics are going to have to come together to get more speed, hints Sanyal, who says no more about it. As for latency, it looks like the 350 nanosecond port-to-port hop with the Teralynx 7 is good for now, and is as good as or better than what any of the other Ethernet players can deliver. Of course, Mellanox InfiniBand and Intel Omni-Path are down around 100 nanoseconds for those with acute low latency demands.