Welcome to the second part in our series of chats with J Metz, chair of the Ultra Ethernet Consortium.
Like many of you, we have watched the epic battle between Ethernet and InfiniBand over the past two and a half decades. We have also watched how technologies – particularly remote direct memory access across devices linked over the network – originally developed for InfiniBand have made the jump to Ethernet.
InfiniBand has proved very difficult to beat in HPC and, for the moment at least, is the preferred low latency, high bandwidth interconnect for back-end networking in AI training clusters, which are a similar kind of supercomputer. But with the changes being implemented by members of the UEC, Ethernet is going to be able to scale much further than InfiniBand ever could and will have some properties that are still missing from InfiniBand.
What we want to know from Metz is how can the UEC members change Ethernet substantially and retain compatibility? Perhaps there is legacy cruft in the Ethernet stack that can be jettisoned for more modern workloads like HPC, AI, and data analytics? We have certainly seen this in hardware with a forking of Ethernet switch ASICs from Broadcom, with enterprise-grade “Trident” switch chips that support all current and legacy protocols and the newer, hyperscaler-inspired “Tomahawk” devices that are designed explicitly for scale, low power, and a minimalist protocol stack.
We also wanted to be explicit and ask if UEC is trying to kill InfiniBand – and if not, why not?
And finally, we wanted to comprehend the scale goals of UEC – ports running at several terabits per second and over 1 million endpoints in a single network. This sounds like science fiction, even with the burgeoning of cluster sizes driven by massive AI training supercomputers. How far does the scale of UEC networks have to go, and how fast does it need to get there?
These and other questions are answered in this chat with Metz, and all you need to do is click on that video above and you can sit in and hear what the UEC thinks about all of this.
If you want to see part one of this interview series, it is available here.
Quite exciting to hear that UEC is orchestrating this compartmentalized tractor pull of granular SR-71 Blackbirds in cross-pollination with Infiniband! All the better to get elephant flows cohabitating nicely with mice flows, and guys with a chicken on their heads! … Q1 should be awesome (in its own Schrödinger empty/full-box kinda way) … 8^b
Ah, typical American strategy. Don’t defeat the enemy, just send signals. Worry about what the dictator might feel rather than removing the dictator.
You know, Infiniband might get irritated and escalate or something.