It’s Back To The Future For Omni-Path InfiniBand

People in the modern era sometimes forget that networking predates the rise of Cisco Systems and the commercialization of the Internet. Long before there were routers and switches as we know them, intelligent subsystems and controllers linked computers to each other, to end users, and to peripherals. A lot of the architectures in modern networking are reflections of these approaches, but with modern twists.

Phil Murphy, co-founder and chief executive officer of Cornelis Networks, brings a very long history in networking to bear on the company that he is launching today with Vladimir Tamarkin, vice president of platform engineering, and Gunnar Gunnarsson, vice president of solutions delivery and support. Their mission is to bring Omni-Path interconnect out of stasis and make it a viable alternative for HPC and AI applications running on distributed systems. The competition includes Nvidia's InfiniBand and Ethernet (formerly from Mellanox), Hewlett Packard Enterprise's Slingshot (formerly from Cray), and Ethernet based on ASICs from Broadcom, Innovium, Intel, and others.

If anybody has a chance of reviving the Omni-Path business and evolving it for the future, it is this team, which has 50 people in its first week of business – 40 of them coming from Intel’s Omni-Path team. That headcount includes the company’s three founders mentioned above, by the way, all of whom have strong ties to Pennsylvania, to the early InfiniBand market, and to Unisys, the mainframe and then X86 server vendor that was formed from the combination of the Sperry and Burroughs mainframe businesses way back in 1986.

This Phil Murphy is not the governor of New Jersey, but rather a kid who got a bachelor's in mathematics from St Joseph's University in Philadelphia and then got a master's in computer and information science from the University of Pennsylvania. Out of school, Murphy landed a job at Burroughs, which is located in the Philly suburbs and which was founded in 1886 in St Louis to make various kinds of business equipment. (Burroughs is ancient, like International Business Machines, and walked down a similar path.) Murphy rose through the ranks at what became Unisys, eventually serving as director of engineering responsible for I/O subsystems for the Sperry and Burroughs lines.

Unisys, like other system makers at the time, watched in the late 1990s as Intel put forward its Next Generation I/O update for the PCI-X peripheral bus, with support from Sun Microsystems, and as IBM, Compaq, and Hewlett Packard backed an alternative serial bus called Future I/O. In August 1999, IBM and Intel buried the hatchet and created the InfiniBand fabric specification. Had it panned out as planned, InfiniBand would have brought a high-speed serial interface to clients, servers, and storage alike.

Seeing this opportunity, Murphy left Unisys in 1999. The next year, at the height of the dot-com boom when money was flowing easily, he started SilverStorm Technologies to provide InfiniBand hardware and software aimed specifically at the HPC space. Murphy was vice president of engineering at SilverStorm and Tamarkin was director of engineering, and the company raised $40.2 million in two rounds of funding to develop and market its InfiniBand products. InfiniBand sure looked like the future, but then the dot-com bust came, then 9/11 came, and the money dried up. IBM Microelectronics ditched its InfiniBand ASIC efforts, then so did Intel, then Microsoft backed off on InfiniBand for Windows, and then PCI-Express became the I/O standard for servers, and that was that. To its credit, Mellanox (along with Voltaire, first its partner and later its acquisition) kept the InfiniBand effort alive and pivoted it into an HPC cluster interconnect.

SilverStorm hung in there against Mellanox and was snapped up in 2006 for $60 million by QLogic, a maker of Fibre Channel SAN switches that wanted to protect its flank. That same year, QLogic paid $109 million to get HPC compiler maker PathScale. The two were brought together to create QLogic's TrueScale InfiniBand variant, which gave Mellanox a run for the HPC money in the 2000s. In January 2012, Intel, with big aspirations in HPC, snapped up the QLogic InfiniBand business for $125 million, which is how Murphy and Tamarkin got to Intel. Ditto for Gunnarsson, who worked with OEMs at SilverStorm, QLogic, and Intel on the various implementations of InfiniBand that these companies put into the field.

Three months later, Intel bought the "Gemini" XT and "Aries" XC interconnect businesses from Cray for $140 million. As we have pointed out before, the Intel plan was always to take aspects of the QLogic InfiniBand and Aries interconnects and merge them to create something that could scale to exascale systems. Importantly, what became known as Omni-Path was always intended to be tuned specifically for, and tightly coupled with, many-core parallel compute engines like the "Knights" Xeon Phi family of HPC and then AI engines. The "Knights Hill" Xeon Phis that were to be part of the original "Aurora" supercomputer at Argonne National Laboratory were supposed to be paired with 100 Gb/sec Omni-Path, both etched on 10 nanometer processes and giving Intel a substantial lead. As we all know, Intel killed the "Knights" Xeon Phi line in July 2018 and shifted to a more standard CPU-GPU hybrid strategy for exascale systems. Intel's acquisition of Ethernet ASIC maker Barefoot Networks in June 2019, for something north of Barefoot's $380 million valuation at the time, called into question the future of Omni-Path.

It was no surprise at all in July 2019 when Intel said it was not going to commercialize the second generation, 200 Gb/sec Omni-Path 200 series of switches and adapters. That left some 500 HPC centers and enterprises that had adopted Omni-Path in a kind of stasis. Many of them were none too pleased, and quite a few were already miffed that Intel had originally planned to make Omni-Path 200 not precisely compatible with Omni-Path 100 (through the addition of more Aries technology, among other things). Intel backed off on the compatibility issue, and then just spiked the whole thing.

Murphy, Tamarkin, and Gunnarsson are pulling that spike out of the ground. They are creating Cornelis Networks not only to provide support for existing Omni-Path 100 customers, but also to create a new variant of InfiniBand and a roadmap to keep it on a regular cadence of upgrades.

To get Cornelis Networks off on a solid financial footing, Downing Ventures, a venture capital firm based in London, led a $20 million Series A funding round that also included Chestnut Street Ventures, an investment fund created by a bunch of UPenn graduates, and of course Intel Capital. Under the deal with Intel, Cornelis Networks has a five-year contract to provide support for the more than 500 Omni-Path customers worldwide. Cornelis also gets all of the Omni-Path intellectual property (including rights to the QLogic and Cray technology), the inventory of ASICs, switches, and adapters, and hundreds of servers that are used for design and test.

While the Cornelis Networks launch release says that the company is supporting 200 Gb/sec Omni-Path, Murphy says that it is not focusing solely on this second-generation Omni-Path as it contemplates its product roadmap.

"The Omni-Path technology was really a marriage of Cray Aries at the lowest levels and QLogic InfiniBand at the highest level," Murphy tells The Next Platform. "We are not going to continue with Intel's 200 Gb/sec program for our next stop on the roadmap, which we will not call Omni-Path, but we are going to be able to use significant technology that [was] developed in the Omni-Path 100 Gb/sec and 200 Gb/sec programs to get to a much better solution. We are going to be guarded for now about features and capabilities, but it will come out two to three years from now and it will be developed specifically for HPC and AI."

That may seem like a long time, and we think there is a chance that Cornelis Networks is hedging on the timing and can do this quicker than it is letting on. We also expect the battle between onload and offload to continue. We expect Cornelis Networks to strike a better balance than QLogic and Intel did, and it might even incorporate some of the ideas of SmartNICs or data processing units (DPUs) into its future architecture. That would move networking work off the CPUs and deep into the DPUs to free up CPU cycles, offloading to a DPU rather than to a slightly brainy network interface. There is no rule that says a DPU has to be on the NIC, by the way. It can be its own thing, although being a bump in the wire is an advantage for network applications. In any event, Murphy is promising world-class SerDes and tuning for AI and HPC, and we will have to wait to see what Cornelis does in the 400 Gb/sec and 800 Gb/sec generations.

The big point that Murphy wanted to make is that the HPC and AI networking space is worth billions of dollars. With HPE controlling Slingshot from Cray and Nvidia controlling Quantum InfiniBand and Spectrum Ethernet from Mellanox, the upper echelons of compute, particularly those that know and like Omni-Path, are going to want broader choices and intense competition. They are also going to want an interconnect that is not tied to any particular CPU or GPU architecture, or to any specific OEM.
