Intel Goes Barefoot As It Leaves The Omni-Path

The handwriting has been on the wall for some time now, but Intel has quietly dropped its 200 Gb/sec Omni-Path networking from its roadmaps and will be using other technology for interconnects going forward.

Intel confirmed to The Next Platform that it is dropping the Omni-Path 200 Series switch chips, switches, and network interface cards, but is not saying much more about its interconnect plans at this time, particularly with its acquisition of programmable Ethernet switch chip maker Barefoot Networks still underway. Intel did, however, make it clear that it is still selling and fully supporting the current Omni-Path 100 Series devices, which as the name suggests have ports running at 100 Gb/sec and which have been deployed in a number of high-end systems and supercomputers around the world.

Let’s peel this apart a bit and try to see where Intel might go from here.

Intel went on a buying spree for networking ASICs earlier this decade, starting with its acquisition of Ethernet switch ASIC maker Fulcrum Microsystems in July 2011. Intel never did much with the Fulcrum technology, which was a bit perplexing. To cover its HPC bases and to bolster its system approach out into multi-node clusters, Intel snapped up the TrueScale InfiniBand adapter and switch business from QLogic for $125 million, and it followed that up quickly in April 2012 when it acquired the “Gemini” XE and “Aries” XC interconnects from supercomputer maker Cray for $140 million. Intel sold the TrueScale products for a while as it started work on engineering their follow-ons, which became Omni-Path 100.

The idea with Omni-Path was to take the underlying InfiniBand technology, including its onload model of network processing, and tweak it with the Aries technology from Cray to create a better fabric. Omni-Path 100 was to have a smidgen of Aries tech, but Omni-Path 200 was to have a lot more. At one point, there were rumors going around that Intel wanted to make radical changes to Omni-Path 200 that would potentially break compatibility with Omni-Path 100 and possibly not make use of the OpenFabrics Enterprise Distribution (OFED) drivers for InfiniBand. As it is, Intel says that Omni-Path is not InfiniBand, even if users employ InfiniBand drivers on servers to access Omni-Path switches. We were under the impression that there was substantial pushback on these changes with Omni-Path 200 and that plan was abandoned for a more incremental upgrade of the technology.

The “Prairie River” Omni-Path 100 switches and “Wolf River” Omni-Path 100 adapter made their debut in November 2015. The adapters came with one or two ports, and there was even an adapter added to the “Knights Landing” Xeon Phi compute complex. The switches came in two flavors: “Eldorado Forest” had 24 ports or 48 ports in an edge switch and “Sawtooth Forest” had 192 ports or 768 ports in a director switch. Omni-Path 100 was designed to scale to 16,000 nodes, with the largest machines coming in at around half that. On the current Top500 rankings of supercomputers, 49 out of the 500 systems on the list employ Omni-Path, for about a 10 percent share, but if you look at true HPC systems on the list – not machines at hyperscalers, cloud builders, and telcos that happened to run Linpack tests for political reasons – then it is around a 20 percent share. InfiniBand from Mellanox Technologies (in the process of being acquired by Nvidia) has north of 50 percent share of true HPC systems. Omni-Path was a contender, but not the dominant HPC interconnect and certainly not preferred for AI training workloads as is the case with InfiniBand.
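The port counts above make that 16,000-node scaling claim easy to sanity check with back-of-envelope math. The sketch below assumes a conventional non-blocking two-tier fat tree, with half of each edge switch's ports facing nodes and half facing the director tier; the actual topology rules Intel used for Omni-Path may differ, so treat the result as illustrative.

```python
# Back-of-envelope fat-tree scaling for Omni-Path 100 class hardware.
# Port counts are from the article; the half-down/half-up split of
# edge ports (for a non-blocking fabric) is an assumption.

EDGE_PORTS = 48       # "Eldorado Forest" edge switch, largest variant
DIRECTOR_PORTS = 768  # "Sawtooth Forest" director switch, largest variant

def max_nodes_two_tier(edge_ports, director_ports):
    """Non-blocking two-tier fat tree: half of each edge switch's
    ports face nodes, half are uplinks into the director tier."""
    down = edge_ports // 2     # node-facing ports per edge switch
    up = edge_ports - down     # uplinks per edge switch
    # Spread the uplinks one-per-director across `up` directors; each
    # director port then absorbs one edge switch, capping the edge count.
    edges = director_ports
    return edges * down

print(max_nodes_two_tier(EDGE_PORTS, DIRECTOR_PORTS))  # 18432
```

That lands at 18,432 ports of theoretical capacity, comfortably above the 16,000 nodes Omni-Path 100 was rated for once you leave headroom for service nodes and storage.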

That was supposed to change with Omni-Path 200, which was expected not only to double the bandwidth to 200 Gb/sec per port but also to scale to many tens of thousands of nodes. As a case in point, the original pre-exascale “Aurora” supercomputer at Argonne National Laboratory (due in 2018) was going to have over 50,000 nodes using the now-defunct “Knights Hill” Xeon Phi processor, all lashed together with a single Omni-Path 200 fabric.

When the updated “Aurora A21” system at Argonne was pushed out to 2021, Intel not only shifted the compute from Knights Hill manycore processors to a mix of Xeon processors and Xe graphics coprocessors, but the A21 system also dropped Omni-Path 200 in favor of the “Slingshot” interconnect, a superset of Ethernet created by Cray for HPC and AI workloads that adds in some of the dynamic routing, congestion control, and other features of the Aries interconnect that Cray created so many years ago.

The writing was also on the wall for Omni-Path when Intel was rumored to be trying to acquire Mellanox earlier this year. The word on the street was that Intel offered $5.5 billion to $6 billion for Mellanox, which would have given it the Quantum InfiniBand line as well as the Spectrum Ethernet line, with broad access to HPC centers and to hyperscaler and cloud builder facilities, respectively. In March, Nvidia announced its deal to acquire Mellanox for $6.9 billion, and Intel took another tack, opting to buy Barefoot Networks for an undisclosed sum, which gives it control of the “Tofino” chip family and of the P4 programming language for network devices, which is catching on.

Barefoot Networks has been an increasingly strong proponent of putting more smarts into switches and offloading onto the switch routines that might otherwise be run on a cluster of servers – much as Mellanox has been doing for years with its InfiniBand and Ethernet adapters and switches under its so-called offload model. The idea is that compute cores are expensive but network compute is cheap, and therefore the scatter, gather, and other collective operations that need to be done when running HPC and AI workloads should naturally be done on the switches to which the systems are hooked, rather than on the systems themselves. Intel, as a maker of processors, was naturally drawn to the converse onload model, where network functions run on the servers in the cluster, freeing up resources on the switches and adapters to help the network scale further and perform well. (We have never gotten a satisfactory set of data that allows us to ascertain which approach is “right,” whatever that might mean, on clustered systems.)
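The contrast between the two models can be sketched with a toy sum-allreduce, the kind of collective that shows up constantly in HPC and AI training. The function names below are purely illustrative, not any real network API; the point is only where the additions run.

```python
# Toy sketch of onload versus offload for one collective operation,
# a sum-allreduce across N hosts. Illustrative names, not a real API.

def allreduce_onload(host_values):
    """Onload model: every host CPU participates in the reduction
    (as in a ring allreduce), performing N-1 additions itself."""
    n = len(host_values)
    total = sum(host_values)        # net result of the ring exchange
    adds_per_host_cpu = n - 1       # reduction work burned on each CPU
    return [total] * n, adds_per_host_cpu

def allreduce_offload(host_values):
    """Offload model: each host sends one value to the switch, which
    sums the contributions in-network and multicasts the result back."""
    total = sum(host_values)        # additions happen in switch silicon
    adds_per_host_cpu = 0           # host CPUs stay free for compute
    return [total] * len(host_values), adds_per_host_cpu

grads = [1.0, 2.0, 3.0, 4.0]
result_onload, cpu_adds_onload = allreduce_onload(grads)
result_offload, cpu_adds_offload = allreduce_offload(grads)
# Same mathematical answer either way; only the location of the
# additions (and the silicon that performs them) differs.
```

Both paths deliver the same result to every node; the debate the vendors are having is whether those per-node additions should burn cycles on expensive host cores or run on the switches in the middle of the fabric.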

It would be easy to assume that Intel will be switching to more of an offload model going forward now that it is buying Barefoot Networks. But it is just as likely that Intel will let companies decide, using P4 as a means of running the same algorithms either scattered across CPUs in the cluster or centralized on the switches. Or maybe a little of both where it is warranted.

We will be discussing these topics with Intel, which is keynoting the networking section of The Next IO Platform event, which we will be hosting in San Jose on September 24. You can sign up to attend here.


5 Comments

  1. I was excited about Omni-Path and was looking to standardize on Omni-Path for current generation (Epyc and Scalable Xeon) HPC nodes. Then I learned that not only would Intel not support AMD chips, but they wouldn’t even support previous generation Intel chips. Who wants an interconnect that’s tied to a particular vendor’s CPUs of a particular generation? I talked to a few other HPC sites and they had the same concerns.

    Intel doesn’t seem to realize that an interconnect is a long term investment that needs to work with all equipment you might buy over the next 5 years.

    • What would have been your reason for standardising on OmniPath? It’s just InfiniBand plus a few minor features but minus Hardware Offload. That would be okay for use cases like databases etc. where Ethernet is not good enough, but I would never go without Hardware Offload for HPC nodes. OmniPath is not even cheaper than InfiniBand.

  2. The onload/offload debate is almost as old as computing because it’s an economics and/or efficiency debate. If you care more about performance than price, then offloading work to hardware accelerators will probably be appealing. If you care more about upfront price than performance, then onloading will be appealing. Finally, the onload/offload tradeoff can change over time because not all hardware technologies are improving at the same rate.

  3. You can take a look at the economics. What is the $ portion of the network from the entire system? 10%? 15%? And what is the $ portion of the CPUs? 50%? If offloading gets you 20% or 30% more CPU cycles, you should pay extra for offloading and it will be more economical for your datacenter. It is not the flops you buy, it is the flops you really get.
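That back-of-envelope argument can be made concrete with a toy budget model. Every figure below (budget, compute density, the 10/15 percent network shares, the 20 percent cycle tax) is an illustrative assumption, not data from the article.

```python
# Toy model of the offload-economics argument: spend a bit more on the
# network, get all of your CPU cycles back. All numbers are made up.

BUDGET = 1_000_000           # total system budget, dollars
FLOPS_PER_DOLLAR = 1e7       # assumed compute density of the CPUs

def delivered_flops(cpu_share, cycles_lost_to_network):
    """Flops you actually get: the CPU slice of the budget, discounted
    by the fraction of cycles burned doing network processing."""
    raw = BUDGET * cpu_share * FLOPS_PER_DOLLAR
    return raw * (1.0 - cycles_lost_to_network)

# Onload: cheap fabric (10% of budget), CPUs get 50%, but the hosts
# lose 20% of their cycles to network processing.
onload = delivered_flops(cpu_share=0.50, cycles_lost_to_network=0.20)

# Offload: pricier fabric (15% of budget) shaves the CPU slice to 45%,
# but the CPUs run at full tilt.
offload = delivered_flops(cpu_share=0.45, cycles_lost_to_network=0.00)

print(onload, offload)  # under these assumptions, offload comes out ahead
```

Under these particular assumptions the offloaded system delivers more useful flops despite buying fewer CPUs, which is exactly the commenter's point: it is not the flops you buy, it is the flops you really get.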
