The “Great Lakes” supercomputer at the University of Michigan is the first cluster in the world to make use of 200 Gb/sec HDR InfiniBand switching from Mellanox Technologies, which is sold under the Quantum brand.
Lawrence Livermore National Laboratory and the Texas Advanced Computing Center are also going to be early adopters of Quantum switches, and in 2019, there is no doubt that many new machines based on a variety of different processors and accelerators will deploy 200 Gb/sec InfiniBand as well because their designers want to double up the bandwidth in the interconnect.
That bandwidth can be used in a number of different ways, as Gilad Shainer, vice president of marketing at Mellanox, explained to The Next Platform at the SC18 supercomputing conference in Dallas in the following interview.
Companies can use Quantum switches to provide 40 ports running at 200 Gb/sec into servers, or, by using cable splitters, they can run the Quantum switch with up to 80 ports at 100 Gb/sec each, allowing large networks to be built with fewer switches – and therefore at a lower cost per 100 Gb/sec port than is possible with EDR InfiniBand switches today. This is ideal for organizations that do not need the full 200 Gb/sec per port yet; when they do, they can simply swap the splitter cables for straight 200 Gb/sec cables and not have to change anything else.
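The switch-count savings can be sketched with some back-of-the-envelope math. The sketch below assumes a non-blocking two-tier fat tree and the 36-port radix of current EDR switches; the helper function and the 800-node example are illustrative, not from the interview:

```python
import math

def two_tier_switches(hosts, radix):
    """Rough switch count for a non-blocking two-tier fat tree."""
    down = radix // 2                          # ports per leaf facing hosts
    leaves = math.ceil(hosts / down)
    spines = math.ceil(leaves * down / radix)  # total uplinks / spine radix
    return leaves + spines

# Quantum in splitter mode: 80 ports of 100 Gb/sec per switch
hdr = two_tier_switches(800, 80)   # 30 switches for 800 hosts
# EDR switch: 36 ports of 100 Gb/sec per switch
edr = two_tier_switches(800, 36)   # 68 switches for the same fabric
print(hdr, edr)
```

For an 800-node fabric at 100 Gb/sec per node, the higher-radix splitter configuration needs fewer than half the switches, which is where the lower cost per 100 Gb/sec port comes from.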
Mellanox has been working on HDR InfiniBand for years, and has been waiting for PCI-Express 4.0 to become more widely available in servers because you really need the bandwidth this peripheral bus delivers to drive the faster 200 Gb/sec adapter cards in the server. It has taken a while, but IBM got PCI-Express 4.0 into its Power9-based servers late last year and ramped them up in volume across its scale-out Power Systems machines in early 2018. AMD is expected to support PCI-Express 4.0 in the forthcoming “Rome” Epyc processors, and the Arm server processors – the “Triton” ThunderX3 from Marvell (formerly Cavium) and the “Quicksilver” from Ampere (formerly Applied Micro) – will support PCI-Express 4.0 as well.
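The PCI-Express arithmetic behind that dependency is easy to check. A quick sketch, using the published per-lane transfer rates and the 128b/130b line encoding that PCI-Express 3.0 and 4.0 both use:

```python
# Per-direction usable bandwidth of an x16 slot, in Gb/sec
ENCODING = 128 / 130                 # 128b/130b line-code efficiency
gen3 = 8.0 * 16 * ENCODING           # 8 GT/sec per lane, 16 lanes
gen4 = 16.0 * 16 * ENCODING          # 16 GT/sec per lane, 16 lanes
print(round(gen3, 1))                # ~126 Gb/sec: cannot feed a 200 Gb/sec port
print(round(gen4, 1))                # ~252 Gb/sec: headroom for HDR
```

A PCI-Express 3.0 x16 slot tops out around 126 Gb/sec in each direction, which is fine for EDR at 100 Gb/sec but starves a 200 Gb/sec HDR adapter; PCI-Express 4.0 doubles that to roughly 252 Gb/sec.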
If the rumor mill is right, it is unlikely that Intel will support PCI-Express 4.0 in the “Cascade Lake” Xeons coming later this year, though it is possible that Intel made a last-minute change to the designs and can do it. The point is, HPC customers are looking at processors other than Xeons anyway, and if those chips happen to have faster I/O, they will win a bunch of deals.
Perhaps more important than bandwidth, however, is the in-network computing capability that Mellanox has been building into its server adapter cards and its switches for many years now. By offloading more and more of the network overhead to these devices, the CPU cores that might otherwise run network functions are freed up to do actual work – and not just for traditional HPC workloads, but also for machine learning routines that are similarly heavy on the network.
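The actual offload lives in the adapter and switch silicon, but the principle is simple to sketch: hand a collective operation (here, a reduction) to a helper that stands in for the network hardware, and the host thread stays free for application math in the meantime. This is a toy illustration of the idea, not Mellanox's API:

```python
import threading

def offloaded_reduce(partials, result):
    # Stands in for a reduction done in the adapter/switch,
    # off the host CPU's critical path.
    result.append(sum(partials))

partials = [1.0, 2.0, 3.0, 4.0]   # partial sums arriving from peer nodes
result = []
nic = threading.Thread(target=offloaded_reduce, args=(partials, result))
nic.start()

# While the "network" reduces, the host core does real application work.
host_work = sum(x * x for x in range(1000))

nic.join()
print(result[0])   # the reduced value, 10.0
```

In a real cluster the reduction happens in hardware and scales across thousands of endpoints; the point of the sketch is only that the host core never spends cycles on the collective itself.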
This, explains Shainer, turns networking into a kind of distributed I/O processing unit, or IPU, a companion in processing to both the CPU hosts and the GPU accelerators – a kind of distributed asymmetric multiprocessing, as it were. And it gets even more interesting if BlueField multicore processors are added to the server adapter cards, which makes it possible to run all kinds of algorithms and functions (such as security) in the datapath between storage and compute, or between the compute nodes themselves.
Mellanox has been on a roughly two-year cadence of doubling performance, though admittedly HDR InfiniBand took a little longer than expected (and PCI-Express 4.0 was not mainstream anyway). Mellanox is planning an aggressive roadmap for InfiniBand: it is already hard at work on 400 Gb/sec NDR InfiniBand, due in about 2020, and has an eye on 800 Gb/sec XDR InfiniBand beyond that, in about 2022.