
Another Step Toward FPGAs in Supercomputing

There has been plenty of talk about where FPGA acceleration might fit into high performance computing, but only a few testbeds and purpose-built clusters are pushing this vision forward for scientific applications.

While we do not necessarily expect supercomputing centers to turn their backs on GPUs as the accelerator of choice in favor of FPGAs anytime in the foreseeable future, there is some smart investment happening in Europe and, to a lesser extent, in the U.S. that takes advantage of recent hardware additions and high-level tool development. These put field programmable devices within closer reach, even for centers whose users want to focus on their science rather than on what the underlying hardware is based on.

The most obvious reason progress has been stunted here is that most HPC applications require dense double-precision arithmetic, which FPGAs are not associated with (even if it is possible), and further, that scaling beyond a single node or FPGA is still not a mainstream capability outside of major hyperscale operators like Microsoft, with its massive FPGA deployment supporting Bing and various AI-focused initiatives.

The other elephant in the room when it comes to questioning where FPGAs are in supercomputing clusters boils down to programmability, something that is obviously connected to the other two areas (precision and multi-node, multi-FPGA scalability).

All of these issues are set to be addressed at Germany’s Paderborn University, where the first phase of the ten million Euro Noctua cluster project is underway. Coming online in 2018, this initial section will feature 32 Intel Stratix 10 FPGAs for early-stage experiments in porting, programming, scaling, and understanding how traditional HPC applications (and non-traditional ones, including k-means clustering, image processing, and machine learning) respond to an FPGA boost. That may not sound like many FPGAs to cram onto a system, but recall that most work that happens on FPGAs for these applications is limited to one device or node and rarely focuses on scalability.

More standard HPC applications, particularly in bioinformatics, are set to be another target, according to Dr. Christian Plessl, who tells The Next Platform that more high-level tools to support FPGAs are paving the way for more potential than ever before, especially with an FPGA like the Stratix 10.

Once completed, the system will mark what we expect will be one of the largest FPGA clusters serving scientific computing users, although there are several sites with FPGAs in various stages and sizes of deployment worldwide, and of course the collaborative Catapult project that serves both Microsoft and research users at TACC with over 350 Stratix V FPGA equipped nodes.

“The selected FPGAs, with 5,760 variable-precision DSP blocks each, are well suited to floating-point heavy scientific computations. They reached general availability just in time to be installed in the Noctua cluster. A first set of applications that benefit from FPGA acceleration is currently being ported and reengineered in close cooperation with computational scientists. This infrastructure will be used to study the potential of FPGAs for energy-efficient scientific computing, allowing the center to maintain its leading role in establishing this technology in HPC.”

Aside from the FPGAs, the system, a Cray CS500, will have 272 dual-socket compute nodes and roughly 11,000 cores built from high-end 20-core Skylake processors. The machine will have a 100 Gbps Omni-Path interconnect, something Plessl says will be interesting in the context of multi-FPGA communication as the team rolls out research and benchmarking results.

“The single precision floating point is very good and there are thousands of floating point units that can use the full bandwidth internally. We also have high capacity and more logic resources and DSP logic blocks than we had with the testbed FPGAs,” Plessl explains. The testbed consisted of Arria 10 FPGAs, but the added DSP blocks in the Stratix 10 were what shifted his center’s thinking toward Altera/Intel in this case.

As with the early days of GPUs in HPC centers, the question was always what percentage of domain-expert users would care enough about the underlying hardware to work hard at adapting their codes. Early on it was not many, but as the ecosystem of support and tools grew, it became easier and richer in terms of libraries for different groups. The FPGA ecosystem is not rich in that way for HPC by any means, but Plessl says there is interest from a subset of their users, provided higher-level tools can deliver high-performance results.
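By way of illustration only: high-level tool flows such as Intel’s FPGA SDK for OpenCL (a plausible option for a Stratix 10 system, though the specific toolchain Paderborn will use is not named here) let developers express a kernel in a C dialect rather than a hardware description language. A minimal single-work-item kernel of the sort these flows compile into a deep pipeline might look like the following sketch; the kernel name and unroll factor are assumptions for illustration.

```c
/* Minimal sketch of an OpenCL C kernel in the style accepted by
 * high-level FPGA tool flows such as the Intel FPGA SDK for OpenCL.
 * Illustrative only, not drawn from the Noctua project; the kernel
 * name and unroll factor are assumptions. */
__kernel void saxpy(__global const float * restrict x,
                    __global float * restrict y,
                    const float a,
                    const unsigned int n)
{
    /* Single-work-item loop: the FPGA compiler pipelines iterations. */
    #pragma unroll 8  /* replicate the datapath; factor is an assumption */
    for (unsigned int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];  /* one single-precision multiply-add per element */
    }
}
```

The compiler turns a loop like this into a hardware pipeline, which is how the “thousands of floating point units” Plessl mentions become usable without writing Verilog or VHDL.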

“FPGAs are only a small part of the overall cluster and most users will use the CPUs, but the group of those interested in exploring new architectures is growing and includes internal users and some industrial users that want to develop the relevant pieces starting at the kernel level, while others are porting complete applications to get started,” Plessl notes.

On the block for FPGA acceleration are specific solvers for electromagnetics and nanostructure materials, as well as computational chemistry for areas like electronic structure analysis, where operations on large matrices are common and single precision is possible.
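To make the matrix point concrete, here is a hedged sketch, in the same OpenCL C dialect as above, of a single-precision dense matrix-vector product of the kind such solvers lean on; the kernel name, data layout, and unroll factor are assumptions for illustration, not the center’s actual code.

```c
/* Illustrative single-precision matrix-vector product (y = A * x) in
 * OpenCL C, with A stored row-major. Names, layout, and the unroll
 * factor are assumptions for this sketch, not Paderborn's kernels. */
__kernel void sgemv(__global const float * restrict A,
                    __global const float * restrict x,
                    __global float * restrict y,
                    const unsigned int rows,
                    const unsigned int cols)
{
    for (unsigned int r = 0; r < rows; r++) {
        float acc = 0.0f;
        #pragma unroll 16  /* keeps many DSP blocks busy per clock; factor assumed */
        for (unsigned int c = 0; c < cols; c++) {
            acc += A[r * cols + c] * x[c];  /* maps to single-precision multiply-adds */
        }
        y[r] = acc;
    }
}
```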

Plessl says he has seen the opportunities for FPGAs evolve over the years, stemming from his early experiences working with embedded reconfigurable devices a couple of decades ago. He says the time is finally right from a programmability and usability perspective, and now that devices have the hardware required to be competitive in an HPC environment, the important work of scaling and porting can pave the way for FPGAs to become a force in HPC, eventually at least. This will be a center to watch over the next few years as performance, portability, programmability, and scalability lessons are learned and shared.
