AMD Hints At Future HPC Push
March 10, 2015 Nicole Hemsoth
When it comes to chip companies that have claimed a stake in the future of high performance computing, there used to be more competition. Over the last few years, especially at the traditional supercomputing sites that represent the highest end of HPC, Intel has been the overwhelming leader. This has left little room for the few upstarts that might be viable candidates for HPC systems, and it has also left the old guard largely in the dust.
AMD used to have a stronger HPC presence, but the advantages that its Opteron processors had over their Intel Xeon rivals evaporated. And because of Opteron chip delays and an architectural shift for interconnects, Cray moved from Opteron to Xeon processors with its latest XC30 systems.
AMD has had a few key HPC wins with earlier generations of Cray machines, such as at Oak Ridge National Lab, where the GPU-boosted Titan supercomputer is outfitted with “Interlagos” Opteron processors, giving it 27 petaflops peak and the number two spot on the Top 500 list of the fastest systems. AMD also took the top spot on the Green 500 list with a cluster configured with its FirePro GPU cards acting as coprocessors. A number of machines on the Top 500 still feature older generation Opteron processors, but these are mostly Cray machines nearing the end of their lifespans, including Cielo at #40 and Hopper at #44. Both were state-of-the-art supercomputers in their day, and both will be retired in the next couple of years.
This is all ancient history on the timescales of the HPC market, where machines are typically replaced every three to four years. The question now is: What does AMD plan to do to get back into the HPC arena?
“We have only recently been able to make the investments required to get back into HPC in a serious way,” says Karl Freund, general manager for HPC at AMD. The bad news, he says, is that even with those investments on the software side, the development cycles are long, although the efforts are now coming to fruition in the key elements required to move the HPC needle. Among the perceived shortcomings of AMD, at least from the Linux-dominated HPC camp, was the lack of a Linux driver model as solid as the one AMD had for Windows. It is this, Freund says, coupled with the HPC community’s desire for standards (OpenCL in this case), that is driving the market. “We’re actually in a good place now, and will be in an even better place in the near future.”
Freund has seen his share of trends in HPC through his various roles at Cray, IBM, and, more recently, Calxeda, and says that part of the reason he took the job at the helm of AMD’s high performance computing efforts is that he sees what lies ahead, even if that road seems obscured to those of us outside of AMD. When pressed for details about where the R&D roadmaps for both HPC and enterprise workloads are headed, Freund said that there will be important announcements over the course of this year, and added that in the past few years AMD has had an influx of research funds from the FastForward and DesignForward awards for node design and new interconnect technologies.
Back in November of 2013, the U.S. Department of Energy (DoE), in conjunction with the National Nuclear Security Administration (NNSA), announced a $25.4 million pot of R&D funds to be split among five HPC companies to develop technologies leading to exascale-class machines. Under the program, called DesignForward, funds were allocated to AMD, Cray, IBM, Nvidia, and Intel’s federal arm with the aim of creating the interconnect fabric that would tie together exascale systems. While we’ve seen various products of this R&D money from Nvidia, Intel, and others, it is still tough to say what AMD’s direction is.
“You’ll see a lot of the fruits of this DesignForward work in upcoming large petascale and exascale systems, but we’re looking at a timeline that is in the 2020 to 2022 range,” said Freund. AMD was more recently awarded contracts (along with the same vendors listed above) under FastForward2, a follow-on to the original FastForward initiative, which was a shared pool of $99.2 million. While Intel, Nvidia, and Cray will use these funds to focus on node research, IBM and AMD were awarded funds to do research on memory. Freund says there is also a node design component to this FastForward2 effort.
“Coupled with standards like HSA, the ability to have the interconnect as a full participant in a heterogeneous system architecture instead of as something just sitting at the end of a narrow straw (PCIe), that’s where the potential is.”
Freund told The Next Platform that node design concepts being developed with federal funds are showing up in AMD’s confidential roadmap for the exascale timeframe in particular, although he says that some key aspects of these developments will appear before 2020 and will be aimed at traditional HPC and other large scale-out applications. The FastForward and DesignForward funds are also supporting AMD’s research into next generation memory architectures. AMD and others expect that DDR memory will run out of gas well before exascale systems arrive at the turn of the next decade, and AMD hopes to develop alternatives, which is an interesting area for the chip maker to explore.
AMD would also like to leverage its Heterogeneous System Architecture (HSA), which links CPUs and various coprocessors into a common programming model, in conjunction with other open standards like OpenMP and OpenCL to get back into the HPC game. The company is one of several members of the HSA Foundation, which is geared toward allowing system designers to easily use an array of non-CPU elements, including FPGAs, GPUs, DSPs, and other devices, in a way that minimizes the inefficiencies of sharing and managing data between them and the host processor. As the HSA Foundation notes, this design allows multiple hardware solutions to be exposed to software through a common standard low-level interface layer, called HSA Intermediate Language (HSAIL). HSAIL provides a single target for low-level software and tools, which frees programmers from the burden of tailoring a program to a specific hardware platform: the same code runs on target systems with different CPU/coprocessor configurations.
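To make the single-target idea concrete, here is a toy sketch in Python. It does not use HSAIL or any HSA runtime API; the mini instruction set, the `finalize_scalar` and `finalize_simd` backends, and the `dispatch` helper are all hypothetical names invented for illustration. A two-instruction program (out = alpha * x + y) is written once against a shared intermediate form, and each backend, loosely analogous to the per-device finalization step HSA describes, lowers it into its own execution style: one work-item at a time for a CPU-like device, or a group of lanes at a time for a GPU-like device.

```python
import operator
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical toy IR, NOT HSAIL syntax: (opcode, dest, src_a, src_b)
Instr = Tuple[str, str, str, str]
Value = Union[float, List[float]]

OPS: Dict[str, Callable[[float, float], float]] = {
    "add": operator.add,
    "mul": operator.mul,
}

# One portable program, written once against the toy IR: out = alpha * x + y
SAXPY_IR: List[Instr] = [
    ("mul", "t0", "alpha", "x"),
    ("add", "out", "t0", "y"),
]


def _lane(v: Value, i: int) -> float:
    # Arrays are indexed per work-item; scalars are broadcast to every work-item.
    return v[i] if isinstance(v, list) else v


def finalize_scalar(ir: List[Instr]):
    """Hypothetical 'CPU-like' backend: runs the whole program per work-item."""
    def run(env: Dict[str, Value], n: int) -> List[float]:
        out = []
        for i in range(n):
            regs = {name: _lane(v, i) for name, v in env.items()}
            for op, dst, a, b in ir:
                regs[dst] = OPS[op](regs[a], regs[b])
            out.append(regs["out"])
        return out
    return run


def finalize_simd(ir: List[Instr], width: int = 4):
    """Hypothetical 'GPU-like' backend: runs each instruction across a group of lanes."""
    def run(env: Dict[str, Value], n: int) -> List[float]:
        out: List[float] = []
        for start in range(0, n, width):
            lanes = range(start, min(start + width, n))
            regs = {name: [_lane(v, i) for i in lanes] for name, v in env.items()}
            for op, dst, a, b in ir:
                regs[dst] = [OPS[op](p, q) for p, q in zip(regs[a], regs[b])]
            out.extend(regs["out"])
        return out
    return run


# The same IR is handed to whichever backend matches the target device.
BACKENDS = {"cpu": finalize_scalar, "gpu": finalize_simd}


def dispatch(device: str, ir: List[Instr], env: Dict[str, Value], n: int) -> List[float]:
    return BACKENDS[device](ir)(env, n)


if __name__ == "__main__":
    env = {"alpha": 2.0, "x": [1.0, 2.0, 3.0, 4.0, 5.0], "y": [10.0] * 5}
    for device in BACKENDS:
        print(device, dispatch(device, SAXPY_IR, env, 5))
```

Both backends produce the same result from the same program text. That is the point the HSA Foundation is making: tools emit code for one layer, and the device-specific translation happens underneath it.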
The members of the HSA Foundation, which include ARM, Samsung, Texas Instruments, Qualcomm, and others, are toiling away at something that looks quite similar to efforts such as OpenPower: seamlessly integrating CPUs, GPUs, and other compute elements across a common programming and data sharing framework. When asked how this HSA approach stacks up against OpenPower’s Coherent Accelerator Processor Interface (CAPI), Freund told us that it is the same concept at the high level, but beyond that, it’s only similar if one were to “completely rearchitect and revamp it to be industry standard-based. It’s cache-coherent access to user memory space from the interconnect. There is no open one yet, but the building blocks are there in the HSA specifications, which is why we think this will get broader adoption than a proprietary implementation,” Freund explained.
One could speculate about what work is being done to allow for swift, simple, and open integration between the company’s FirePro graphics cards and an Opteron or ARM host processor. You might also jump to the conclusion that AMD’s HPC team will draw from the “Freedom” fabric at the heart of its SeaMicro microserver line, which AMD acquired in 2012, but this seems unlikely, even if AMD has learned some lessons about interconnects from SeaMicro.
Outside the traditional HPC areas, most notably oil and gas, AMD is seeing opportunities for its Accelerated Processing Units (APUs), hybrid CPU-GPU chips that are used in client devices but are also available in server variants aimed at low-end server cards. AMD is targeting these APUs at many of the same places that Nvidia is pushing with its Tesla GPU coprocessors: deep learning and neural networks. (Deep learning is the main theme of Nvidia’s annual GPU Technology Conference, coming up next week.) Freund says AMD is already working closely with unnamed social networks that are testing its high-end GPUs for hosting deep neural network applications.
“We think that in a year or two from now that conversation will get even better, especially when the next generation of 14 nanometer GPUs comes out,” says Freund, adding that it is still early days for accelerated computing of any kind. NVLink is still some time off and IBM’s CAPI interface for Power8 chips is still in its early, tentative stages, so there is a potential market here for AMD to tap into.
“If you just look at this as duking it out with Intel or Nvidia, it’s only a couple hundred million dollars,” says Freund. “What gets us going and investments going is how this can be applied to non-traditional GPU workloads. Large, scale-out CPU workloads that don’t parallelize easily.”