The exascale effort in the U.S. got a fresh injection of R&D funding, set to course through six HPC vendors to develop scalable, reliable, and efficient architectures and components for new systems in the post-2020 timeframe.
However, this investment, coming rather late in the game for machines that need to hit sustained exaflop performance in a 20-30 megawatt envelope in less than five years, raises a few questions about potential shifts in what the Department of Energy (DoE) is looking for in next-generation architectures. From changes in the exascale timeline to a new focus on “novel architectures” to solve exascale challenges, and of course to questions about planned pre-exascale machines like Aurora, it is clear there is a shakeup. As we noted in the PathForward funding announcement today, this represents a recognition that architectures and applications are changing quickly and that the DoE wants to invest in systems that will be viable for the long haul, but it also causes us to circle back to the question of how a novel approach to extreme scale computing fits into the bigger 2021 picture.
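The 20-30 megawatt envelope mentioned above implies a concrete system-wide efficiency target. A quick back-of-envelope calculation (our arithmetic, not a DoE figure) shows what one sustained exaflop inside that power budget demands:

```python
# Efficiency implied by one sustained exaflop (1e18 FLOPS)
# inside a 20-30 megawatt power envelope.

def required_gflops_per_watt(flops: float, watts: float) -> float:
    """Return the system-wide efficiency needed, in gigaflops per watt."""
    return flops / watts / 1e9

EXAFLOP = 1e18

# Aggressive (20 MW) and relaxed (30 MW) ends of the envelope
for mw in (20, 30):
    eff = required_gflops_per_watt(EXAFLOP, mw * 1e6)
    print(f"{mw} MW envelope -> {eff:.0f} GFLOPS/W")
```

At 20 MW the whole system, interconnect and memory included, must average roughly 50 gigaflops per watt, which is why energy per bit moved on the fabric becomes a first-order design constraint.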
Among the six vendors selected for the PathForward funding, three are full systems companies—Cray, IBM, and HPE. For these, the emphasis will be on building a hardware and software stack whose components can scale reliably and efficiently and provide a programmable springboard for exascale application developers.
Unlike Cray and IBM, which have been building custom-engineered HPC systems for many years, HPE is unique on this list in that The Machine architecture (which we have detailed extensively) is not rooted in HPC; rather, it was designed with pooled memory across large datasets and random access patterns in mind. To be fair, HPE is the top systems supplier in HPC, at least according to the Top 500 rankings, and the SGI acquisition will add to its overall reach into the market. If, as it appears, The Machine adds to that share in the years ahead, HPE has a very strong game to watch play out in systems at the highest end.
As for The Machine, although there are promising elements of the architecture in terms of its interconnect and approach to memory, it has only just been demonstrated at forty nodes. This is not to say it cannot scale past that, according to Paolo Faraboschi, the technical lead for the exascale push for The Machine at HPE, but the work to move from forty nodes to four hundred thousand represents a major engineering challenge.
“Scaling a new interconnect to an exascale capable machine is a challenge but we do think we are up to it over the next three years of this PathForward project funding. We will certainly encounter problems along the way,” he admits, but HPE believes The Machine can hit the scalability, power consumption, reliability, and programming targets put forth by the DoE exascale goals.
“What the DoE saw in our proposal is that The Machine brings together a foundation in our memory fabric and interconnect, and they like that we are agnostic to the node itself. Between GPUs, manycore, wide vectors, and co-packaged memory, things will evolve in a way that may not be easy to predict, even in the next few years,” Faraboschi explains.
By the time exascale systems are requisitioned, there will be several architectures to choose from. So the question is, other than fitting the demand for a “novel” functional exascale architecture, how does The Machine fit the bill better than the others, especially when its demonstrated scalability is currently so limited? “There are two components from a memory fabric standpoint” that make it a good fit for exascale, Faraboschi explains. “First is the pooling of memory, but also the memory semantics protocol that handles communication at the speed of memory. What we are accelerating with the PathForward funding are applications of these technologies we developed in The Machine early on to far larger environments.” That forty-node demonstration has a long way to go, since an exascale machine will be on the order of 100,000 sockets. Moving a communications backbone from 40 nodes to 40,000 means grappling with adaptive routing, congestion control, and other network problems at a scale the initial design teams for The Machine never contemplated.
“The Machine was born to target the big and fast data space; large, fast-moving graphs with lots of data that needs to be centralized because of fairly random accesses. But scientific computing workloads are fundamentally different from that. They are really based on a pattern that is fairly well-partitionable, they require a massive amount of local bandwidth to memory…so HPC doesn’t need a massive memory pool but rather the memory semantics communication to be put together to implement efficient messaging and access to persistent data stores.”
Scaling the interconnect is one area where the PathForward funding will go for HPE. The second obvious place, if The Machine is going to be an exascale architecture, is silicon photonics. The 40-node machine HPE demoed uses a more coarse-grained multiplexing approach, but for future exascale systems, getting the performance per watt right is critical. On an energy-per-bit basis, teams need to push into silicon microring territory—something that has long been on the HPE roadmap but needs to be accelerated to fit into the exascale timeline. This raises the question, as we asked when the PathForward R&D funding was announced, of how feasible the timelines are, especially if a novel architecture is still a quasi-requirement for at least one of the first exascale machines.
Faraboschi points to the fact that exascale delivery and production dates are going to fluctuate (as they already have in the last couple of years). “For a 2020-2021 machine, we’d be looking at a very strange peak exascale machine—something that might reach peak exaflop but might not be as generally programmable as scientists want. PathForward is aiming for productive exascale—it is possible some of the partners in this project will push for a very early exascale machine, but for true productive exascale, it is more like 2022. During that time too there will be a [silicon] process transition that will occur into the sub-7nm, and with the mandatory energy efficiency levels that calls for, many things need to come together in what is really a short period of time.”
Of course, all the great ideas tested at scale are for nothing if a system is not broadly manufacturable. “Our position is that we want to ensure the tech we are delivering for exascale is not a one-off architecture but is conceived in the context of a commercial offering soon after. We are building The Machine with an awareness of the supply chain role in every part. We want to make sure this has commercial viability.”
Even though the push for a novel architecture as one of the exascale architectures might very well have helped HPE secure exascale R&D funding, ultimately, that novel emphasis is more about keeping forward-thinking architecture work going than about serving end users.
“Novelty is a double-edged sword in many cases for scientific computing,” Faraboschi tells The Next Platform. “When you work with scientists, there is a different mindset because scientists don’t usually care about something being novel. They want something productive, something they can use. They have a job to do. And sometimes in the past, excessive novelty got in the way of them getting those jobs done.”
Of the novel architectures for exascale push, Faraboschi contends that “right now, the feeling is that rather than novelty per se, what the DoE wants is an architecture and system that has the right balance and the right combination of open technologies that will let them move and scale as they want. They are not going after novelty for its own sake; they think some of the ways we are building systems these days have gone in the wrong direction, degrading the metrics and the balance of the metrics they care about.”
“What we bring here is an open interconnect that allows those centers to take the best of those computing elements as they come. We have a unique perspective in that view compared to other vendors that are focused on the components side. To build an exascale system, you have to be a systems company.”
There still seems to be a lot of interest in the architecture of “The Machine”, but I thought it had been effectively cancelled last year.
Was news of its demise premature?
Yes. And based on a single slide.
There’s something I’m not quite clear on about The Machine. The ARM CPUs they’re using have 48-bit memory addressing, giving them 256TB of addressability. How do they get to 1.25PB scalability with only 48 bits?
I know they’re using ARM specifically because Intel only has 46-bit addressing, which gives 64TB, and apparently Skylake Xeons don’t change that.
How is the scalability from the physical addressing of the CPUs expanded to exabytes or larger? It would be great to get more technical details on the way this is being (and has been) implemented. It brings to mind the old Cray Urika, which could address 512TB. Can’t find the memory controller specs anywhere.