U.S. Bumps Exascale Timeline, Focuses on Novel Architectures for 2021

A newly announced shift in thinking is underway among the exascale computing leads in the U.S. government, one that opens the possibility of the United States installing an exascale-capable machine in 2021 and, of even more interest, basing that system on a novel architecture.

As Paul Messina, Argonne National Lab Distinguished Fellow and head of the Exascale Computing Project (ECP) tells The Next Platform, the roadmap to an exascale capable machine (meaning one capable of 50X the current 20 petaflop capability machines on the Top 500 supercomputer list now) is on a seven-year, versus ten-year track, at least in terms of system delivery. The rest of the time will be spent on bolstering software efforts to make a broader array of applications functional on exascale systems—no matter what the underlying architecture.

Messina says that the ECP, which has participants from six of the major Department of Energy labs, is still focused on delivering two diverse systems, but the first is set for 2021 with the second in 2023. That first machine will feature the novel architecture, something that we had to ask for clarification about. After all, is anything but X86 novel, and are GPUs considered novel? Is Power on that list, and what about ARM, since Japan is building its own potentially exascale-capable future supercomputer on that architecture well before 2021?

Further, by the time 2021 rolls around, we are going to see some familiar vendors doing some very different things: vendors that operate at the scale and stability level required for a big DoE investment, especially when such a large order is at stake. The best example is Intel, which by that time will have novel integrated FPGA/Xeon hybrids. And if the DoE sees enough interest in deep learning as a critical part of enough HPC workflows, there will also quite possibly be integrated Intel/Nervana chips for handling both training and inference simultaneously, along with a robust Xeon for crunching the more traditional parts of a simulation workflow.

In reference to our arguments about what “novel” means in HPC (a field that is tradition-laden because of the legacy application base that supports most scientific areas), Messina tells us that the ECP as a body is “not interested in being prescriptive, but this does have to be a system that would be considered exascale by that time and it cannot be small because it has to suit mission needs. We are sending a clear signal that we are willing to entertain something that is not part of the systems at the high-end now; building blocks of systems that are not on the Top 500 systems now.” Messina stresses that what they are looking for is true capability, not just a theoretical peak Top 500 number. “It has to be significantly suitable for many applications, not just a few.”

It might seem that ECP has something in mind if they are confident enough to assign a year to such an undertaking. It would have been far easier to shoot for a novel architecture on the 2023 machine to give a new architecture plenty of production ramp and software support time. “The RFP, when you look at it closely, is not at all wired for any particular architecture. We think there are several possibilities. We have collectively talked to many vendors and manufacturers over the last few months and there are some things that appear feasible in that timeframe.”

There will be additional funding tied to this sped-up effort, Messina says, but of course he could not get into the numbers. From what we were able to gather, it would not be a trivial difference because it requires fast ramp-up. Of the technologies on the table ready to roll in the next few years, other than integrated Xeon FPGA and Nervana chips, we have a few other options. Let’s blaze through those quickly for fun. First, there is something far out of range in terms of the idea that this should fit as many traditional simulation-based workloads as possible: quantum computing. It is certainly novel, and there is what appears to be a stable company behind the effort (D-Wave), but quantum systems aren’t applicable for massive physics-based simulations, at least at this point. The software development overhead itself seems to run to 2020 and such a system would need to be practical and viable for real applications before it hits the floor.

Another “very” novel architecture (this is the DoE here with one of its superstar supercomputers) is neuromorphic computing, something that has IBM and other investment behind it but lacks a mass production and programmability story. A lot could change on the programming side, especially if the national lab development teams put their collective minds to the challenge, but this seems fringe for such a large-scale, highly public effort, and since it is best at pattern-matching problems, it is not a fit for large-scale simulation either.

So could this mean a custom ASIC for simulations? That would truly be novel, but to do this at scale and get a software ecosystem around it to boot? And who designs it? Who profits? Who gets that precious contract when a massive supercomputer—the first exascale supercomputer—has a strange vendor profile? It caused enough of a row when a chipmaker, in this case, Intel, won as primary contractor for the largest future supercomputer slated to date.

The most likely candidate at this point for a “safe” but novel architecture is something like HPE’s “The Machine” which we will describe more early next week in terms of its current status.

Maybe we’re overthinking the simulation angle. Perhaps the DoE and its labs are seeing such a deep fit for deep learning that they’re having a fundamental rethink of architectures entirely. After all, we have discussed already that the future Summit supercomputer was accidentally (in that it was designed before deep learning found its way into HPC workflows) built for this hybrid HPC/neural network operation style. Messina says deep learning is expected to be important but did not seem keen on pushing this as the reason for architectural choices—“it will be a part of things, possibly a very big part, but physics are physics and there are going to be certain aspects that do not need to change.”

The likely candidates were stated earlier, although it is possible that there is some other architectural secret waiting in the wings. For the DoE to take a risk on a new vendor, one that has not been tested at scale, with the one project that puts them to the maximum scalability test would be a surprise. The main takeaway is that the ECP wants to keep the vendor community on its toes by having as many options as possible. They are goosing the timeline to get people thinking creatively, thinking in a new way about some old problems. There is no guarantee that the machine will be based on something from left field, but ECP is opening the door. This bodes well for the little guy (although they keep getting swept into the big pools), but also gives existing chipmakers the needed impetus to think outside of their existing silicon boxes.

Here is the other major factor. Supercomputing is at a point where moving to exascale requires some fundamental rethinking of how software works at scale, and of how it takes advantage of all those hungry cores when it is limited by data movement and other bottlenecks. To move to exascale means retooling code significantly already, so why not take advantage of that disadvantage and see what exists that is entirely new? Even if it means rewriting all software from the ground up, for some application owners, this is what is ahead anyway, at least if they want to operate at the grandest of scales. In some ways, the time is right only now for novel architectures to have a valid play in the largest scale computing circles; something different is coming and the old ways of doing things don’t scale.

Messina says, “the timing from an applications standpoint is quite good. There are a bunch of new applications being developed and others being worked on with the hope of aiming at exascale systems. Even a natural progression of existing architectures will require substantial changes. The fact that we have in our plan a healthy amount of funding for a lot of applications teams to do the substantial work of tweaking, building libraries and compilers, and so forth shows we’re already expecting to support, and are supporting complexity.” In short, the work was going to be hard anyway. With the right architectural option, the resources for developers will still be needed, and perhaps for the cause of a more efficient system for future HPC applications.

As an important side note for the politically minded, the decision to explore novel architectures for the 2021 exascale deadline was made and agreed upon before the November elections. One could probably make the case for adding even more impetus to the exascale plea by shortening the timeline to be more competitive with other countries, including Japan and China, who are expected or rumored to have exascale-capable systems ready before 2020, but Messina says this was a mission-focused decision to deliver better capability and efficiency to supercomputers.

OranjeeGeneral says:

December 9, 2016 at 5:25 am

I have the feeling this will be kicked further down the road; as soon as 2018 comes along, it will probably move to a 2023 target.

- Minnie says:
  
  December 10, 2016 at 2:26 am
  
  DoE’s plan since 2014 has been to deploy exascale by 2023, but it’s understandable that they would want to compete with China and Japan with earlier deadlines. However, I felt even the 2023 deadline was overambitious.
  
witeken says:

December 10, 2016 at 7:09 am

I think this article is too pessimistic about the possibility of a new architecture, although it of course depends on what is meant by that, and I do understand that getting what will be a young technology into an exascale SC is quite ambitious.

But even 2 years ago we didn’t know about 3D XPoint, which is ramping next year (and sampling now). And 1 year ago we didn’t know about Nervana, which is also ramping next year and will be on Intel’s process tech by the next decade.

Given Intel’s ever increasing R&D (https://g.foolcdn.com/editorial/images/178614/intc-rnd_large.png), there are probably numerous technologies in the pipeline, and that’s just one (big) company. But I wouldn’t be surprised if by new architecture they mean something like silicon photonics and 3D XPoint, a bit like HPE’s The Machine.

Kerry Main says:

December 11, 2016 at 9:46 pm

Something for consideration – For the last few decades, compute models have been based on shared nothing cluster architectures which essentially means lots of small systems interconnected with network or in the case of HPC, using Infiniband or RoCEv2 type interconnects. This was all great when distributed computing was the accepted model.

If one considers the old adage of moving data closer to the compute engines, then I would suggest that the next generation compute models should look at shared disk cluster models that use very large core, TB scale non-volatile memory (3D XPoint?) systems with Infiniband/RoCEv2 / ??? interconnects such as those evolving from Mellanox and dedicated workload accelerators.

Think large clusters where each node is a 64+ core, 2TB+ local memory blade.

WP that compares shared nothing vs. shared disk cluster models:
http://bit.ly/2dScx9k
“Comparing shared-nothing and shared-disk in benchmarks is analogous to comparing a dragster and a Porsche. The dragster, like the hand-tuned shared-nothing database, will beat the Porsche in a straight quarter mile race. However, the Porsche, like a shared-disk database, will easily beat the dragster on regular roads. If your selected benchmark is a quarter mile straightaway that tests all out speed, like Sysbench, a shared-nothing database will win. However, shared-disk will perform better in real world environments.”
