The Final Frontier: Talking Exascale With Oak Ridge’s Jeff Nichols

Just ahead of the revelations about the feeds and speeds of the “Frontier” supercomputer at Oak Ridge National Laboratory, which coincided with the International Supercomputing conference in Hamburg, Germany and the publication of the summer Top500 rankings of supercomputers, we had a chat with Jeff Nichols, who has steered the creation of successive generations of supercomputers at Oak Ridge.

Nichols is associate laboratory director for Computing and Computational Sciences at Oak Ridge. Two decades ago he ran the lab’s Computer Science and Mathematics Division, and before that he did a stint as deputy director of the Environmental Molecular Sciences Laboratory at the Department of Energy’s Pacific Northwest National Laboratory. He is also one of the key developers of the open source NWChem computational chemistry simulator, which covers interactions at both the quantum chemical and molecular dynamics scales.

We spoke frankly with Nichols about the importance of exascale and the difficulty of always designing for the future with uncertain technology roadmaps while still delivering 10X improvements in generation after generation of systems. And we talked about money and time, because these are also ever-present factors for any system, shaping and enabling the salient characteristics of that system as much as throughput in flops, bisection and injection bandwidth, storage, energy consumption, and what have you.

One of the things we wanted to know is how the planning process for systems like the “Jaguar,” “Titan,” “Summit,” and now “Frontier” supercomputers begins. Does Oak Ridge have a set monetary budget and then the system architects see what they can get? Do they start with the electricity and thermal budget first? Or do they have a performance goal, see where the money and electricity budgets end up, and then wince as they throw the budget over the wall to the US Congress?

“Our target was to deliver a double precision exaflops of compute capability for 20 megawatts of power, and Frontier at peak is two exaflops and our target is 29 megawatts of power when it’s running at full power,” Nichols tells The Next Platform, and by our math, that works out to one exaflops peak in 14.5 megawatts. Nichols adds that the goal way back when, and mostly for the poetry of it, was 10^18 flops in 2018. The reason that Frontier could beat the power consumption goal is that it took four years longer to deliver than the original plan. “Frontier has met the challenge when you are talking about the boundary conditions that are out there. We hadn’t really talked about money because I think the thing is that we wanted that 10X performance. We wanted to be 2 exaflops and 20 megawatts. But we have to think about usability, we have to think about the fact that we can’t just be doing something stupid and build something that users can’t program. There are all of those kinds of boundary conditions as well. So I think we’ve done a good job of being a steward of the dollars that have had to go into the purchase of this machine by delivering a machine within those boundary conditions, and that’s going to be a very usable machine.”
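
For those keeping score at home, here is a minimal back-of-the-envelope sketch, in Python, of the efficiency math implied by those numbers. The peak flops and megawatt figures are the ones quoted above; nothing else is assumed:

    # Back-of-envelope efficiency math from the stated exascale goal and Frontier's numbers
    goal_exaflops = 1.0        # original exascale goal: 10^18 double precision flops
    goal_megawatts = 20.0      # original power envelope for that goal

    frontier_exaflops = 2.0    # Frontier peak, double precision
    frontier_megawatts = 29.0  # Frontier draw at full power

    # Megawatts needed per peak exaflops
    mw_per_exaflops = frontier_megawatts / frontier_exaflops  # 14.5 MW per exaflops

    # Gigaflops per watt (1 exaflops = 1e9 gigaflops, 1 megawatt = 1e6 watts)
    goal_gflops_per_watt = (goal_exaflops * 1e9) / (goal_megawatts * 1e6)              # 50 GF/W
    frontier_gflops_per_watt = (frontier_exaflops * 1e9) / (frontier_megawatts * 1e6)  # ~69 GF/W

    print(f"{mw_per_exaflops} MW per peak exaflops")
    print(f"goal: {goal_gflops_per_watt:.0f} GF/W, Frontier peak: {frontier_gflops_per_watt:.0f} GF/W")

In other words, the machine that finally arrived is, at peak, appreciably more efficient per exaflops than the original 20 megawatt goal, even though its total draw is higher because it is a two exaflops system.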

When we asked Nichols about the hardest thing about designing for the future – and that is what supercomputing has always been about: building a machine with technologies that are four or five years out so that the machine’s simulations can help you see further into the future across myriad domains – Nichols had an answer that, quite frankly, surprised us. To find out what he said, you are going to have to watch the interview.

4 Comments

  1. Excellent interview, appreciate Jeff Nichols’ candidness. Fav quote: “Computing is ubiquitous, you can’t take us out!” Headline notwithstanding, love the history of how GPU accelerators pushed HPC to the next level. Few actual insights into Frontier but, interestingly, a lot of Nvidia name dropping.

    • Well, when I did the interview, things were still under wraps so I wanted to talk about how tough this all is, and not just get into feeds and speeds, which we both knew would be covered elsewhere at The Next Platform.

    • And just honored Jeff did the interview and was perfectly candid, of course. Which is how I have always known him to be.

  2. Cool interview! I like that Jeff and you brought up the contrast between AI/ML (a bit like witchcraft with some successful potions currently, IMHO) and PDEs. The most worthwhile machine computations (I think) would be those that solve problems that are difficult for us as humans (e.g. solving nonlinear systems of PDEs numerically or analytically), rather than easy (e.g. hot-dog recognition, or face recognition …). Hopefully we get Frontier’s HPCG results soon.
