It’s easy when talking about the ongoing push toward exascale computing to focus on the hardware architecture that will form the foundation of the upcoming supercomputers. Big systems packed with the latest chips and server nodes and storage units still hold a lot of fascination, and the names of those vendors involved – like Intel, IBM and Nvidia – still resonate broadly across the population. And that interest will continue to hold as exascale systems move from being objects of discussion now to deployed machines over the next several years.
However, the development and planning of these systems is a delicate balancing act, one that has to take in not only the needs and capabilities of the hardware going into them but also the software stacks that will be the engines that drive them and the applications that run on top of them. There also is the pressure to create systems that will have commercial and academic value and that meet the needs of businesses, researchers and government institutions, rather than just be experimental projects that satisfy the desire for more powerful computers that can process huge applications but have no practical use.
These balances were among the key points made by Paul Messina, Argonne National Lab Distinguished Fellow and head of the Exascale Computing Project (ECP), during a talk earlier today at the Princeton Plasma Physics Lab. The work that the ECP is undertaking needs not only to meet the challenge of reaching exascale computing, but also to ensure that it contributes to everything from national security and solving the biggest challenges in science and healthcare to training the next generation of computational scientists and engineers and partnering with vendors to create systems that can be used commercially.
“Companies in the United States and labs in the United States and other countries can buy these systems,” Messina told the audience, noting why the ECP was partnering with vendors. “If we did it ourselves, it would have been an interesting experience” but it wouldn’t have offered broad value to others.
As we at The Next Platform have talked about, exascale computing will drive the latest applications in a broad range of scientific and commercial fields, from meteorology and pharmaceuticals to high-end healthcare, oil and gas exploration and financial services. Once online, the systems will offer 50 times the performance of the 20 petaflop-capable machine currently on the Top500 list of the world’s fastest supercomputers. They will run larger and more complex applications and do so in a power envelope of 20 to 30 megawatts.
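To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the 20 petaflops figure is the baseline and that the quoted rates and power envelope can simply be divided into each other, which ignores the gap between peak and sustained performance. The variable and function names are ours, not the ECP's.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: rates and power are treated as simple nominal values.

PETA = 10**15
EXA = 10**18

baseline_flops = 20 * PETA   # the 20 petaflop-capable machine
speedup = 50                 # "50 times the performance"
exascale_flops = speedup * baseline_flops


def gflops_per_watt(flops, megawatts):
    """Energy efficiency implied by a flop rate and a power envelope."""
    watts = megawatts * 10**6
    return flops / watts / 10**9


# The 20 to 30 megawatt envelope gives a range of implied efficiencies.
efficiency_at_30mw = gflops_per_watt(exascale_flops, 30)
efficiency_at_20mw = gflops_per_watt(exascale_flops, 20)

print(f"Target rate: {exascale_flops / EXA:.1f} exaflops")
print(f"Implied efficiency: {efficiency_at_30mw:.1f} to "
      f"{efficiency_at_20mw:.1f} gigaflops per watt")
```

Under those assumptions, 50 times 20 petaflops works out to exactly one exaflops, and fitting that into 20 to 30 megawatts implies somewhere between roughly 33 and 50 gigaflops per watt.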
What these machines will look like and what technologies they will hold are still being determined. But work on them is going on across the globe, with the stakes rising to an international level. The Chinese government has ramped up investment in exascale computing, with at least three projects underway and plans for a prototype called the Tianhe-3 to be ready by next year. At the same time, China is growing its skilled workforce, promising to be a strong competitor on the international scene for years. Governments and vendors in Europe and Japan also are developing exascale systems, with Fujitsu building the Post-K supercomputer that will be based on the ARM architecture rather than the company’s own SPARC64 fx processors. For its part, the ECP is a seven-year program that is designed to run through 2023 and will include an initial exascale system that will be built on what Messina called “advanced architecture” and will roll out in 2021. It’s still unclear what that advanced architecture will entail, though there are a number of possibilities. The project also plans for other exascale systems to be delivered in 2022 and deployed in 2023.
The ECP already is working with several major vendors on the exascale efforts, including Intel, Nvidia and IBM. At the same time, Messina said the group is not far away from finalizing contracts for six other projects that will bring more vendors into the mix, though he declined to say what those projects will entail or which vendors are involved.
The United States and other countries have a lot riding on their exascale projects. The country that can lead the exascale race will have an edge in everything from business and research to innovation, and there is concern among some in the United States about possible budget cuts to the Department of Energy under a Trump administration. Noting China’s efforts, Messina said that “for the United States, it’s important for us to be right there, if not beyond.”
Throughout his hour-long talk, Messina touched on a broad range of issues being addressed by the ECP that will all play into developing systems that strike the right balance among everything from compute, storage and memory to networking, software and programming. He noted several times the need for balance in systems that will have massive numbers of nodes and enable greater performance and parallelism while keeping down power consumption. There has to be the right balance between bandwidth and memory, and software applications will have to be built in a way that enables them not only to address some of the most complex issues in areas like healthcare, science and meteorology, but also to run on a range of systems regardless of the underlying architecture. Pointing to a chart that outlines the various programming models and runtimes, tools like debuggers, compilers and profilers, and various libraries and frameworks, Messina said “it would be nice to be able to reduce this Tower of Babel of software.”
A key for the ECP is its partnerships with vendors on both the hardware and software sides. The partnerships are a way of ensuring that what is created from the exascale R&D is applicable to commercial and academic entities and that U.S. vendors continue to be engaged in developing high-end systems. One way of doing that is to ensure that what they develop for the exascale project can add to their bottom lines, Messina said.
“We’re trying to keep a lot of American companies interested in high-performance computing,” he said, returning to the importance of keeping the country competitive with China and others. “We want [what the companies develop] to be part of your product lines, not something built special just for us.”
At the same time, the ECP has created a council of 18 large companies to advise researchers about what they need in future software, and is collaborating with software vendors to ensure that applications built for the project can meet the needs of such end users. Among the ECP software projects that Messina noted are CANDLE (CANcer Distributed Learning Environment), which aims to accelerate research into combating the top challenges facing the National Cancer Institute, and GAMESS (General Atomic and Molecular Electronic Structure System), for exascale computing in chemistry and materials.
As we have talked about at The Next Platform, China is pushing ahead with three projects aimed at delivering exascale systems to the market, with a prototype – dubbed the Tianhe-3 – being prepped for next year. For its part, the National Strategic Computing Initiative in the United States is readying two exascale-capable systems for delivery in 2021.
The demand for exascale capabilities within the HPC community is growing almost as fast as the amount of data being accumulated in the scientific and commercial arenas. Exascale systems will drive the latest applications in everything from meteorological research and high-end healthcare to oil and gas exploration, national security and emerging workloads like artificial intelligence. The work to design and develop these systems is ongoing, and the architecture for these systems – and what technologies they will use – is still evolving. A key driver is to find ways to get as much performance out of the exascale systems as possible while keeping a lid on the power consumption. Simply scaling current architectures won’t work.