The Texas Advanced Computing Center (TACC) will soon be home to a top-tier supercomputer with the 2019 arrival of the “Frontera” system.
As we detailed last week, the National Science Foundation (NSF) put forward $60 million for the first portion of a system that can deliver 2X to 3X the application performance of the Blue Waters supercomputer, a five-year-old system hosted at the University of Illinois that, at the time of its deployment in 2013, was the world’s fastest system at an academic institution. TACC won the award and today shed light on how Frontera will pick up where Blue Waters left off.
While TACC is known for its investments in diverse architectures (testbeds for cloud, FPGA, GPU, and ARM, to name a few), the Frontera machine in its first incarnation will be decidedly vanilla, with Cascade Lake CPUs bringing the machine into the 35 to 40 peak petaflops range, which could easily place it among the five most powerful systems in the world.
The decision to go with standard CPUs on Frontera is not surprising given the machine’s purpose, which is to chew through a diverse array of scientific workloads for a broad user base. With that said, as TACC director Dan Stanzione tells The Next Platform, the processor choice was made with high performance in mind.
The system will have around 8,000 nodes with the future “Cascade Lake” Xeon SP Platinum processors, specifically with the follow-on to the 28-core “Skylake” Xeon SP-8180. These are architecturally straightforward but will run every single science code with very little fear and can be live soon without code changes.
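The 35 to 40 peak petaflops range quoted above can be roughly sanity-checked from the node count. The sketch below is a back-of-the-envelope estimate, not TACC's or Intel's published math: the two-socket node layout, the 28 cores per socket (carried over from the Skylake part named above), the 2.7 GHz clock, and the 32 double-precision flops per cycle per core (two AVX-512 fused multiply-add units) are all assumptions made for illustration. Only the "around 8,000 nodes" figure comes from the article.

```python
def peak_petaflops(nodes, sockets_per_node, cores_per_socket,
                   clock_ghz, flops_per_cycle):
    """Theoretical peak in petaflops: total cores x clock x flops per cycle."""
    total_cores = nodes * sockets_per_node * cores_per_socket
    flops_per_second = total_cores * clock_ghz * 1e9 * flops_per_cycle
    return flops_per_second / 1e15


estimate = peak_petaflops(
    nodes=8000,           # "around 8,000 nodes", from the article
    sockets_per_node=2,   # assumption: two-socket nodes
    cores_per_socket=28,  # assumption: same core count as the Skylake part
    clock_ghz=2.7,        # assumption: illustrative clock, not from the article
    flops_per_cycle=32,   # assumption: two AVX-512 FMA units, double precision
)
print(f"Estimated CPU peak: {estimate:.1f} petaflops")
```

Under these assumptions the estimate falls inside the 35 to 40 petaflops range. Since vector-heavy code typically runs below the headline clock, the sustained figure for real applications would be lower than this theoretical peak.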
“The core counts will go up from Stampede2 some, the node count by quite a bit, and the memory bandwidth will also increase since we are going up another clock step on the DIMMs. The cache per core is about the same but with that higher clock rate, probably between 25 percent and 30 percent [for AVX-512 vector units, not headline clocks], we are making some decisions about balance and tradeoffs in terms of energy.”
Stanzione says TACC made the decision to go with the Cascade Lake SKUs that have the higher clock rates, and the team expects most codes will run significantly faster. His team took a close look at other processor options, including the 7 nanometer AMD “Rome” Epyc chips coming next year, which he says were a close contender in their decision-making process. “We took a look at AMD Epyc, both Naples and certainly Rome, but with the combination of price, schedules, and performance, we felt like Cascade Lake was the way to get the best value right now. Our codes were just a little better for the time we needed this system but Rome is a promising architecture and we expect it is going to be a very good chip,” Stanzione explained.
Also on the machine, although not part of the expected peak floating point count, GPUs, most likely Nvidia Volta, will tack an additional three to four petaflops of single-precision performance onto Frontera as TACC tests the waters for key machine learning and molecular dynamics workloads that will perform well with lower precision.
Things will get truly interesting with Phase 2 of the project, when a novel or accelerated design might appear. This second part of the system will need to deliver 10X the application throughput of the 2019 system in the 2023 to 2024 timeframe. If all goes well with the forthcoming Aurora 2021 machine at Argonne National Laboratory (Intel is still the prime contractor for the project but the architecture is still unannounced), that architecture will likely set the stage for TACC’s second phase, with a testbed system arriving sometime in the 2021 timeframe.
Stanzione says the problems running on Blue Waters can run at least 3X faster on Frontera, and that its peak performance is more than double that of the Knights Landing and Skylake-based Stampede2 system. Taking this to a 10X point in the future with a fixed problem size still presents a challenge for the post-2023 timeframe. He says he is far less concerned about whether TACC has an exascale machine than about whether it can continue scaling performance. Besides, peak petaflops might not be an important measure by any standard by that time, given changes in HPC workloads and systems.
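One textbook way to see why a 10X gain on a fixed problem size is hard is Amdahl's law, which bounds the speedup of a fixed workload by the fraction of it that cannot be sped up. Stanzione does not invoke it here; the sketch below is purely illustrative, and the parallel fractions and resource factors are hypothetical values, not measurements of any TACC code.

```python
def amdahl_speedup(parallel_fraction, resource_factor):
    """Speedup of a fixed-size problem when only the parallel part
    gets faster by resource_factor (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / resource_factor)


# Hypothetical parallel fractions and hardware improvement factors.
for parallel_fraction in (0.90, 0.95, 0.99):
    for resource_factor in (10, 100):
        speedup = amdahl_speedup(parallel_fraction, resource_factor)
        print(f"parallel={parallel_fraction:.2f} "
              f"resources x{resource_factor}: speedup {speedup:.1f}X")
```

In this model, a code that is 90 percent parallelizable can never reach 10X no matter how much hardware is added, which is one reason application throughput rather than peak petaflops is the more demanding target.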
Stanzione is appropriately cautious about making too many predictions about what the post-2020 state of HPC workloads will look like and which architectures will best suit them. With the rapid pace of change in machine learning in HPC, for instance, the idea of setting aside a budget for a particular system several years in advance seems overly risky. And while the CPU-only Frontera machine slated for next year takes few risks, it offers predictable performance TACC can budget for, and it provides a foundation for most workload changes coming down the pike, even if there are accelerators or other options that can make TACC science even faster.
The primary computing system will be provided by Dell EMC, PowerEdge C6420 servers to be precise, with those Intel processors inside and direct liquid cooling from CoolIT Systems. DataDirect Networks will contribute the primary storage system, and Mellanox Technologies will provide the high-performance interconnect for the machine; we are guessing it has to be 200 Gb/sec Quantum InfiniBand switches. Nvidia and the cloud providers Amazon, Google, and Microsoft will also have roles in the project.
The award marks the third time The University of Texas and the Texas Advanced Computing Center will have the largest university supercomputer. With Frontera running concurrently with Stampede2, TACC will operate the two largest systems for open science in the world simultaneously.