Plans for First Exascale Supercomputer in U.S. Released

This morning a presentation filtered out of the Department of Energy’s Office of Science, showing the roadmap to exascale with a 2021 machine at Argonne National Lab.

This is the Aurora machine, which had an uncertain future this year when its budgetary and other details were thrown into question. We understood the deal was being restructured, and indeed it has been. The system was originally slated to appear in 2018 with 180 petaflops of peak double precision floating point performance. Now it is 1,000 petaflops, an exascale-capable machine, and will be delivered in 2021, right on target with the revised exascale plans released earlier this year.

When the system’s future was up for debate, some suspected a shift in architectural direction to meet shifting workload needs in HPC or, as more pessimistic members of the HPC community speculated, to get away from unanticipated ultra-high costs with key vendors. HPE’s The Machine was touted as a potential replacement for the Cray and Intel partnership, but it appears the same vendors are on the hook to deliver the system.

Intel and Cray, as we have written before, were originally set to deliver the Cray “Shasta” system with the forthcoming Knights Hill processor in 2018. While we expect Knights Hill to emerge on schedule (we got an in-depth look at the status of Knights Mill, a related chip, at Hot Chips in August, but Intel has not said much about Knights Hill), the architecture in this exascale system will very likely be some post-Knights Hill chip. This means high computing potential, but also high power consumption, something that matters if the system is to stay inside a 20 megawatt to 40 megawatt power envelope.
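As a back-of-envelope check (our arithmetic, not anything from the presentation), delivering a peak exaflops inside that power boundary works out to somewhere between 25 and 50 gigaflops per watt at double precision:

```c
#include <stdio.h>

int main(void) {
    double peak_flops = 1.0e18;     /* 1,000 petaflops, double precision peak */
    double power_low  = 20.0e6;     /* 20 megawatts */
    double power_high = 40.0e6;     /* 40 megawatts */

    /* Gigaflops per watt needed to hit an exaflops inside each power envelope */
    printf("At 20 MW: %.0f GF/W\n", peak_flops / power_low  / 1.0e9);
    printf("At 40 MW: %.0f GF/W\n", peak_flops / power_high / 1.0e9);
    return 0;
}
```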

Our sources today confirm that this machine is not based on Knights Hill and that, whatever it uses for compute, it represents a big step up in performance. It also comes with big risk. That the DoE is going to invest in a completely new, unseen architecture at this scale to be the first exascale system in the United States is a risk in its own right. The secondary risk is pulling the timeline for exascale delivery forward from 2023 to 2021.

This future Argonne exascale system very likely does not mean a tweak to the Knights family that puts a more modest number of X86 cores – say, one or two dozen – with super-wide vectors, maybe tons of MCDRAM3 memory, and maybe 400 Gb/sec Omni-Path 3 interconnects on the die or in the package. We got the distinct impression from Intel several years ago that it was interested in getting memory capacity and memory bandwidth back in line with compute capacity and network connectivity, relative to where they stand in the “Knights Landing” Xeon Phi, which has 72 cores, a pair of 512-bit vectors in each core, a modest amount of MCDRAM1 memory, and on-package 100 Gb/sec Omni-Path 1 interconnects.

Interestingly, the innovation with the Argonne exascale machine may come more in the vector unit than in the X86 core. With a mesh architecture, Intel could even go so far as to break the CPU core free from the vector units, creating what amounts to an offload model on the die, one that is hidden behind the compiler and doesn’t require something like Nvidia’s CUDA development framework to operate.
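To make that distinction concrete, here is a minimal sketch of the programming-model difference, using a simple scale-and-add loop. Neither snippet comes from Intel or from the Aurora design; the function names are ours, and the point is only to contrast compiler-driven vectorization with explicit, CUDA-style offload.

```c
/* Compiler-driven: the loop is plain C and the compiler (nudged here with an
 * OpenMP SIMD hint) maps it onto whatever vector units exist, whether they
 * sit inside the core or are broken out elsewhere on the die. */
void scale_add_vectorized(int n, float a, const float *x, float *y) {
    #pragma omp simd
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Explicit offload, CUDA-style: the programmer writes a separate kernel,
 * launches it over a grid of threads, and manages device memory and data
 * movement by hand. */
#ifdef __CUDACC__
__global__ void scale_add_kernel(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}
#endif
```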

One other tidbit we gleaned from casual conversations is that the forthcoming architecture is not a novel approach (quantum, neuromorphic), nor an accelerated one (as in an offload model from a discrete CPU to a discrete GPU or FPGA), but rather a “reimagined” way of designing chips. We are not sure if that means a familiar X86 ISA, but we would assume one would be a requirement for productivity. We have also heard that there is an interest in reducing the node count on this future Argonne machine, making it look less like IBM’s BlueGene/Q, which had a large number of modestly powered nodes.

The original Aurora machine was set to have around 50,000 nodes to reach its 180 petaflops, which suggests an average of 3.6 teraflops per node. That was not much more powerful than the current top-end Xeon Phi 7290, which does 3.46 teraflops at double precision on 72 cores running at 1.5 GHz with a pair of 512-bit vectors per core. Say that Knights Hill had double the vectors in half the cores but the same 50,000 sockets to reach that 180 petaflops in the original Aurora, and added mixed precision down to 16-bit math just for fun. It would take around 250,000 nodes just to kiss exascale with this chip, and breaking the chip apart further would increase the node count and reduce the performance per node unless the vector width was increased radically.
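A quick sanity check of that arithmetic (our own back-of-envelope figures, not anything from the DoE presentation) is below; at roughly 3.5 to 3.6 teraflops per node, a full peak exaflops lands in the neighborhood of 280,000 nodes.

```c
#include <stdio.h>

int main(void) {
    /* Xeon Phi 7290 peak: 72 cores x 1.5 GHz x 2 vector units x 8 DP lanes x 2 FMA ops */
    double knl_peak = 72.0 * 1.5e9 * 2 * 8 * 2;        /* ~3.46 teraflops */

    /* Original Aurora: 180 petaflops spread over roughly 50,000 nodes */
    double aurora_per_node = 180.0e15 / 50000.0;       /* 3.6 teraflops per node */

    /* Nodes needed to reach a peak exaflops at that per-node rate */
    double nodes_for_exa = 1.0e18 / aurora_per_node;   /* ~278,000 nodes */

    printf("Xeon Phi 7290 peak:   %.2f teraflops\n", knl_peak / 1.0e12);
    printf("Aurora per node:      %.2f teraflops\n", aurora_per_node / 1.0e12);
    printf("Nodes for 1 exaflops: %.0f\n", nodes_for_exa);
    return 0;
}
```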

Intel could be going the other way, creating a massive compute complex, perhaps with 3D stacking of compute and memory, with a huge number of cores and very wide vectors in each unit, and many Omni-Path links into this gigantic chip. This would be the only way to reduce the node count, as we are hearing Intel is trying to do.

We will follow up as we are able to secure interviews and provide much more detail.
