One reason we’re watching Lawrence Livermore National Lab closely is that it is at the forefront of blending emerging HPC, deep learning, and edge technologies for applications that are representative of what’s next. For instance, computational work in physics (just one example area) has an impact on autonomous technologies, which in turn influences what hardware thrives.
Another reason we pay attention to Brian Spears is that he sits directly at the point where all of this converges. Spears straddles computational worlds as a physicist at LLNL. He leads the cognitive simulation effort at the lab, which seeks to blend traditional high performance computing with deep learning for applied science. He is also focused on what’s happening at the edge (scientific instruments, sensors, remote facilities, and so on). His main research revolves around the National Ignition Facility, which houses the largest laser in the world. Here, he weaves between AI, traditional modeling and simulation, and ultra-fast, low-power compute at the edge.
As if that were not interesting enough, he is also helping the lab build out future systems that bring that blend of three computational worlds into balance. Spears has been instrumental in evaluating the system requirements of this mesh and in understanding what scientific applications need to bridge the gaps between large-scale model training and inference, compute- and I/O-intensive HPC simulation, and low-latency edge computing integration.
In short, there are few better people to talk to about the shape of future HPC systems with AI built into their core (going beyond GPUs for training), and about where high performance, low-latency inference fits, both on clusters and at the edge.
“We are asking, what is the piece of hardware that can connect with our existing GPU-dense HPC platforms and have the low latency and response to data volumes needed in these large-scale facilities,” Spears says. Augmenting traditional HPC raises issues in both training and inference. “First is inference, though, because that’s where the workload needs to be most responsive. Our existing HPC might be doing double-precision work, but at some point it might want something from inferred data.”
It’s not just about device capability, however. Of the several inference devices the team has evaluated, he says, each vendor has tended to focus on either the architecture or the interface. “Some have been ready to go and partner with a solid interface, while others have felt much closer to writing assembly. The hardware is interesting, but as these vendors evolve, some folks end up very hardware-forward, others interface-oriented.”
The main focus now is the Cerebras CS-1, although we understand (separately) that there are others not yet public. Spears and team are about to connect the CS-1 to the Lassen supercomputer to run models in prototype software and get a handle on integration. They have also developed a set of “mini-applications” that reflect the demands on memory capacity, memory bandwidth, and other system traits.
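LLNL has not published the internals of these mini-applications, but the idea they describe — a small proxy workload that exposes a device’s memory footprint, weight traffic, and per-inference latency — can be sketched in a few lines. The function name, layer sizes, and reported metrics below are illustrative assumptions, not the lab’s actual code:

```python
import time
import numpy as np

def miniapp_inference_demand(batch=64, hidden=4096, layers=8, reps=20):
    """Toy mini-app (hypothetical): time a stack of dense layers sized
    like a surrogate model to expose memory-capacity, bandwidth, and
    latency demands a real inference device would face."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal((batch, hidden), dtype=np.float32)
    weights = [rng.standard_normal((hidden, hidden), dtype=np.float32)
               for _ in range(layers)]

    t0 = time.perf_counter()
    for _ in range(reps):
        h = x
        for w in weights:
            h = np.maximum(h @ w, 0.0)  # dense layer + ReLU
    elapsed = time.perf_counter() - t0

    params_bytes = sum(w.nbytes for w in weights)
    return {
        "model_bytes": params_bytes,            # on-device memory footprint
        "bytes_streamed": params_bytes * reps,  # weight traffic across reps
        "latency_ms": 1000 * elapsed / reps,    # per-inference latency
    }
```

Sweeping `hidden` and `layers` up or down is what lets a proxy like this ask whether a model fits on chip, in memory, or neither — the question Spears raises below about very large models.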
The question with Lassen, he says, is whether they can run high-precision compute across the whole machine, use a trained model to replace a sub-piece of a physics package while the computation runs, and manage the interfacing between a physics code running on Lassen’s CPUs and GPUs while also making inference calls to the Cerebras CS-1.
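The pattern described here — a time-stepping loop on the host where one sub-package can be swapped for a surrogate-model inference call — can be sketched as follows. This is a minimal illustration, not LLNL’s software: the function names are invented, and the “inference call” is stood in by a local approximation rather than a real dispatch to an accelerator:

```python
import numpy as np

def expensive_subpackage(state):
    # Stand-in for the physics sub-package the surrogate would replace
    # (e.g., a costly closure or material-property calculation).
    return np.tanh(state) * 0.1

def surrogate_infer(state):
    # Hypothetical stand-in for an inference call shipped to an
    # accelerator such as the CS-1; here, a cheap local approximation.
    return np.clip(state, -1.0, 1.0) * 0.1

def run_simulation(steps=100, n=1024, use_surrogate=False):
    """Time-stepping loop: double-precision host compute, with one
    sub-piece optionally replaced by the surrogate."""
    state = np.linspace(-2.0, 2.0, n)
    kernel = surrogate_infer if use_surrogate else expensive_subpackage
    for _ in range(steps):
        state = state + kernel(state)   # inference replaces the sub-package
        state = state * 0.999           # rest of the double-precision physics
    return state
```

The structural point is that the loop itself never changes; only the latency and accuracy of `kernel` do. That is why, as Spears notes above, inference responsiveness is the first concern: the whole-machine computation stalls on every call.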
“Because we have some of the largest supercomputers in the world, we’re pushing our AI systems to build some of the largest models in the world. That means we need to train and infer on enormous models, so we’re interested in providers who can keep very large models on chip, or at least in memory, so we can train at the scales we’re interested in.”
Spears says they have been training massive models that span the entirety of Sierra (the #2 supercomputer in the world), scaling across 17,000 GPUs. What’s most interesting is whether they can replace training at that scale with a purpose-built AI accelerator. For now, he says, “The first thing we have is the Cerebras CS-1, and it pairs well because of the enormous size of their wafer-scale chip. We have access both to high bandwidth memory and to a large footprint to put down a giant model.”
2028-2029 might sound like a long way off, but to Spears it’s right around the corner, especially as the lab moves from evaluation to prototype to procurement. “We’re still in the exploration phase. Currently, we have a new system that will couple to Lassen and a few other evaluations in contract. We think we’ll have two or three accelerators coupled to a handful of machines over the next few years, and we’ll use that to more fully develop what requirements we want to code into our mini-apps. We’ll be running experiments on the floor by 2021 with a couple of platforms running, then we’ll start making decisions about what we want to run next.”
Indeed, 2030 sounds like the remote future, but remember the arduous cycles of HPC procurement. What is worth wondering is how much innovation at the chip level is left for host systems, constrained as they are by that lofty exascale future, and whether something could come along to truly revolutionize the integration of these devices into floating-point-centric machines. Quantum accelerators don’t look likely, even by then, so perhaps AI accelerators that can swiftly dispatch both training and inference are the real shining hope for (even) more capable scientific computing.