NNSA Stockpile of Nuclear Security Supercomputers Continues Evolution

By 2020, Trinity, the first Advanced Technology (AT) system, will be nearing the end of its useful lifetime. Crossroads, its replacement, will be a tri-lab computing resource for existing simulation codes and a larger resource for the ever-increasing computing requirements of the weapons program. Crossroads will provide a large portion of the AT system resources for the NNSA ASC tri-lab simulation community (Lawrence Livermore National Laboratory (LLNL), Los Alamos National Laboratory (LANL), and Sandia National Laboratories (SNL)) during the FY21-25 timeframe.

Although it may be tempting to think that supercomputers were developed to meet business and general scientific computing needs, a large part of early supercomputing evolution was driven by the need to develop, test, and maintain nuclear weapons stockpiles.

Seventy years to the day after the first nuclear test was conducted in New Mexico, supercomputers have evolved to the point that nuclear detonations and weapons degradation can be modeled realistically, with far greater detail and far more data than physical tests could reveal. This is not only safer; it also allows many scenarios to be modeled at far lower cost.

Los Alamos National Laboratory has been at the center of much of this work and will soon be home to one of the most powerful supercomputers on the planet, which will support far more extensive modeling of the many scenarios that could affect nuclear weapons stockpiles, from sheer age and degradation to more catastrophic situations. The new system joins a range of machines at Los Alamos, at other national labs (Sandia, Lawrence Livermore), and at other sites (the Y-12 National Security Complex and Pantex, among others) run by the National Nuclear Security Administration (NNSA), which is part of the Department of Energy.

“The evolution of computers is directly tied to the evolution of nuclear weapons,” explains Alan Carr, historian at Los Alamos National Laboratory. In a recent overview of the history of nuclear testing, development, and stockpile stewardship marking the 70th anniversary of the Trinity test in New Mexico, Carr described the challenges from that first test to today, when simulations of explosions, weapons degradation, and other nuclear security concerns are carried out on supercomputers at Los Alamos and other national labs.

Just as the 100-foot Trinity tower held the test device aloft seventy years ago, the forthcoming Trinity supercomputer will bear the weight of a new generation of nuclear security research. When the Trinity machine is delivered, it will be among the top five supercomputers on the planet, with a peak computing capability of 40 petaflops. To put that number in context, the current top system that handles NNSA workloads, Sequoia, is capable of just over 16 petaflops, while Cielo, a supercomputer at Los Alamos, has clocked in at just over one petaflop.

“Highly accurate 3D computing is the holy grail of the Stockpile Stewardship Program’s supercomputing efforts. As the weapons age, 3D features tend to be introduced that require highly accurate 3D modeling to understand. This is a great challenge reminiscent of the one faced by the Manhattan Project. The challenge then was to build the first nuclear weapon that works—now our challenge is to understand how and why a weapon works well enough to confidently predict its performance without requiring an additional nuclear test.” – Charles F. McMillan, Director, Los Alamos National Laboratory.

During the initial phases of nuclear weapons development, testing weapons was an entirely physical process. It was unpredictable, yielded little data beyond direct observation, and was hugely expensive in both resources and dollars. With the end of underground testing, however, the need for simulation capability rose drastically, leading to the Accelerated Strategic Computing Initiative (ASCI), which set out to create and build advanced modeling and simulation machines capable of providing the required resolution and complexity. In the list below, it is possible to see where the ASCI program kicks in, adding a massive rush of floating-point capacity. That effort was tied to innovations in cluster computing, which offered a far more cost-effective and scalable approach to large-scale computing.

On June 18, 2012, Sequoia was ranked number one on the TOP500 list, clocking in at 16.32 sustained petaflops; it currently ranks number three. The 96-rack Blue Gene/Q system is dedicated to ASC for stewardship of the nation’s nuclear weapons stockpile and will enable simulations that explore phenomena at a level of detail never before possible. Dawn, the initial delivery system for Sequoia, was ranked 77th on the TOP500 list of the world's fastest computers. An IBM Blue Gene/P machine with a peak performance of 501 teraflops, Dawn was used to prepare applications for the Sequoia system and was an important computational resource for the ASC program until its retirement on August 30, 2013.

Although delivery of the Cray Trinity supercomputer is still two years away, two other machines are already dedicated, at least in part, to NNSA activities: the Cielo machine at Los Alamos and the Sequoia supercomputer at Lawrence Livermore National Lab.

Bill Archer, Trinity Supercomputer Program Director, described the evolution of supercomputing at Los Alamos National Lab, which began in 1944 with IBM punch-card counting machines. That first step set off an evolution that led to the large-scale scientific supercomputers now at the top of the TOP500 list of the world's most powerful systems.

John Morrison, former HPC director at LANL, says the lab is entering the fourth era of computing technology. “The first was the serial era, where we would run one code on one CPU. The next was the vector era that came with the Cray machines, where we could break up the problem and run multiple parts of it at one time. With the ASCI program around 1996, we started to go to commercial clusters, where we could buy many processors, hook them together with networks, and parallelize the program across thousands of processors.” Now, in the forthcoming fourth era, it’s more like millions of processors, all to enable the kinds of high-fidelity, high-resolution calculations the lab says are required to ensure the continued safety of the U.S. nuclear stockpile.

Cielo, currently ranked 40th in the world with a peak performance of 1.37 petaflops (compare that to the 40+ petaflops expected from Trinity), was authorized in 2011 to conduct classified operations for NNSA. NNSA selected Cray Inc. to build Cielo in spring 2010.

As McMillan explains, “Nuclear weapons and computing go hand in hand. The evolution of computers is directly tied to the evolution of nuclear weapons. Simple computers were key to the design and development of the first nuclear bombs, like the one detonated during the Trinity test. Throughout the Cold War, ever-more-powerful computers were designed and built specifically to aid the design and build cycle that led to today’s U.S. nuclear deterrent.”

For more on the evolution of computing and national nuclear security, Los Alamos National Lab has put together a nice piece featuring key voices from the Manhattan Project and from the subsequent teams who built the first (and are building future) supercomputers for NNSA sites.