NASA Supercomputing Strategy Takes the Road Less Traveled
August 31, 2017 Nicole Hemsoth
For a large institution playing at the leadership-class supercomputing level, NASA tends to do things a little differently than its national lab and academic peers.
One of the most striking differences in how the space agency views its supercomputing future can be found at the facilities level. Instead of building massive brick-and-mortar datacenters within a new or existing complex, NASA has taken the modular route, beginning with its Electra supercomputer and, in the near future, a new 30-megawatt-capable modular installation that can house about a million compute cores.
“What we found is that the modular approach lets us be more efficient in the cooling of processors. Our standard facility is pretty efficient now; we use about one-third of the power to cool the machine that we do to run it,” says Dr. Bill Thigpen, NASA’s long-time Advanced Computing Branch Chief, who has overseen traditional datacenter deployments for NASA systems as well as the module-based Electra machine. “If we use a megawatt, we use 300 kW to cool it. With modular, instead of 33%, we’re using 2.5% of the power to cool the system.”
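The arithmetic behind those figures is simple enough to sketch. A minimal back-of-envelope check, treating cooling draw as a fixed fraction of the IT (compute) load; the function name is illustrative, and the inputs are Thigpen's quoted 300 kW-per-megawatt traditional figure and 2.5% modular figure:

```python
# Cooling draw as a fraction of IT (compute) load, per Thigpen's figures:
# ~30% (about one-third) at a traditional facility, ~2.5% at the module.
def cooling_power_kw(it_load_kw, overhead_fraction):
    """Power spent on cooling for a given IT load, in kW."""
    return it_load_kw * overhead_fraction

traditional = cooling_power_kw(1000, 0.30)   # 1 MW IT load, traditional site
modular = cooling_power_kw(1000, 0.025)      # same load, modular site

print(traditional, modular)  # 300.0 25.0
```

In datacenter terms, these overheads correspond roughly to cooling-driven PUE contributions of about 1.3 versus about 1.025.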
That is a striking figure as far as datacenter power and cooling goes—something that becomes increasingly important as the size of systems continues to grow. Even more surprising is Thigpen’s assertion of how much less the module cost is compared to a traditional facility. “When we look at the current module we just deployed for Electra, the cost of the facility is 10% of the cost of the compute that goes into the facility.”
It is never easy to get a sense of the compute-to-facility cost ratio for traditional datacenters, but it is fair to say that the 10% figure is striking. So if the numbers are shaping up like this, why aren’t more HPC centers looking at modular approaches? And why build modular if you are not going to move the datacenter (aside from the apparent cost advantage in NASA’s experience)?
Thigpen says there has been a lot of technology evolution in the modular space, and we will start to see more of it in the near future, especially as NASA’s modular story and others play out and show credible numbers over time. When he initially approached NASA with the concept for Electra, he says, there was skepticism over his 6% cooling-efficiency target; they are now at 2.5% because of this approach. To be fair, the system is rated at the petaflop figure it is because of modular limitations on what can be cooled, which means NASA’s supercomputing teams will be integrating more datacenter cooling and power-efficiency tricks to cram more compute into the modules.
“People think they can’t build large systems with modules,” he says. “You absolutely can, but you have to let [the builders] know what your goals are up front.” The first Electra modules were built by a company in Italy, the next incarnation by a company in Maryland, but these are all non-HPC-specific vendors doing the work. Thigpen tasked HPE with finding module builders that could build to suit the machines extending the current Electra system, which stands at 32,000 cores now.
The other key feature of the modular approach is that the center can pick the best technology for the time and easily swap it in. The Electra expansion, for instance, uses HPE’s eCells, which bring water to the compute but can still use outside air. That module cools differently, but the facility does not have to be designed around it; if a new technology is going in, the module can be designed to suit it.
There is one California-specific reason why this all works. Because of the Bay Area climate, Thigpen says they are able to run their modules on outside air without any traditional chillers. “We are either running straight off the air temperature or we are running it through water to bring the temperature down. What that means is that we save enormous power—for cooling, we save around 92.4% of the power that is traditionally used; we’re only using 7.6% of what we would normally use. In using water, we save 99.4 percent of the water that would normally be evaporated at a traditional center with cooling towers and chillers.”
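Those savings percentages follow directly from the overhead figures quoted earlier in the piece; a quick consistency check, assuming the ~33% traditional and 2.5% modular cooling overheads:

```python
# Going from a ~33% cooling overhead to ~2.5% means the module draws
# 2.5/33 of the cooling power a traditional facility would.
traditional_overhead = 0.33
modular_overhead = 0.025

fraction_still_used = modular_overhead / traditional_overhead
saving = 1 - fraction_still_used

print(round(fraction_still_used * 100, 1))  # 7.6 (%), matching the quote
print(round(saving * 100, 1))               # 92.4 (%), matching the quote
```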
We will see the power story play out in the near future with the extension of the Electra system. The first module of the Electra system held around 1.2 petaflops of Intel Broadwell-based compute. The expansion adds another module that can handle twice as many nodes; in about a week or two it will hold over 2,000 Skylake processors in 40-core nodes, for a total of around 46,000 cores. This new incarnation of Electra will have a peak theoretical performance of 4.78 petaflops. For NASA end users, however, this is still not going to be enough compute, which leads us to another way the center is diverging from other large sites.
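The quoted peak lines up with a back-of-envelope calculation. The sketch below assumes a 2.4 GHz clock and 32 double-precision FLOPs per cycle per core (AVX-512 with two FMA units), figures that are not stated in the article:

```python
# Rough reconstruction of Electra's quoted 4.78 PF peak from the article's
# ~46,000 Skylake cores plus the 1.2 PF Broadwell first module. The clock
# speed and FLOPs-per-cycle values below are assumptions, not article facts.
skylake_cores = 46_000
clock_hz = 2.4e9          # assumed base clock
flops_per_cycle = 32      # 2 AVX-512 FMA units x 8 doubles x 2 ops

skylake_pf = skylake_cores * clock_hz * flops_per_cycle / 1e15
total_pf = skylake_pf + 1.2  # add the Broadwell module's peak

print(round(total_pf, 2))  # 4.73, in line with the quoted 4.78 PF
```

The small gap from 4.78 PF suggests slightly more cores or a slightly higher clock than assumed here; the order of magnitude is the point.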
Inside those datacenters, something else is a little different. While many large centers are still vying for top placement on the twice-yearly Top 500 list of the most powerful supercomputers, NASA wants to stay user-focused. That means pushing performance—but performance for a wide variety of applications. That very diversity in workloads means it is better for the center to have a standard, more homogeneous architecture running across its systems (which happen to share a file system, around 40 petabytes of storage, and nearly half an exabyte of archival storage).
While NASA is not the only supercomputing site to eschew GPU and Xeon Phi-based acceleration (many mission-driven national labs are CPU-only to suit decades-old code that is the subject of thousands of development hours), its roadmap does not feature these at the heart of future systems. “We have found that if we spend a million dollars on processors, we get more science and engineering out of a standard Xeon than anything else. Accelerators are good for Linpack and a subset of codes, but most of our codes run better with the memory, interconnect, and interfaces that we get with standard Xeons,” says Thigpen. He says the differences between the manycore accelerator architectures and the Skylake nodes that will find a home at NASA in the near future are not so big—and that difference will be erased with future generations of chips anyway.
It should be noted that Thigpen is also the deputy project manager for NASA’s High-End Computing Capability Project, which is focused far less on garnering peak benchmark performance than on serving a wide range of scientific computing users on NASA’s four current supercomputers, the largest of which is Pleiades—a machine that came online way back in 2008 and is getting quite old by supercomputer standards. “Our users need an order of magnitude more compute than what we can get anywhere in the U.S.,” he says, even if few codes can consume all of Pleiades at once. The goal with the million-core effort is to be able to run those massive jobs at the same time on the big, efficient machine.
Over time the Pleiades system has had some modernizing additions, but the SGI machine is still largely Sandy Bridge-based. Teams at NASA have not completely turned away from accelerators; on Pleiades there are two racks of older-generation Nvidia K40 GPUs matched with Sandy Bridge and a single rack of the first-generation Xeon Phi. These serve only some codes, and Thigpen says there is not much interest from users to drive any new machines toward an accelerator focus.
The new system that will appear in the future 30MW site will likely not use the entire power supply being built for it—probably closer to 10-15MW, Thigpen says. The vendor has not yet been chosen for that site, and we will talk to him again when there are more details about how to pack more compute into a much smaller power profile. Until then, the refreshed Pleiades and Electra systems will keep their spots in the upper tiers of the Top 500—and we will wait to see what kind of green computing results emerge from Electra in particular in November, when the Top 500 and Green 500 lists are made public.