Bringing 330 Petaflops Of Supercomputing To Bear On The Outbreak

IBM, Amazon, Microsoft, and Google are teaming with the White House, the US Department of Energy, and other federal agencies to bring a massive amount of supercomputing power and public cloud resources to scientists, engineers and researchers who are working to address the novel coronavirus global pandemic that is expected to bear down hard on the United States in the coming weeks.

Through the Covid-19 High Performance Computing Consortium announced over the weekend, the companies and organizations are making available more than 330 petaflops of performance across 16 systems that hold an aggregate of more than 775,000 CPU cores and 34,000 GPUs, helping researchers better understand the virus, the treatments that can be used against it, and potential vaccines and cures. And because the current economic crisis is tied to the pandemic, anything that can be done to resolve the coronavirus outbreak will certainly slow the cratering of the economy and soften the recession that’s coming, if it’s not already here.

The move to pool all this supercomputing power comes as the coronavirus continues to spread around the globe. Estimates put the number of confirmed cases worldwide at almost 337,000, resulting in more than 14,700 deaths. In the United States, the numbers are just over 39,000 cases and 455 deaths, with the brunt of the pandemic expected to hit over the next several weeks.

“How can supercomputers help us fight this virus? These high-performance computing systems allow researchers to run very large numbers of calculations in epidemiology, bioinformatics, and molecular modeling,” Dario Gil, director of IBM Research, wrote in a blog post. “These experiments would take years to complete if worked by hand, or months if handled on slower, traditional computing platforms. By pooling the supercomputing capacity under a consortium of partners … we can offer extraordinary supercomputing power to scientists, medical researchers and government agencies as they respond to and mitigate this global emergency.”
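As a concrete, if deliberately tiny, illustration of the kind of epidemiological calculation Gil is referring to, the sketch below steps a classic SIR (susceptible-infected-recovered) compartment model forward in Python. The population size and the transmission and recovery rates are purely illustrative assumptions, not Covid-19 estimates; real studies sweep huge numbers of such scenarios across far richer models, which is where supercomputing capacity comes in.

```python
# Minimal SIR epidemic model: a toy stand-in for the epidemiological
# calculations run at vastly larger scale on the consortium's systems.
# All parameters below are illustrative, not fitted to Covid-19 data.

def sir_step(s, i, r, beta, gamma, dt):
    """Advance susceptible/infected/recovered counts by one Euler time step."""
    n = s + i + r
    new_infections = beta * s * i / n * dt
    new_recoveries = gamma * i * dt
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)

s, i, r = 999_000.0, 1_000.0, 0.0   # hypothetical population of one million
beta, gamma, dt = 0.3, 0.1, 1.0     # assumed transmission/recovery rates, one-day step

for day in range(160):
    s, i, r = sir_step(s, i, r, beta, gamma, dt)

print(f"after 160 days: susceptible={s:,.0f} infected={i:,.0f} recovered={r:,.0f}")
```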

Included in the consortium are not only the tech companies but the Argonne, Lawrence Livermore, Los Alamos, Sandia and Oak Ridge national laboratories, the Massachusetts Institute of Technology, Rensselaer Polytechnic Institute, the National Science Foundation, and NASA.

Lining Up The Compute Power

Supercomputers have already been enlisted in the fight against the virus. Using the massive Summit system at Oak Ridge, scientists this month ran simulations of how 8,000 molecules would react to the coronavirus and were able to isolate 77 compounds that may be able to stop it from infecting host cells, a crucial step toward finding a vaccine. Summit, first on the Top500 list, delivers more than 200 petaflops of performance. Researchers also have used the Tianhe-1 supercomputer in China and supercomputers in Germany for everything from diagnosis to research. Summit is included in the systems available to the consortium.
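At a very high level, that kind of compound screen is a ranking problem: score every candidate against the viral target and keep the most promising hits. The Python sketch below shows the shape of such a loop; score_binding() here is a hypothetical placeholder for the expensive docking simulation that Summit actually spreads across thousands of GPUs, and the numbers simply echo the 8,000-compound, 77-hit figures reported above.

```python
# Toy virtual-screening loop. score_binding() is a placeholder for the costly
# docking/molecular-dynamics step; on a machine like Summit each call would be
# an independent simulation farmed out across GPU nodes.
import random

def score_binding(compound_id: int) -> float:
    """Stand-in docking score; lower (more negative) means tighter predicted binding."""
    random.seed(compound_id)              # deterministic dummy value per compound
    return random.uniform(-12.0, 0.0)

library = range(8_000)                               # 8,000 candidate compounds
scores = {c: score_binding(c) for c in library}      # embarrassingly parallel in practice
hits = sorted(scores, key=scores.get)[:77]           # keep the 77 best-scoring compounds

print(f"top hit: compound {hits[0]} scored {scores[hits[0]]:.2f}")
```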

The new Covid-19 consortium will bring to bear compute power from more than a dozen systems. Lawrence Livermore is opening up its 23 petaflops Lassen supercomputer (788 compute nodes, Power9 chips and V100 GPUs), Quartz (3.2 petaflops, 3,004 nodes and Intel Xeon E5 “Broadwell” chips), Pascal (900 teraflops, 163 nodes, Xeon E5 Broadwell CPUs and Nvidia Pascal P100 GPUs), Ray (1 petaflops, 54 nodes, Power8 CPUs and Pascal P100 GPUs), Surface (506 teraflops, 158 nodes, Xeon E5 “Sandy Bridge” chips and Nvidia Kepler K40m GPUs) and Syrah (108 teraflops, 316 nodes and Xeon E5 Sandy Bridge chips).

Los Alamos systems are Grizzly (1.8 petaflops, 1,490 nodes and Xeon E5 Broadwell CPUs), Snow (445 teraflops, 368 nodes and Xeon E5 Broadwell CPUs) and Badger (790 teraflops, 660 nodes and Xeon E5 Broadwell chips), while Sandia will make its Solo supercomputer (460 teraflops, 374 nodes and Xeon E5 Broadwell chips) available.

The consortium also will have access to five supercomputers supported by the NSF, including Frontera and Stampede 2, both operated by the Texas Advanced Computing Center (TACC). Stampede 2 provides almost 20 petaflops of performance designed for scientific, engineering, research and educational workloads, using 4,200 Intel “Knights Landing” nodes as well as Xeon “Skylake” chips. Frontera is aimed at simulation workloads, data analytics and emerging applications such as artificial intelligence (AI) and deep learning. It offers a peak performance of 4.8 petaflops and is powered by “Cascade Lake” Xeon SP Platinum chips.

TACC’s Stampede supercomputer

The other three NSF-supported systems round out that group. Comet is a 2.76 petaflops supercomputer at the San Diego Supercomputer Center powered by Xeon E5 chips and Nvidia K80 and P100 GPUs. Bridges, operated by the Pittsburgh Supercomputing Center, mixes Xeon E5 and E7 chips with Nvidia Tesla K80, Tesla P100 and Volta V100 GPUs. Jetstream, run at Indiana University’s Pervasive Technology Institute and powered by Xeon E5 “Haswell” chips, incorporates elements of commercial cloud computing.

NASA is making its high-performance computing (HPC) resources available to researchers. MIT is offering its Supercloud, a 7 petaflops cluster powered by Intel chips and Nvidia Volta GPUs, as well as Satori, a 2 petaflops system using Power9 CPUs and Volta GPUs that is oriented toward AI workloads. RPI’s Artificial Intelligence Multiprocessing Optimized System (AiMOS), an 8 petaflops Power9/Volta supercomputer, is being made available to the consortium to explore new AI applications.

Google Cloud, Microsoft Azure and Amazon Web Services (AWS) are making their infrastructure and cloud services available to researchers. Microsoft will provide grants to researchers via its AI for Health program, and the program’s data scientists will be available to collaborate on consortium projects. IBM’s 56-node Research WSC cluster, powered by Power9 chips and V100 GPUs, also will be available. In addition, IBM will help evaluate proposals that come in from researchers.

Carving Up The Work

Consortium members expect a range of projects to be run on the supercomputers, from studies of the molecular structure of the virus behind Severe Acute Respiratory Syndrome (SARS), another coronavirus that emerged in China in 2002 and quickly spread to other parts of the globe, to the makeup of Covid-19, how it’s spreading and how to stop it. Such work around bioinformatics, epidemiology, and molecular modeling requires a huge amount of computational capacity, which is what the consortium is offering.

Scientists and medical researchers who are looking to access the consortium’s compute capabilities can submit a two-page description of their proposal on the NSF’s Extreme Science and Engineering Discovery Environment (XSEDE) website. The proposal shouldn’t include proprietary information – the consortium expects teams that get access to resources not only to publish their results but to produce an ongoing blog during the research process.

The proposal should include scientific and technical goals, an estimate of how much compute resources will be needed, whether collaboration or additional support from consortium members will be needed, and a summary of the team’s qualifications and readiness for running the project.

Once a proposal is submitted, it will be reviewed by the consortium’s steering committee on such metrics as potential impact, computational feasibility, resource requirements and timeline. A panel of scientists and computing researchers will then work with the proposing teams to evaluate the public health benefits of the projects. Speed is of the essence; an emphasis will be placed on projects that can ensure rapid results, the organization said.


3 Comments

  1. “companies and organizations are making available more than 330 petaflops of performance”
    Every nanoFLOP is important, and such an initiative by big names deserves to be applauded.

    However, there are approximately 160 million PS4/Pro and Xbox One consoles in the world that cannot run the Folding@home application in order to fight the deadly COVID-19 virus! A few days ago the total aggregated Folding@home raw compute power passed 470 petaFLOPS (https://twitter.com/drGregBowman/status/1241037866215657472), so do not underestimate the power of small CPUs.

    Sony and Microsoft executives should immediately allow this type of distributed application to run on their consoles!

    • This has already been considered for Folding@Home, but the issue is not primarily raw hardware power. It comes down to precision (gaming rigs use single-precision floating point for graphics shaders), API support (game consoles do not support HPC APIs such as OpenCL), bandwidth (pull schedulers are more intensive than push schedulers such as MPI and create hot nodes, which was the issue for Folding@Home), and latency (calculations must be duplicated to avoid timeouts).
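To make the precision point in that reply concrete, here is a small NumPy demonstration (an illustrative sketch, not Folding@Home code): accumulating ten million values sequentially in single precision, the format game hardware is optimized for, drifts visibly away from the double-precision result that scientific codes typically assume.

```python
# Single- vs double-precision accumulation: the float32 running sum picks up
# rounding error on every addition once the accumulator dwarfs the addend.
import numpy as np

values = np.full(10_000_000, 0.1, dtype=np.float32)   # ten million copies of 0.1

sum64 = values.astype(np.float64).sum()               # double-precision reference, ~1,000,000
sum32 = np.cumsum(values, dtype=np.float32)[-1]       # naive sequential float32 accumulation

print(f"float64 sum: {sum64:,.1f}")
print(f"float32 sum: {sum32:,.1f}")                   # drifts well away from 1,000,000
```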
