Stage Set for Richer Machine Learning-Infused HPC

A research collaboration between the National Cancer Institute (NCI) and Lawrence Livermore National Laboratory (LLNL) is demonstrating the value of using machine learning to overcome daunting computational challenges.

Although the specific goal of the NCI-LLNL work is to advance the understanding of the biomolecular mechanisms that underlie some of the most aggressive human cancers, the computational approach employed to do this has far broader application.

At least that’s the claim of Fred Streitz, LLNL’s Chief Computational Scientist and HPC Innovation Center Director, who led the project at the national lab. According to Streitz, the collaboration was very much in the interest of both organizations: NCI reaped the direct benefit of advancing its cancer research work, while LLNL got the opportunity to explore new ways of using machine learning to cut intractable HPC problems down to size.

“It turns out that the workflows that are necessary to understand some of these biology problems are different than what we are currently doing,” Streitz told The Next Platform. “But we think it’s part of the future, so this was a way of training ourselves to solve a particular class of problem, and in doing so help the National Cancer Institute.”

The project also gave LLNL the chance to fully exercise Sierra, the lab’s newest supercomputer and currently the second most powerful system on the planet. While the system often lives in the shadow of Summit, its more powerful sibling at Oak Ridge National Lab, its nearly identical GPU-accelerated architecture makes it extremely attractive for running machine learning codes at scale. Thanks to the specialized Tensor Cores in the system’s 17,280 Nvidia V100 GPUs, Sierra can deliver something in the neighborhood of two peak exaops of machine learning performance.
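The two-exaops figure can be sanity-checked with a few lines of arithmetic. The sketch below assumes a peak of 125 teraflops of mixed-precision Tensor Core throughput per V100, which is Nvidia's published peak for the NVLink version of the part and is not a number given in this article; sustained performance on real codes will be lower.

```python
# Back-of-the-envelope check of Sierra's peak machine learning throughput.
NUM_GPUS = 17_280                    # V100 count cited in the article
PEAK_TENSOR_TFLOPS_PER_GPU = 125     # ASSUMPTION: vendor peak, not from the article

total_tflops = NUM_GPUS * PEAK_TENSOR_TFLOPS_PER_GPU
peak_exaops = total_tflops / 1_000_000   # 1 exaop = 1,000,000 teraops

print(f"Peak Tensor Core throughput: {peak_exaops:.2f} exaops")
```

Under that assumption the total comes to a little over two exaops, which is consistent with the "neighborhood of two" description.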

Machine learning aside, the NCI research problem has all the earmarks of a fairly typical HPC application in molecular biology. In this particular case, scientists are trying to figure out how RAS proteins on cell membranes go rogue and start inducing uncontrolled cell growth, i.e., cancer. RAS proteins are ubiquitous in the animal kingdom and are used to initiate a sequence of protein bindings, which eventually results in inducing the cell nucleus to divide. That’s all well and good for normal tissue growth, since in most cases, the RAS protein shuts off after it has performed its cell division trick.

But if a genetic mutation is present where the protein gets stuck in the on position, that can be deadly, leading to some of the more aggressive types of cancer. About 93 percent of pancreatic cancer is associated with this particular mutation, along with almost half of colorectal cancers and about a third of lung cancers. “These are death-sentence cancers,” says Streitz.

Although the role of the RAS protein in cancer growth has been known for a couple of decades, the exact mechanism of how this works remains a mystery. Thus, treatments to correct or suppress this mechanism could not be developed. But scientists did have enough data to build a model of the RAS protein on the membrane, which is where LLNL and its supercomputing expertise come in.

The problem, says Streitz, is that you need a good-sized high-resolution model to discern what’s going on at the molecular level. While it’s possible to build a micron-scale model to observe the behavior of the RAS protein on the membrane, a much more fine-grained model is needed to reveal the molecular mechanisms involved. “I can’t look at microns of space with atomistic resolution, not even on Sierra, not even on the next-generation machine,” he says. “We’re orders of magnitude away from being able to do that.”

What Streitz and his team came up with was a multiscale approach, infused with machine learning. The starting point was a micron-scale model that captured the movement of the RAS protein on the cell membrane. They then employed machine learning to pick out the most interesting regions of the membrane, where the RAS action seemed to be occurring. Those regions were modeled with heavy-duty molecular dynamics simulations to reveal the atomic-level behavior. By selectively removing the uninteresting regions of the membrane, the team dramatically reduced the scope of the molecular dynamics simulations, making them feasible to run on a large machine like Sierra. The whole workflow is automated, says Streitz, with the machine learning piece acting as the intelligent glue between the two models.
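The shape of that workflow can be illustrated with a minimal Python sketch. It is not LLNL's code: the `Patch` fields, the random `coarse_model_snapshot`, the hand-written `interest_score` (standing in for the trained machine learning model), and the `run_fine_simulation` stub are all hypothetical names invented here to show the pattern of coarse model, ranking step, and budgeted fine-grained simulations.

```python
import heapq
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """A small region of the coarse, micron-scale membrane model (hypothetical)."""
    patch_id: int
    ras_count: int          # number of RAS proteins seen in the patch
    lipid_disorder: float   # made-up local feature in [0, 1)


def coarse_model_snapshot(n_patches, seed=0):
    """Stand-in for the coarse model: returns randomly generated patches."""
    rng = random.Random(seed)
    return [Patch(i, rng.randint(0, 4), rng.random()) for i in range(n_patches)]


def interest_score(patch):
    """Stand-in for the trained ML model that ranks patches at runtime."""
    return patch.ras_count + patch.lipid_disorder


def select_patches(patches, budget):
    """Keep only the `budget` highest-scoring patches; the rest are never simulated."""
    return heapq.nlargest(budget, patches, key=interest_score)


def run_fine_simulation(patch):
    """Placeholder for launching a molecular dynamics job on one patch."""
    return f"md-job-for-patch-{patch.patch_id}"


patches = coarse_model_snapshot(n_patches=1000)
selected = select_patches(patches, budget=50)
jobs = [run_fine_simulation(p) for p in selected]
print(f"Simulating {len(jobs)} of {len(patches)} patches at fine resolution")
```

The design point is the budget: the ranking step decides which fine-grained simulations are worth running, so the expensive work scales with the number of interesting regions rather than with the size of the membrane.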

The application needed only a few hundred Sierra nodes to run the micron-scale model, but then took the rest of the machine to run the much more intense molecular dynamics simulations being spawned by the machine learning process. Because Sierra has over 4,000 nodes, each equipped with two Power9 CPUs and four V100 GPUs, the application is able to simultaneously run 14,000 molecular dynamics simulations, each of which encompasses several million particles.
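A rough capacity check shows how those numbers fit together. The sketch below uses the GPU count, GPUs per node, and simulation count reported in this article; the one-GPU-per-simulation mapping is an assumption made here for illustration, since the article does not say how simulations were assigned to GPUs.

```python
# Rough capacity check for the concurrent molecular dynamics runs.
TOTAL_GPUS = 17_280              # from the article
GPUS_PER_NODE = 4                # from the article
CONCURRENT_SIMULATIONS = 14_000  # from the article
GPUS_PER_SIMULATION = 1          # ASSUMPTION: not stated in the article

nodes = TOTAL_GPUS // GPUS_PER_NODE
gpus_for_md = CONCURRENT_SIMULATIONS * GPUS_PER_SIMULATION
gpus_left = TOTAL_GPUS - gpus_for_md
md_share = gpus_for_md / TOTAL_GPUS

print(f"Nodes implied by the GPU count: {nodes}")
print(f"GPUs running MD: {gpus_for_md} ({md_share:.0%} of the machine)")
print(f"GPUs left for inference and other work: {gpus_left}")
```

Under that assumption, the simulations would occupy roughly four-fifths of the GPUs, leaving the remainder available for inference and other tasks.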

Most of the training for the machine learning model was accomplished separately, in advance of the big molecular dynamics simulations. The inferencing work, which in this case means ranking the interesting membrane regions, is performed at runtime. One noteworthy outcome of this rather complex workflow was that the V100 GPUs were used for both the inference processing and the molecular dynamics simulations, while the micron-scale model ran exclusively on the Power9 CPUs, and all three could be executing on the same node. “So the machine was sliced up, not by node, but core by core,” explains Streitz.

Although all of this work was performed in the name of biology, the multiscale approach, which marries coarse-grained and fine-grained simulations, has broad applicability in other areas, according to Streitz. Materials science, computational chemistry, and turbulence modeling are all candidates, as is essentially any problem where it makes sense to couple coarse-grained and fine-grained models.

Likewise, using machine learning to couple two different models can also be generalized across a wide array of engineering and scientific applications. It’s especially useful for intelligently guiding a series of simulations in order to save time or, as in this case, to reduce the problem size into something that fits into the machine at hand. “At the end of the day, the fastest simulation you’re ever going to run is the one you don’t have to run,” says Streitz.
