Climate Research Pulls Deep Learning Onto Traditional Supercomputers
May 18, 2016 Nicole Hemsoth
Over the last year, stories pointing to a bright future for deep neural networks and deep learning in general have proliferated. However, most of what we have seen has been centered on the use of deep learning to power consumer services. Speech and image recognition, video analysis, and other features have spun from deep learning developments, but from the mainstream view, it would seem that scientific computing use cases are still limited.
Deep neural networks present an entirely different way of thinking about a problem set and the data that feeds it. While there are established approaches for images and speech patterns both in terms of training and inference, research areas that could benefit are still lagging somewhat behind. Further, as we have described here in the context of Baidu, Yahoo, and others, training datasets aren’t the only challenge—there are also unique hardware requirements. Separate training clusters, almost always powered by GPUs, are needed for deep learning at scale, and having tailored systems for the inference side of deep learning workflows is also helpful. Not all research centers have resources that are divided in this way, but for some centers, it is still possible to tailor around existing algorithmic and hardware constraints and achieve notable results.
This is happening already in the area of climate research where scattered efforts are tweaking existing deep learning frameworks to detect patterns, including forthcoming dangerous weather patterns. Instead of doing this on specialized training and execution clusters, some are using high performance computing systems, just as they would for their existing HPC simulation and modeling tasks.
Most recently, the blend of existing HPC systems and deep neural networks allowed climate researchers at the National Energy Research Scientific Computing Center (NERSC) to classify potentially threatening climate events with high accuracy. The team achieved between 80% and 90% classification accuracy in what it describes as “the first time that deep convolutional neural networks have been applied to tackle climate pattern recognition problems.” According to the NERSC team, “this successful application could be a precursor for tackling a broad class of pattern detection problems in climate science.”
What is most interesting about this effort, in terms of how deep learning might be applied in traditional supercomputing application areas, is that a standard supercomputer was used for both training and inference. Model training and testing all took place on the Cray XC30 “Edison” supercomputer as well as the new “Cori” supercomputer, both of which are at NERSC. The team says they mainly used the single-node CPU backend of NEON, the open source library for deep learning from Nervana Systems. The optimization work to tweak the CNN was done on the 32 cores of the Cori Haswell-based system. No GPUs, apparently, were harmed in the making of these climate studies. That is a strange circumstance, since nearly every one of the deep learning use cases at scale we have described here at The Next Platform uses a GPU-accelerated approach to model training.
“Deep neural networks learn high-level representations from data directly, therefore potentially avoiding traditional subjective thresholding based criteria of climate variables for event detection.”
Also of interest, the team used conventional convolutional neural network approaches but adapted the models and training based on the relative size and other factors of the training set. In short, adapting existing CNN approaches to climate science took some retailoring, both on the algorithmic and hardware capabilities side, but it does show a path forward for climate research and, of course, for other areas in scientific computing where classification, optimization, and understanding across complex and changing datasets are required. Unlike ImageNet, where the training set consists of pre-labeled images, the NERSC team’s training set consisted of several continuous spatial variables (including pressure, temperature, precipitation, etc.) that are stacked together into “image patches”. The CNN is built around these patches to classify key extreme weather events.
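To make the “image patch” idea concrete, here is a minimal sketch in Python with NumPy. It is not the NERSC team’s code. The function name stack_into_patch, the grid size, the patch size, the per-channel normalization, and the randomly generated fields are all assumptions made for illustration. The only idea taken from the description above is that several spatial variables covering the same region are stacked as channels, much like the red, green, and blue planes of a photograph.

```python
import numpy as np


def stack_into_patch(fields, top, left, size):
    """Crop the same square window from each 2-D variable grid and
    stack the crops as channels: output shape is (channels, size, size).

    Hypothetical helper for illustration; not from the NERSC study.
    """
    names = sorted(fields)  # fixed channel order, independent of dict order
    crops = [fields[name][top:top + size, left:left + size] for name in names]
    patch = np.stack(crops, axis=0).astype(np.float64)

    # Assumed preprocessing step: normalize each channel separately,
    # since pressure, temperature, and precipitation use very different scales.
    mean = patch.mean(axis=(1, 2), keepdims=True)
    std = patch.std(axis=(1, 2), keepdims=True)
    return ((patch - mean) / (std + 1e-6)).astype(np.float32)


# Synthetic stand-ins for gridded climate variables (made-up values).
rng = np.random.default_rng(0)
grid_shape = (64, 128)  # (latitude points, longitude points), assumed
fields = {
    "pressure": rng.normal(1013.0, 10.0, grid_shape),
    "temperature": rng.normal(288.0, 5.0, grid_shape),
    "precipitation": rng.gamma(2.0, 1.5, grid_shape),
}

patch = stack_into_patch(fields, top=10, left=20, size=32)
print(patch.shape)  # (3, 32, 32): three variables, one 32x32 window
```

A patch in this form can be fed to a CNN in the same way as a three-channel image, with a label such as “cyclone” or “no cyclone” attached to each patch for training.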
As the team describes, one of the biggest challenges for forecasting potentially dangerous weather events lies with human variance, something CNNs work around. “Existing extreme climate event detection methods all build on human expertise in defining relevant events based on evaluation of relevant spatial and temporal variables on hard and subjective thresholds… However, there is no universally accepted set of criteria for what defines a tropical cyclone [or other extreme event].”
According to the NERSC team, this study, which focused on using CNNs for classifying cyclones, atmospheric rivers, and weather fronts individually, might be extended in the future to include training a single neural network for all three events.