Recognizing that the most important application of machine learning in HPC is to replace production numerical simulation models with machine learning approximations, one European organization is pushing these frontiers with innovative AI HPC research.
SURFsara is part of SURF, the collaborative ICT organization for higher education and research in the Netherlands. SURFsara is the national center for High Performance Computing & Data Services, and works together with the academic community (including researchers, educational institutions and academic medical centers) and with industry and SMEs.
The organization is working with scientists in four disparate areas of scientific research to demonstrate the transformative impact AI-based computing methods can have on the general HPC community. For example, recent research has shown that computationally expensive sampling techniques like Monte Carlo methods can be replaced with remarkably accurate and orders-of-magnitude faster artificial neural networks (ANNs)[i].
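To make the idea of replacing a sampling method with a learned approximation concrete, here is a minimal sketch in Python. It is a toy illustration, not the method used in the cited research. The "expensive" simulation is a hypothetical Monte Carlo estimate of E[cos(θZ)] for Z drawn from a standard normal distribution, chosen because its exact value, exp(-θ²/2), is known. The surrogate is a small random-feature network (a fixed random hidden layer with a least-squares output layer) rather than a fully trained ANN. All names and parameters are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_mean(theta, n_samples=20_000):
    """Toy 'expensive' simulation: estimate E[cos(theta * Z)], Z ~ N(0, 1), by sampling."""
    z = rng.standard_normal(n_samples)
    return np.cos(theta * z).mean()

# Run the sampler once per training point to build the training set.
train_theta = np.linspace(0.0, 2.0, 64)
train_y = np.array([monte_carlo_mean(t) for t in train_theta])

# Hypothetical surrogate: random tanh hidden layer, output weights fit by ridge regression.
n_hidden = 32
w = rng.normal(0.0, 2.0, size=n_hidden)
b = rng.uniform(-2.0, 2.0, size=n_hidden)

def hidden(theta):
    """Hidden-layer activations plus a constant bias column."""
    theta = np.atleast_1d(theta).astype(float)
    h = np.tanh(np.outer(theta, w) + b)
    return np.hstack([h, np.ones((theta.size, 1))])

H = hidden(train_theta)
ridge = 1e-3  # small regularizer so the fit does not chase sampling noise
out_weights = np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ train_y)

def surrogate(theta):
    """Cheap replacement for monte_carlo_mean: one small matrix product, no sampling."""
    return hidden(theta) @ out_weights

# Compare against the known exact answer on points not used for training.
test_theta = np.linspace(0.0, 2.0, 201)
exact = np.exp(-0.5 * test_theta**2)
max_error = np.abs(surrogate(test_theta) - exact).max()
print(f"max surrogate error: {max_error:.4f}")
```

Once fitted, each call to `surrogate` costs a single small matrix product instead of tens of thousands of random draws, which is where the speedup in this toy setting comes from.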
Caspar van Leeuwen (HPC consultant, SURFsara) observes, “There is big interest in AI in the HPC community. However, while scientists from this community are experts in their own domains, they do not necessarily have the machine learning knowledge or scaling capability to effectively leverage AI techniques for their research.” SURFsara's own scaling work illustrates what is possible: the team recently achieved deep learning training in less than 40 minutes on ImageNet-1K, along with the best accuracy and training time on the ImageNet-22K and Places-365 datasets[ii]. The results were obtained on large-scale systems incorporating the latest Intel Xeon Scalable processors, such as the TACC (Texas Advanced Computing Center) Stampede2 and the BSC (Barcelona Supercomputing Center) MareNostrum4 supercomputers.
Adoption of these new techniques does require that scientists embrace new computational methods that may eventually render obsolete portions of the codes their teams have spent significant time (even decades) developing and optimizing. Along with many potential performance and accuracy benefits, an AI-based, data-driven machine learning approach can also help scientists focus more on the underlying science and study divergent scales, such as planetary systems in star clusters.
In working on these four use cases, SURFsara has demonstrated a pragmatic, workable, and exemplary path to help scientists incorporate and use AI data-driven techniques over the long term, not just as a one-shot, to-be-discarded proof of concept.
Axel Berg (manager SURF Open Innovation Lab, SURFsara) notes, “In considering how to best support the principal investigator, we also had to make the effort relevant to the scientific community.” Translated into practice, this meant the project needed to be more than just a proof-of-concept demonstration. “In order to bring longevity to the project, we considered it important that, parallel to support by SURFsara Machine Learning experts, PIs hire a local expert, so that knowledge acquired during the project could be secured within the research group itself.” Berg observed, “The principal investigators were happy that SURFsara was funding this.”
From interest to a pragmatic approach
The SURFsara team first started their work as an exploratory effort based on their expertise in the area of scaling up deep learning and wide network training[iii]. The initial motivation was provided by early reports and some initial work about accelerating HPC applications with AI technologies such as artificial neural networks (ANNs).
In one example, a CERN team demonstrated that “energy showers” detected by calorimeters can be interpreted as a 3D image[iv]. Adapting existing deep learning techniques, they then decided to train a GAN (a Generative Adversarial Network) to act as a replacement for the expensive Monte Carlo methods used in HEP simulations. During validation, CERN observed a “remarkable”[v] agreement between the images from the GAN generator and the Monte Carlo images along with orders-of-magnitude faster runtime. SURFsara contributed to this work with scaling optimization. [vi] [vii]
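A validation of this kind compares summary quantities computed from generated images with the same quantities computed from Monte Carlo images. The sketch below, in Python, shows the general shape of such a check. It is an assumption-laden toy, not the CERN team's procedure. Both image sets are synthetic stand-ins drawn from the same random distribution, and the function names, array shape, and choice of summaries (mean total energy and mean energy per depth layer) are hypothetical.

```python
import numpy as np

def shower_summaries(images):
    """images: array of shape (n, x, y, z) holding energy deposits per cell.

    Returns the mean total energy per image and the mean energy per z-layer.
    """
    total_energy = images.sum(axis=(1, 2, 3)).mean()
    depth_profile = images.sum(axis=(1, 2)).mean(axis=0)
    return total_energy, depth_profile

def relative_difference(candidate, reference):
    """Element-wise |candidate - reference| / |reference|."""
    candidate = np.asarray(candidate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.abs(candidate - reference) / np.abs(reference)

# Synthetic stand-ins (assumption): real inputs would be Monte Carlo showers
# and generator output, not random draws from one distribution.
rng = np.random.default_rng(1)
shape = (500, 8, 8, 10)
reference = rng.gamma(shape=2.0, scale=1.0, size=shape)  # plays the Monte Carlo role
generated = rng.gamma(shape=2.0, scale=1.0, size=shape)  # plays the generator role

e_ref, prof_ref = shower_summaries(reference)
e_gen, prof_gen = shower_summaries(generated)

energy_diff = relative_difference(e_gen, e_ref)
profile_diff = relative_difference(prof_gen, prof_ref).max()
print(f"total energy differs by {100 * energy_diff:.2f}%")
print(f"worst depth layer differs by {100 * profile_diff:.2f}%")
```

Because both toy sets come from the same distribution, the differences here reflect only statistical fluctuation; with a real generator they would also measure how well it has learned the simulated physics.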
Recognizing the potential, the SURFsara team contacted scientists working in different fields of research to assess their level of interest in AI technology and their willingness to explore potential avenues for collaboration. Not surprisingly, the team found that many of the scientists were not experts in AI or in scaling their HPC workloads, but that many were familiar with, and open to, the idea of using data-derived machine learning models in their research. The motivation for a proof-of-concept effort lay in the hope that an ANN could provide new insights, faster inferencing performance, expanded predictive capability, and higher-fidelity predicted results, to name but a few potential benefits.
Getting AI to work for the scientist in their domain of interest
To investigate the enhancement of traditional HPC simulations, SURFsara selected four scientific use cases spanning very different scientific domains.
- Distinguishing biological interfaces from crystal artifacts in biomolecular complexes using deep learning – Prof. Alexandre M.J.J. Bonvin, Computational Structural Biology, Utrecht University;
- Machine-Learned turbulence in next-generation weather models – Dr. Chiel van Heerwaarden, Meteorology and Air Quality Group, Wageningen University;
- Machine learning for accelerating planetary dynamics in stellar clusters – Prof. Simon Portegies Zwart, Computational Astrophysics, Leiden University;
- Generating physics events without an event generator – Dr. Sacha Caron, Experimental High Energy Physics, Radboud University.
Distinguishing biological interfaces from crystal artifacts
Prof. Dr. Alexandre Bonvin (Utrecht University) notes, “Trained as a chemist, AI and in particular deep learning sounds like a black box that can solve a problem. Next to generating a reliable AI-based predictor, our hope is to discover new molecular aspects/features that we did not think of before as being relevant. We can then go back to investigating those in a more conventional manner with the hope to ultimately directly improve our software.”
Dr. Bonvin decided to work with SURFsara because it brings domain-specific expertise in choosing the right AI tools and approach, along with the ability to help specify and train the proper neural network to assist his research using PDB (Protein Data Bank) data.
Figure 1: An example problem showing crystallographic vs. biological assembly. (Image provided by Prof. Dr. Alexandre Bonvin)
Machine-Learned turbulence in next-generation weather models
Dr. Ir. Chiel van Heerwaarden (Wageningen University & Research) points out that “as a meteorological group we are only users of AI.” He decided to work with SURFsara to help reach the accuracy and performance that AI promises to deliver for his calculations. “A key example is the scaling up of neural networks over large machines.”
More specifically, Van Heerwaarden observes that atmospheric models are revealing more and more detail: “We are now approaching the limit in which complex structures, such as individual clouds, can be resolved in a model. However, the models we have are quite often computationally very expensive, making them unsuitable for weather forecasting because we won’t be able to finish the forecast in time. With AI, we hope to learn the behavior of those mathematical models, and potentially reach the same, or maybe even better, accuracy and finer scales of resolution with a lower computational cost that is perhaps even an order of magnitude less than current approaches.”
[v] More precisely, the CERN team compared high-level quantities (e.g. energy shower shapes) and detailed calorimeter response (e.g. single cell response) between the trained generator and the standard Monte Carlo. The CERN team describes the agreement, which is within a few percent, as “remarkable”.