SPONSORED The need for faster, larger, and more accurate design cycles, along with the performance and cost advantages of GPUs, is driving the next generation of CFD applications forward. With so many shifts ahead, HPC and CFD users are finding the on-prem route limiting in terms of hardware diversity, flexibility, scalability, and readiness for big changes such as the arrival of AI/ML in the CFD mix.
As in other areas of high performance computing, change happens relatively slowly in computational fluid dynamics (CFD). The most popular commercial codes have been cobbled together over decades and optimized for architectures that fit the norm rather than break new barriers.
However, driven by increasing model sizes, demands for greater accuracy at scale, and mounting pressure for simulation to take a more prominent role relative to physical testing, CFD is finally changing, and that change is coming fast.
On the near horizon, we can expect companies that previously used CFD mainly as a visualization tool to aid design decisions ahead of physical testing to place more emphasis on simulation. Building a vehicle or plane for testing purposes can add months or even years to design cycles, while modern compute capabilities hold the potential for full simulations of entire automobiles or aircraft. Although this is partly a regulatory issue (for instance, government aviation agencies requiring physical tests), Amazon Web Services (AWS) is looking to a future in which full-run simulations can carry the full weight of physical testing.
“We are seeing a big increase in the number of companies wanting to move to more simulation-based design processes,” says Dr. Neil Ashton, Principal CFD Specialist Solutions Architect at AWS. He says this is particularly the case with startups bringing entirely new products to market. Joby Aviation, for example, is developing vertical takeoff and landing aircraft, an entirely novel design with no prior data from physical testing. But the simulation emphasis is extending to larger companies as well.
For simulations to have the credibility of physical tests, high accuracy becomes paramount. That means larger models for high-fidelity simulations, which in turn means bringing world-class supercomputing resources to bear.
“For the kind of accuracy that generates results trustworthy enough that it’s possible to design a car or plane from simulations alone, it takes ultra-high fidelity CFD at large scale. To do this you need thousands of cores and hundreds of millions of cells, which puts even more stress on on-prem machines, especially if you need to run each model on thousands of cores but only during design cycles, not all the time,” Ashton explains.
The flexibility of having supercomputing capabilities on demand is well understood, but this kind of large-scale use case also highlights the importance of another emerging trend in CFD: GPU acceleration.
While GPUs have provided crucial acceleration in HPC for over a decade, it has taken some time for long-established CFD codes to gain GPU-ready versions. Ashton predicts far more will be ported, from large ISV codes to popular open source packages like OpenFOAM, which will open even more doors to large-scale CFD on AWS.
“The ISV piece is a bit more forward-looking. We’re seeing codes like Zenotech’s or FUN3D, a large NASA code, with GPU acceleration. I think the shift for CFD to GPU is partly because of new GPUs that are faster with more memory but also the lead time to develop and port these codes has taken many years and is finally catching up. Now we’re at the point where we’ll see that happen and in a few years there will be even more commercial codes that move the GPU way,” Ashton notes.
The codes, commercial or open source, that have been ported to GPUs are showing impressive results on AWS, comparable to or exceeding those of on-prem HPC clusters.
In its own internal tests, AWS found that using Nvidia A100 GPUs led to a 2-3X reduction in time to result and, perhaps more surprisingly, a 2-3X reduction in compute costs. This was confirmed when Zenotech, an AWS partner, tested its own CFD code using Amazon EC2 P4d instances based on Nvidia A100 GPUs in EC2 UltraClusters.
“Zenotech used the latest generation GPUs to run 2-3X faster for a full aircraft run versus the maximum number of CPUs that could even be thrown at the problem. The key was that it was also 2-3X cheaper using standard public pricing,” Ashton says. “It’s a real turning point when it’s possible to run much bigger models more cost effectively. And further, these high-end GPUs are even more reason to move to the cloud because companies like this might not have had access to those, especially since so many CFD shops are still in a state of transition between CPU and GPU.”
Some of the price and performance figures might come as a surprise to those who have been in the HPC/CFD trenches on-prem for years. After all, it was only a decade ago that the conventional wisdom held it was impossible to run tightly coupled HPC workloads in the cloud without taking a massive performance hit or being slammed by high data movement costs.
Ashton says that 2018 was a real turning point for CFD and physics-driven HPC on AWS. That year brought the Elastic Fabric Adapter (EFA), AWS’s own networking interface for low latency, high bandwidth CFD, weather, and other HPC applications. “Now, when you use those instances or the high-end P4d with the Nvidia A100 GPU and 400 Gbps of network bandwidth, it’s possible to get on-prem, bare metal performance.”
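For readers curious what requesting this kind of EFA-enabled GPU capacity looks like in practice, here is a minimal sketch using the boto3 SDK. It is illustrative only: the AMI, subnet, security group, and placement group names are hypothetical placeholders, and production CFD deployments typically go through tooling such as AWS ParallelCluster or AWS Batch rather than raw EC2 calls.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Hypothetical placeholders: substitute a real HPC/Deep Learning AMI,
# subnet, security group, and an existing cluster placement group.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="p4d.24xlarge",                # 8x Nvidia A100, 400 Gbps networking
    MinCount=1,
    MaxCount=1,
    Placement={"GroupName": "cfd-cluster-pg"},  # cluster placement group for low latency
    NetworkInterfaces=[
        {
            "DeviceIndex": 0,
            "SubnetId": "subnet-0123456789abcdef0",
            "Groups": ["sg-0123456789abcdef0"],
            "InterfaceType": "efa",              # attach an Elastic Fabric Adapter
            "DeleteOnTermination": True,
        }
    ],
)

print(response["Instances"][0]["InstanceId"])
```

Higher-level tools wrap this kind of call and also take care of the EFA-enabled MPI stack, shared storage, and scheduler configuration that a real CFD cluster needs.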
Other additions to the AWS wheelhouse for HPC include the AWS Nitro System, which offloads virtualization functions to dedicated silicon. With the hypervisor overhead moved off the host, HPC users get access to the full number of cores without sacrificing any to virtualization.
Those underlying platform benefits help create on-prem HPC cluster-like performance, but the real differentiator, especially in this era of diverse hardware options for high-end CFD, is the ability to find and immediately use the optimal configuration for balanced price/performance.
With the right performance, pricing, platform, and acceleration in place, there are still new challenges ahead. Ashton thinks his CFD-focused team at AWS is ahead of the game, especially as machine learning becomes more closely aligned with traditional HPC.
“Big changes are coming to CFD. It’s not just that the actual models are getting larger and more accurate. It creates a need for larger and more flexible compute, a need for GPUs to make it all cheaper and faster—much faster than just a new generation of CPU can keep up with. Even with GPUs you’re looking at hours of run time on hundreds of GPUs. The next iteration of change comes with AI/ML.
“We believe that people will eventually be running their CFD simulations using a combination of CPUs and GPUs, then feeding those models into deep learning so that, in the end, they can do inference and come up with a new design or analysis within seconds versus hours. That will be the real game changer.”
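To make that idea concrete, below is a deliberately toy sketch of the pattern Ashton describes, not an AWS-provided workflow: expensive solver runs produce training pairs of design parameters and an output quantity, a small neural network learns the mapping, and new designs are then evaluated by inference in milliseconds. The data here are synthetic placeholders standing in for real CFD results.

```python
import torch
import torch.nn as nn

# Synthetic stand-ins for CFD results: each row is a design vector
# (e.g. geometric parameters), each target a quantity of interest
# such as a drag coefficient from a full GPU solver run.
designs = torch.rand(512, 8)                       # 512 prior simulations, 8 parameters each
targets = designs.pow(2).sum(dim=1, keepdim=True)  # placeholder "physics"

# A small surrogate network mapping design parameters to the output.
surrogate = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Training amortizes the cost of the original simulations.
for _ in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(surrogate(designs), targets)
    loss.backward()
    optimizer.step()

# Evaluating a brand-new design now takes milliseconds, not hours.
new_design = torch.rand(1, 8)
with torch.no_grad():
    print(surrogate(new_design).item())
```

The real work, of course, lies in generating enough high-fidelity simulation data to train on, which is exactly where the large-scale GPU runs described above come in.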
It is time for CFD to have its game-changing moment. After decades of codes bound by the maximum CPU core count, CFD is on the cusp of several revolutions. While each is unfolding more slowly than in some other areas of computing, they are all happening at once. These include greater emphasis on simulation in product design; more compute capability via acceleration for fast, cost-effective large-model execution; increased accuracy driving greater market potential; first steps toward integrating AI/ML into CFD workflows; and growing ISV support for accelerated applications.
On-prem HPC cannot provide the diversity of options CFD users need to explore the many processors (AMD, Intel, Arm) and accelerators with the variety their engineering missions demand. For all these reasons, AWS is spearheading the (many) revolutions coming to CFD.
This article is sponsored by AWS.