Academic centers and government agencies often design and write their own applications, but some of them, along with the vast majority of enterprise customers running HPC applications, depend on third parties for their software. They also depend on those software developers to continually enhance and scale those applications, and that now means adding support for GPU accelerators. Two important vendors, Gaussian and ANSYS, rely not only on GPUs but also on the OpenACC programming model to scale across more cores and therefore do more work faster.
Let’s start with Gaussian.
The way that chemicals react can be the difference between a product success and an environmental failure. Some industries care about whether a refrigerant will deplete the ozone layer. Others want to know how quickly a pesticide decays as it absorbs sunlight, or whether the pigment used in a commercial dye can absorb light without breaking down.
To find these answers, they call in Gaussian. Using the company’s computational chemistry software, they can simulate the properties of molecules reacting with each other.
“Ultimately, you must test physical chemistry to be certain of a result,” says Michael Frisch, President and CEO at Gaussian. But throwing chemical compounds at the problem and seeing what sticks is an expensive and inefficient process. Calculation can substantially reduce the number of physical tests. It is far easier to test 50 compounds than a thousand. “Computation is not a substitute,” he asserts, “but it is a lot cheaper.”
Since 1987, Gaussian has been harnessing some of the world’s fastest computers to help its customers scale up pure compounds. The more powerful the computers, the more accurate the results. Gaussian’s software, including the most recent Gaussian 16 release, can quickly saturate supercomputers like those from Hewlett Packard Enterprise, IBM, and Cray in the quest for increasingly accurate and comprehensive results. “The quality of calculation and the size and number of models we can process is determined by the computing power we have available,” says Frisch.
When GPUs emerged as a potential avenue to accelerate computational chemistry calculations, Gaussian was intrigued. “We’re always working on making more cost-effective hardware choices available to users of Gaussian,” says Frisch. “There’s nothing you can do with GPUs that you can’t do without them, but they are a very economical way to significantly increase the performance and power efficiency of a CPU-only workstation or server.” By adding GPUs to a server, Gaussian can deliver up to three times the computing capacity and throughput of a CPU-only system.
To do that, the company built GPU support into Gaussian 16, but it required that its code base remain maintainable and portable. Gaussian's developers are constantly enhancing its algorithms, and adding GPU support wasn't something they took on lightly. As a result, Gaussian began exploring GPU acceleration in depth only after OpenACC directives made it possible for a compiler to generate GPU code automatically while keeping the source portable across other compilers and systems.
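To illustrate what "directives" means in practice, here is a minimal sketch in C of a generic SAXPY loop (not Gaussian code; the function name `saxpy` is purely illustrative). The loop body is ordinary C; the single `#pragma acc` line asks an OpenACC-capable compiler to offload the loop and manage the data movement, while a compiler without OpenACC support simply ignores the pragma and builds a serial CPU loop from the same source.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i], a classic data-parallel kernel (illustrative only).
 * With OpenACC enabled (e.g. pgcc -acc or gcc -fopenacc) the pragma asks the
 * compiler to generate accelerator code, copying x to the device and y to and
 * from it. Without OpenACC, the unknown pragma is ignored and this is plain,
 * portable serial C. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
#pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The point of the design is that there is one code base: removing or ignoring the pragma leaves valid, unchanged C.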
“The requirement was anything we do with GPUs had to be on one code base,” says Frisch. “That meant we weren’t interested in GPUs until OpenACC came along.” The company is now working on support for IBM OpenPower-based systems using Nvidia GPUs. He says the OpenACC and OpenPower support in Nvidia’s PGI compilers will soon enable the company to sell its software on IBM’s latest GPU-enabled supercomputers, further expanding choices for Gaussian customers.
Now let’s talk about ANSYS.
While Gaussian 16 focuses on computational chemistry, ANSYS Fluent is a versatile computational fluid dynamics (CFD) tool for simulating the flow of air and other fluids. It is widely used in automotive, aerospace, academia, oil and gas, marine, and Formula 1 racing. Its users model everything from the external aerodynamics of automobiles and planes to the compression of air inside an engine. Customers often use meshes of up to a billion cells to model a car design or airplane wing, and want to run CFD simulations over them covering up to an hour of real time.
“This rapidly scales up into very large problem sizes,” says Sunil Sathe, lead software developer at ANSYS. “You need high core count CPU compute clusters or some fancier method to do these calculations quickly enough to be practical.” For ANSYS, that fancier method is Nvidia Tesla GPU accelerators, which have thousands of computing cores that break a problem into pieces and process them in parallel. “We need the ability to give customers the option for faster GPU processing on problems where it makes sense,” says Sathe.
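As a rough illustration of splitting a mesh problem into pieces, the hypothetical sketch below sums squared per-cell residuals over a mesh stored as a flat array. It is not Fluent code; the function `residual_sum_sq` and the flat-array layout are assumptions made for the example. Each loop iteration touches one cell, so an OpenACC compiler can spread the iterations across many GPU threads, and the `reduction(+:sum)` clause tells it how to combine their partial sums.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum of squared residuals over ncells mesh cells.
 * Iterations are independent, so they can run in parallel; the reduction
 * clause makes the compiler combine per-thread partial sums into `sum`.
 * Without OpenACC the pragma is ignored and the loop runs serially. */
double residual_sum_sq(size_t ncells, const double *restrict r)
{
    assert(ncells == 0 || r != NULL);
    double sum = 0.0;
#pragma acc parallel loop reduction(+:sum) copyin(r[0:ncells])
    for (size_t c = 0; c < ncells; ++c) {
        sum += r[c] * r[c];
    }
    return sum;
}
```

Note that a parallel reduction may add terms in a different order than the serial loop, so floating-point results on large meshes can differ slightly between CPU and GPU builds.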
“Given how fast GPUs have been evolving in recent years, and how fast that computational power is growing year after year, it’s an important platform for us,” says Sathe. However, ANSYS needed a way to GPU-accelerate Fluent without compromising code maintainability or CPU support in any way. “We wanted something that could be turned on or off at compile time,” Sathe recalls. Because OpenACC is pragma-based, it met ANSYS’ needs, allowing its developers to write code once and compile it for either a CPU or a GPU target.
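The compile-time switch Sathe describes can be sketched as follows. The OpenACC specification requires a compiler to define the `_OPENACC` macro when OpenACC is enabled, so the same file yields two builds depending only on a compiler flag. The function names `build_target` and `scale` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same source, two builds (compiler flags shown as examples):
 *   gcc -fopenacc app.c  -> directives active, _OPENACC defined
 *   gcc app.c            -> directives ignored, plain CPU build */
const char *build_target(void)
{
#ifdef _OPENACC
    return "OpenACC enabled";
#else
    return "CPU only (OpenACC directives ignored)";
#endif
}

/* Scales every element of v by s; offloaded only when OpenACC is on. */
void scale(size_t n, double s, double *restrict v)
{
    assert(n == 0 || v != NULL);
#pragma acc parallel loop copy(v[0:n])
    for (size_t i = 0; i < n; ++i) {
        v[i] *= s;
    }
}
```

Because the switch is a compiler flag rather than a separate source tree, the CPU build stays the default and GPU support adds no second code path to maintain.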
“Another consideration is which implementations are available for a programming model,” he says. “We need an implementation available on Windows and Linux – something that’s pretty much supported on all the platforms that Fluent supports. Considering all that, OpenACC was the only choice for us.”
Integrating OpenACC into the Fluent code base wasn’t easy. The company had to redesign its solvers and algorithms to be multi-threaded, Sathe says. It prioritized accelerating the parts of its source base that cover the use cases most in demand from its clients. Radiation transport modeling came first; it lets vehicle manufacturers handle tasks such as headlamp simulations. “Where we have done it right, with later generation GPU hardware like Nvidia’s Tesla P100, we have gained performance of 4X to 8X, compared to a single CPU socket,” according to Sathe. OpenACC is now being applied to other areas of the source base, and GPU acceleration of additional use cases is coming in future releases of Fluent.