More Proof Points for Low Precision HPC

While it might not happen anytime soon, traditional supercomputing could be in for a sea change with wider acceptance of lower-precision calculations. Such a change translates into reduced accuracy but with the benefit of orders of magnitude higher efficiency and speed. What is acceptable in that tradeoff depends on the application.

Until recently, there was no hardware support for reduced-precision HPC that would let centers experiment with performance, efficiency, and accuracy tradeoffs. However, with the rise of AI/ML devices built around 32-, 16-, and even 8-bit precision, and with even the beefiest HPC-oriented CPU architectures now offering low-precision capabilities on board, we can expect to see more robust work to test the low-precision HPC waters.

Much of the early work on 32-bit and, more recently, 16-bit HPC has taken place on the top-ranked Fugaku supercomputer at RIKEN in Japan, which is based on Fujitsu's Arm-based A64FX processor.

The A64FX architecture supports standard 64-bit double precision as well as 32-bit and 16-bit arithmetic, with roughly linear speed improvements at each step down, at least on benchmarks. While performance improvements depend on the needs of the application, RIKEN has shown remarkable results on both traditional HPC and emerging AI workloads by leveraging low and mixed precision, as have other centers, including researchers at the University of Oxford working on experimental A64FX-based machines such as Isambard at the University of Bristol.
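
To give a sense of what that precision ladder looks like from the programmer's side, here is a minimal Julia sketch, not a RIKEN benchmark: it times the same kernel at each element type. The matrix size and timing method are illustrative choices, and the near-linear gains only materialize on hardware with native FP16 vector units such as A64FX's SVE.

```julia
# Minimal sketch: run the same kernel at several precisions by parameterizing
# on the element type. Speedups from stepping down in precision only appear on
# hardware with native FP16 vector units (e.g. A64FX); most x86 chips emulate
# Float16 through Float32, so timings there will not show the effect.
using LinearAlgebra   # stdlib; provides the in-place matrix multiply mul!

function time_matmul(::Type{T}, n::Int) where {T}
    A, B = rand(T, n, n), rand(T, n, n)
    C = similar(A)
    mul!(C, A, B)                  # warm-up run to exclude compilation time
    return @elapsed mul!(C, A, B)  # crude wall-clock timing of one multiply
end

for T in (Float64, Float32, Float16)
    println(T, ": ", time_matmul(T, 512), " s")
end
```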

The Isambard machine was recently deployed to evaluate the precision tradeoffs in one of the most computationally demanding areas in supercomputing: weather and climate modeling. With vastly scalable models and ongoing demand, this segment of HPC helps legitimize huge exascale investments. However, some argue that the high accuracy, which comes at great computational cost, is not always necessary, and that relaxing it could improve weather and climate efficiency by orders of magnitude, as ECMWF has shown with its work on 32-bit precision for some calculations.

Single precision (32-bit) is already being proven in climate and weather, but half precision (16-bit) has been less explored.

In the Isambard climate work, which is based on a fluid simulation element of atmospheric science workflows, the team was able to reach speedups of 3.6X. While it took a fair amount of work on the model, called ShallowWaters.jl, including some tradeoffs in terms of accuracy, the University of Oxford and ECMWF team say “low precision calculations for weather and climate simulations are a potential that is not yet fully exploited. While the first weather forecast models are moving towards Float32, 16-bit arithmetic will likely find increasing support on future supercomputers.”
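
In practice, switching the model's number format is designed to be a small change. The snippet below is a hypothetical usage sketch: it assumes ShallowWaters.jl exposes a run_model entry point that takes the number type as its first argument, so the exact call should be checked against the package's current README.

```julia
# Hypothetical usage sketch -- check ShallowWaters.jl's README for the exact API.
# The assumption here is that the number format is the first argument to
# run_model, so the same shallow-water model can be run at each precision.
using ShallowWaters

for T in (Float32, Float16)
    @time run_model(T)   # assumed entry point; grid size, run length, and
                         # output options are left at their defaults
end
```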

Below is a sense of the speedups and the scalability of those improvements as observed on Isambard.

“The model implements techniques that address precision and dynamic range issues in 16-bit. The precision-critical time integration is augmented to include compensated summation to minimize rounding errors. Such a compensated time integration is as precise but faster than mixed-precision with 16 and 32-bit floats. As subnormals are inefficiently supported on A64FX the very limited range available in Float16 is 6·10⁻⁵ to 65504. We develop the analysis-number format Sherlogs.jl to log the arithmetic results during the simulation. The equations in ShallowWaters.jl are then systematically rescaled to fit into Float16, using 97% of the available representable numbers.”
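
Those two ingredients, Float16's narrow normal range and compensated summation for precision-critical accumulation, are easy to see in plain Julia. The sketch below is not the ShallowWaters.jl implementation; it uses Kahan's compensated summation on a long run of small increments as a stand-in for a time integration.

```julia
# Minimal sketch (not the ShallowWaters.jl code): Float16's narrow normal range
# and compensated (Kahan) summation for a precision-critical accumulation.

# The normal Float16 range quoted above:
println(floatmin(Float16), "  ", floatmax(Float16))   # ≈ 6.1e-5 and 65504

# Naive accumulation: once the running sum is large, small increments are
# rounded away entirely and the sum stalls far short of the true value.
function naive_sum(x::AbstractVector{T}) where {T}
    s = zero(T)
    for xi in x
        s += xi
    end
    return s
end

# Kahan compensated summation: the rounding error of each addition is carried
# in a correction term c and fed back into the next step.
function compensated_sum(x::AbstractVector{T}) where {T}
    s = zero(T)
    c = zero(T)                  # running compensation for lost low-order bits
    for xi in x
        y = xi - c
        t = s + y                # low-order bits of y are lost here...
        c = (t - s) - y          # ...and recovered into c for the next step
        s = t
    end
    return s
end

steps = fill(Float16(0.01), 10_000)    # stand-in for many small time increments
println(naive_sum(steps))              # stalls around 32 instead of ≈ 100
println(compensated_sum(steps))        # ≈ 100, despite all-Float16 arithmetic
println(sum(Float64.(steps)))          # double-precision reference, ≈ 100.02
```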

While the team behind the low-precision work admits the fluid simulation model is far simpler than the double-precision codes running on large supercomputers, it does open the door to far more efficient results in a shorter timespan. The problem is that accuracy is an important part of climate and weather, and because of the multi-layered nature of production weather and climate simulations, having one part of a calculation go awry can affect a much larger system.

More practically, as mentioned before, getting such complex codes, even simplified for the purpose of experimentation, to fit into the limits of 16-bit is non-trivial.

“The work shows that a naïve translation of the mathematical equations into code will likely fail with 16-bit arithmetic. However, this does not mean 16-bit is unsuited for the numerical solution of complex partial differential equations such as the shallow water equations. But it does mean that both precision and range issues have to be addressed in the design of the algorithms used,” the team explains.
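
As a deliberately simplified illustration of those range issues, and not the actual ShallowWaters.jl rescaling, the sketch below shows how a naively coded shallow-water-style term overflows Float16, and how rescaling the units keeps every intermediate inside the representable range.

```julia
# Simplified illustration (not the ShallowWaters.jl rescaling): a naively coded
# term overflows Float16, while the same quantity computed in rescaled units fits.
g  = Float16(9.81)       # gravity [m/s^2]
H  = Float16(4000)       # layer depth [m]
dx = Float16(10_000)     # grid spacing [m]

# Naive translation: dx^2 = 1e8 exceeds Float16's maximum of 65504, so it
# overflows to Inf and the final result silently collapses to zero.
println(dx^2)            # Inf16
println(g * H / dx^2)    # 0.0, while the true value is ≈ 3.9e-4

# Rescaled: measure lengths in km so every intermediate stays in range.
dx_km = Float16(10)                    # grid spacing [km]
gH_km = Float16(9.81 * 4000 / 1e6)     # g*H in km^2/s^2, computed in Float64 then stored
println(gH_km / dx_km^2)               # ≈ 3.9e-4 [1/s^2], now representable
```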

While this is just one example of low-precision work in HPC, with the availability of more CPUs that can support 16-bit and even INT8, more teams will explore this potential. Even if it takes focused code work, areas that are less accuracy-sensitive could see dramatic improvements in computational cost.


2 Comments

  1. I know that the Fugaku A64FX does not support bFloat16, but how much of an improvement in dynamic range and speed would this have on this type of weather modelling problem, compared with Float32 and Float16?

    • That’s a good question. I suppose it depends on the nature of the application and how much AI is in a future weather modeling system. I suspect on old weather codes that were using a mix of 64-bit and 32-bit floating point, Bfloat16 would not make a huge difference. In modern weather codes, now and in the future, Bfloat or TensorFloat or sparse matrix support and other things like that might be game changers.
