
New Approach Could Sink Floating Point Computation

In 1985, the Institute of Electrical and Electronics Engineers (IEEE) established IEEE 754, a standard for floating point formats and arithmetic that would become the model for practically all FP hardware and software for the next 30 years.

Most programmers use floating point indiscriminately anytime they want to do math with real numbers, but because of certain limitations in how these numbers are represented, performance and accuracy often leave something to be desired.

That’s resulted in some pretty sharp criticism over the years from computer scientists who are acquainted with these problems, none more so than John Gustafson, who has been on a one-man crusade to replace floating point with something better. In this case, something better is posits, the third iteration of his “universal numbers” research. Posits, he says, will solve the most pressing problems of IEEE 754 while delivering better performance and accuracy, and doing it with fewer bits. Better yet, he claims the new format is a “drop-in replacement” for standard floats, with no changes needed to an application’s source code.

We caught up with Gustafson at ISC19. For that particular crowd, the supercomputing set, one of the primary advantages of the posit format is that you can get more precision and dynamic range using fewer bits than IEEE 754 numbers. And not just a few fewer: Gustafson told us that a 32-bit posit can replace a 64-bit float in almost all cases, which would have profound implications for scientific computing. Cutting the number of bits in half not only reduces the amount of cache, memory, and storage needed to hold these values, but also substantially reduces the bandwidth needed to shuffle them to and from the processor. It’s the main reason he thinks posit-based arithmetic would deliver a two-fold to four-fold speedup compared to IEEE floats.

It does this by using a denser representation of real numbers. Instead of the fixed-size exponent and fixed-size fraction used in IEEE floating point numbers, posits encode the exponent with a variable number of bits (a combination of regime bits and exponent bits), such that fewer of them are needed in most cases. That leaves more bits for the fraction component, and thus more precision. The reason for using a dynamic exponent is that it provides tapered accuracy: values with small exponents, which are the ones most commonly used, get more accuracy, while the lesser-used values that are very large or very small get less. Gustafson’s original 2017 paper on posits provides an in-depth explanation of exactly how this works.
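To make the encoding concrete, here is a minimal sketch of a posit decoder in Python. The field layout follows Gustafson’s 2017 paper; the specific choice of a 16-bit posit with one exponent bit (es = 1), matching the draft standard of the time, is our assumption for illustration, not something specified in the article.

```python
def decode_posit(bits: int, nbits: int = 16, es: int = 1) -> float:
    """Decode an nbits-wide posit pattern (es exponent bits) into a float.

    Layout after the sign bit: a run of identical "regime" bits terminated
    by the complement bit, then up to `es` exponent bits, then the fraction.
    """
    mask = (1 << nbits) - 1
    bits &= mask

    # Two special patterns: all zeros is 0; a lone sign bit is NaR ("not a real").
    if bits == 0:
        return 0.0
    if bits == 1 << (nbits - 1):
        return float("nan")

    # Negative posits are stored as the two's complement of their magnitude.
    sign = -1.0 if (bits >> (nbits - 1)) & 1 else 1.0
    if sign < 0:
        bits = (-bits) & mask

    # Regime: count the run of identical bits starting just below the sign.
    # A run of k ones encodes regime k-1; a run of k zeros encodes -k.
    first = (bits >> (nbits - 2)) & 1
    run, pos = 0, nbits - 2
    while pos >= 0 and ((bits >> pos) & 1) == first:
        run += 1
        pos -= 1
    regime = run - 1 if first else -run
    pos -= 1                        # skip the regime's terminating bit
    remaining = max(pos + 1, 0)     # bits left over for exponent + fraction

    # Exponent: up to `es` bits; any bits squeezed out by the regime read as 0.
    if remaining >= es:
        exponent = (bits >> (remaining - es)) & ((1 << es) - 1)
        frac_bits = remaining - es
        fraction = bits & ((1 << frac_bits) - 1) if frac_bits else 0
    else:
        exponent = (bits & ((1 << remaining) - 1)) << (es - remaining)
        frac_bits, fraction = 0, 0

    # value = +/- (1.fraction) * 2^(regime * 2^es + exponent)
    scale = regime * (1 << es) + exponent
    frac_value = 1.0 + (fraction / (1 << frac_bits) if frac_bits else 0.0)
    return sign * frac_value * 2.0 ** scale
```

For example, decode_posit(0x5800) returns 3.0 (regime 0, exponent 1, fraction 0.5), while the extreme patterns 0x7FFF and 0x0001 decode to 2^28 and 2^-28. That is the tapered accuracy described above: the most fraction bits are available near 1.0, and progressively fewer remain as the regime grows toward either end of the dynamic range.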

Another important advantage of the format is that, unlike conventional floats, posits produce the same bit-wise results on any system, something that cannot be guaranteed with the IEEE standard (even the same computation on the same system can produce different results for floats). It also does away with rounding errors, overflow and underflow exceptions, subnormal (denormalized) numbers, and the plethora of not-a-number (NaN) values. Additionally, posits avoid the weirdness of 0 and -0 as two distinct values. Instead, the format uses an integer-like two’s complement form to encapsulate the sign, which means that simple bit-wise comparisons are valid.
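A quick way to see why that matters: IEEE floats use a sign-magnitude layout, so comparing raw float bit patterns as integers gets negative values backwards, whereas posit patterns compared as two’s complement integers sort exactly like the values they encode. The sketch below checks both claims, reusing the decode_posit sketch from the previous example (again assuming 16-bit posits with es = 1).

```python
import struct

def as_signed(bits: int, nbits: int = 16) -> int:
    """Reinterpret an nbits-wide pattern as a two's-complement integer."""
    return bits - (1 << nbits) if bits >> (nbits - 1) else bits

# IEEE floats: -1.0's bit pattern compares *below* -2.0's, the reverse of
# their numeric order, so naive integer comparison of float bits is invalid.
neg1 = struct.unpack("<i", struct.pack("<f", -1.0))[0]
neg2 = struct.unpack("<i", struct.pack("<f", -2.0))[0]
print(neg1 < neg2)   # True, even though -1.0 > -2.0

# Posits: sorting every 16-bit pattern by its integer value also sorts the
# decoded real values, so hardware can reuse integer comparators as-is.
patterns = sorted((p for p in range(1 << 16) if p != 0x8000), key=as_signed)
values = [decode_posit(p) for p in patterns]
assert values == sorted(values)
print("integer order matches numeric order for all 16-bit posits")
```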

Associated with posits is something called a quire, an accumulator mechanism that enables programmers to perform reproducible linear algebra – something not possible with IEEE floats. It supports a generalized fused multiply-add and other fused operations that let you compute dot products or sums without risking rounding errors or overflows. Tests run at UC Berkeley demonstrated that quire operations are about three to six times faster than performing the operations serially. According to Gustafson, it enables posits to “punch above their weight class.”
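The quire is essentially a very wide fixed-point register, so every product can be accumulated exactly and rounding happens only once, at the very end. The sketch below imitates that behavior with Python’s exact Fraction type standing in for the wide register (and ordinary floats standing in for posits); the inputs are a made-up example chosen to show catastrophic cancellation, not data from the article.

```python
from fractions import Fraction

def quire_dot(xs, ys):
    """Dot product in the style of a quire: exact accumulation, one rounding."""
    acc = Fraction(0)                     # stands in for the wide fixed-point register
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)  # every product lands exactly, never rounded
    return float(acc)                     # the single rounding step at the end

# Naive accumulation rounds after every add, so the small term is lost when
# the two big terms cancel; the quire-style version keeps it.
xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
print(sum(x * y for x, y in zip(xs, ys)))  # 0.0 -- the 1.0 vanished
print(quire_dot(xs, ys))                   # 1.0 -- exact
```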

Although the numeric format has only been around for a couple of years, there is already interest in the HPC community in exploring its use. At this point, all of this work is experimental, based on anticipated performance on future hardware or on a tool that emulates posit arithmetic on conventional processors. There are currently no chips in production that implement posits in hardware. More on that in a moment.

One potential application is the upcoming Square Kilometre Array (SKA), which is considering posits to dramatically reduce the bandwidth and computational load needed to process the SKA radio telescope data. The supercomputers that do this need to draw no more than about 10 MW, and one of the more promising ways the project thinks this can be achieved is to use the denser posit format to slice the anticipated bandwidth demands of memory (200 PB/sec), I/O (10 TB/sec), and networking (1 TB/sec) in half. Computation would be improved as well.

Another application is weather and climate forecasting, where a UK-based team has demonstrated that 16-bit posits clearly outperformed standard 16-bit floats, with “great potential for more complex models.” In fact, the 16-bit posit emulation performed on par with the existing 64-bit floating point implementation on this particular model.

Lawrence Livermore National Lab has been evaluating posits and other new number formats, with the idea that they can help reduce data movement in future exascale supercomputers. In some cases, they also found that better answers were generated. For example, posits were able to deliver superior accuracy on physics codes like shock hydrodynamics, and generally outperformed floats on a variety of measures.

Perhaps the largest opportunity for posits is in machine learning, where 16-bit posits could be used for training and 8-bit posits for inference. Gustafson said that for training, 32-bit floating point is overkill and in some cases doesn’t even perform as well as the smaller 16-bit posits, explaining that the IEEE 754 standard “wasn’t designed for AI at all.”

Not surprisingly, the AI community has taken note. Facebook’s Jeff Johnson has developed an experimental FPGA-based platform using posits that demonstrates better energy efficiency than either IEEE float16 or bfloat16 on machine learning codes. The plan is to investigate a 16-bit quire-type approach in hardware for training and compare it to the competing formats just mentioned.

Here it’s worth noting that Facebook is working with Intel on its Nervana Neural Network Processor (NNP), or some variation of it, the idea being to accelerate some of the social media giant’s AI workloads. It would not be out of the realm of possibility for the posit format to be used here, although it’s more likely that Intel will stick exclusively with Nervana’s original Flexpoint format. In any case, that’s worth keeping an eye on.

Gustafson knows of at least one AI chip startup that is looking to use posits in its processor design, although he was not at liberty to share which company that was. Kalray, the French firm working with the European Processor Initiative (EPI), has also shown interest in supporting posits in its next-generation Massively Parallel Processor Array (MPPA) accelerator, so the technology may find its way into the EU’s exascale supercomputers.

Gustafson is understandably encouraged by all this and believes this third attempt at his universal numbers may be the one that catches fire. Unlike versions one and two, posits are straightforward to implement in hardware. And given the furious competition in AI processors these days, the format may have found the killer app it needs to become commercially successful. Other platforms where posits could have a bright future are digital signal processors, GPUs (both for graphics and compute), IoT devices, and edge computing. And, of course, HPC.

If the technology gets commercial traction, Gustafson himself probably won’t be able to capitalize directly on its success. The design, as specified in his initial 10-page standard, is completely open source and freely available for use by any company willing to develop the requisite hardware and software. That probably explains why companies like IBM, Google, Intel, Micron, Rex Computing, Qualcomm, Fujitsu, and Huawei, among others, are looking at the technology.

Nonetheless, replacing IEEE 754 with something better would be quite the accomplishment, even for someone with Gustafson’s impressive resume. Even before his stints at ClearSpeed, Intel, and AMD, he had been looking at ways to improve scientific computing on modern processors. “I’ve been trying to figure this out for 30 years,” he said.
