Time For A Compute Rematch Between The FPGA And The GPU

For any given compute engine, there is the vendor who makes the chip and therefore a lot of the money and then there are the downstream system architects, system integrators, original design manufacturers, and original equipment manufacturers who add further value to that compute engine in one form or another and make their own revenue stream from that innovation.

It is always hard to be Switzerland as any downstream partner to compute engine suppliers, and as is natural at any time given the technical advantages any chip supplier has over its rivals, these downstream partners tend to focus more on one supplier than another.

Gidel, which today makes FPGA accelerator cards based on the Arria 10 and Stratix 10 FPGAs from Intel (formerly Altera), was founded in 1993 and focused on writing algorithms that could be coded onto FPGAs. Four years later, Reuven Weintraub, Gidel's founder and chief executive officer, saw a need for add-on development tools, so the company created them and has maintained and updated its Developer's Suite ever since to make it easier to code FPGAs. Gidel created the first frame grabber and camera simulator for video applications back in 2006, and has since carved out a niche for itself in video processing with its InfiniVision tools. It is also noteworthy in that it built the largest academic supercomputer based on FPGAs, called Novo-G, at the University of Florida, which as it turns out ultimately grew considerably larger than the machine we recently wrote about.

Weintraub reminded us of this at our recent The Next FPGA Platform event in San Jose when we sat down to talk about all things FPGA. The initial Novo-G machine from 2009 had 192 FPGAs all working in harmony across two dozen single-socket Xeon servers, and by our estimation it could accelerate certain genomics applications by a factor of around 500X to 600X compared to using an equivalent number of all-CPU, two-socket server nodes. But what we didn't know is that the Novo-G system kept growing and got even more powerful.

“Then, we had boards with four FPGAs,” explains Weintraub. “And then with the next generation of FPGAs, it grew to close to 400, and with the third generation it grew to close to 500, and with the tools we provided it was able to work across the different generations so you did not have to throw away your initial investment.” The Novo-G system was also innovative, Weintraub continues, in that it allowed a direct 3D torus interconnect between banks of 32 or 64 FPGAs inside of the system – something that Microsoft also implemented in its “Catapult” system, but Gidel and the University of Florida did it first. Microsoft Research started the Catapult project in 2010, and parent Microsoft started using Catapult to accelerate Bing search engine page ranking algorithms in 2015 and has subsequently used the FPGAs in its servers as SmartNICs in addition to hosting its BrainWave overlay neural network accelerator.

We had spoken a few weeks earlier ahead of the event to get a sense of the history of Gidel and its place in the FPGA market, and we quipped that the GPU was the best thing – and the worst thing – to ever happen to the FPGA. The GPU has taken some of the roles that the FPGA might have fulfilled as an accelerator for CPUs, but the irony is that now that people are familiar with mixed precision and offload programming models, it is easier to sell an FPGA into the datacenter than it has ever been before. The GPU has competition now, not just from other GPU suppliers, but from FPGAs.

We have said it before and we will say it again: When it comes to supercomputing, we think the FPGA deserves a rematch with the GPU. And we would go further to say that in some cases, a better system might incorporate all three core compute engines: CPU, GPU, and FPGA.

Weintraub concurs about having a throwdown between the GPU and the FPGA for acceleration, and in particular for certain simulation and modeling workloads.

“Novo-G was definitely a giant and there was definitely the ability to see that a lot could be done, and you ask an interesting question about bioinformatics,” says Weintraub. “There is no doubt that with bioinformatics – which is but one example and there are many others – that the FPGA would get much better results. However, the market and people’s way of thinking, they expect to get the same results. So even if you can get better results with the FPGA, say with the BLAST bioinformatics, even if you get better accuracy, people want the same results. And when you are doing optimizations with the FPGA, which we have done many times taking the code from C and doing the optimizations in the FPGA, the number one thing you have to do is get rid of the software optimization because the way you optimize in the FPGA is not the way you optimize in C. And in genomics, there is a lot of work in optimizing code in software in CPUs, and therefore it makes it hard to make the changes in the FPGA. Once the public is aware that we can do much better with FPGA optimizations, then I think you will see a jump in bioinformatics using FPGAs because there is no doubt that CPUs and GPUs cannot compete with FPGAs in that market. But the industry needs to understand that and accept that.”

One of the key differentiators with the FPGA is that it can match the bitness of the processing with the bitness of the data. Bioinformatics has 3-bit data types, and certain computer vision processing uses 1-bit data, and you can use these smaller data types to push more data through the compute engines, add more layers to a neural network, or do other things with this increased effective data capacity and throughput of the overall application – and get better results.
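To make the idea concrete, here is a minimal sketch – not Gidel's tooling, just a generic illustration with made-up helper names – of packing hypothetical 3-bit bioinformatics symbols into 64-bit words instead of storing one symbol per byte, which is roughly the kind of width-matching an FPGA datapath can do natively:

```python
# Sketch: pack 3-bit values (0-7) into 64-bit words, 21 symbols per word,
# versus the naive one-symbol-per-byte layout. Illustrative only.

def pack_3bit(symbols):
    """Pack a list of 3-bit values into 64-bit words, 21 per word."""
    words = []
    for i in range(0, len(symbols), 21):
        word = 0
        for j, s in enumerate(symbols[i:i + 21]):
            word |= (s & 0b111) << (3 * j)
        words.append(word)
    return words

def unpack_3bit(words, count):
    """Recover the original 3-bit values from the packed words."""
    out = []
    for word in words:
        for j in range(21):
            if len(out) == count:
                return out
            out.append((word >> (3 * j)) & 0b111)
    return out

symbols = [5, 2, 7, 0, 3, 1, 6, 4] * 100            # 800 toy symbols
packed = pack_3bit(symbols)
assert unpack_3bit(packed, len(symbols)) == symbols
print(f"{len(symbols)} bytes naively vs {len(packed) * 8} bytes packed")
```

The same amount of memory bandwidth moves roughly 2.5X more symbols once the storage width matches the data width, which is the kind of headroom Weintraub is pointing at.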

“People say they need floating point with maybe double precision or single precision, and there is no way that anyone will accept lesser resolution,” explains Weintraub. “With the deep learning demand that Google came up with, it said that 8-bit would be sufficient to start with – and maybe even less. At the computer level, less than 8-bit or 8-bit were the same. But if you go from floating point to 8-bit and you lose less than 1 percent in the accuracy of the deep learning, it enables people to do much more. They can have more layers, or something else more sophisticated, and get much better accuracy at the end. In fluid dynamics, if you can have less accuracy in a single computation but have more and thinner elements, maybe the total results will be much better. Google had enough energy for people to open their ears and to understand that in different markets, using less accuracy in a single element but running faster, or trying several random numbers to start with, you will be able to get a better result. And that I think is crucial to understanding that there is a place for all kinds of applications to use the FPGA – and that you use exactly, and you pay for exactly, what you need.”
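The precision trade-off Weintraub describes can be sketched in a few lines. This is a generic linear quantization round trip – not any particular framework's scheme, and the variable names are our own – showing how little per-value accuracy is given up when floating point data is squeezed into 8 bits:

```python
# Sketch: quantize floats to 8-bit integers and measure the worst-case
# round-trip error. Generic linear (scale + offset) quantization.
import random

values = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

lo, hi = min(values), max(values)
scale = (hi - lo) / 255.0                      # map the range onto 0..255

quantized = [round((v - lo) / scale) for v in values]
restored  = [q * scale + lo for q in quantized]

max_err = max(abs(v - r) for v, r in zip(values, restored))
print(f"worst-case error after 8-bit round trip: {max_err:.4f}")
```

On data in this range the worst-case error lands around half a quantization step, a fraction of a percent of the full range – small enough, as Weintraub argues, that spending the saved bits on more layers or more elements can leave the end result better off.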
