At The Next FPGA Platform event in January, there were several conversations about what roles reconfigurable hardware will play in the future of deep learning. While inference was the target of most of what was discussed, there is ample opportunity for acceleration across the spectrum, although how much depends on the type of neural network.
Martin Ferianc, a machine learning researcher at University College London, says that FPGAs have three advantages. First, their low power consumption makes them suitable for applications with energy constraints. Second, their reconfigurability makes them an ideal hardware platform when the algorithm changes frequently, as it often does in deep learning. Third, compared with SIMD hardware such as GPUs, FPGAs usually achieve lower latency because they allow custom architecture design.
These advantages shift depending on the type of neural network, but to Ferianc and his team, convolutional neural networks (CNNs) are a perfect fit for FPGAs. The team is exploring performance estimation techniques for FPGA-based acceleration of CNNs and has given extensive thought to the advantages and drawbacks of using FPGAs for deep learning, especially in computer vision, where CNNs dominate. These are high-value application areas, spanning advanced video security, autonomous cars, medical imaging, and defense, among others.
When running on CPUs, CNNs use hardware designed for general-purpose applications, which, relatively speaking, offers a few very fast processing cores. A GPU offers many more processing units, each somewhat slower. Both are designed for a certain data format, which may not always be optimal for the CNN in use. FPGAs, Ferianc says, allow for the implementation of exactly the necessary design, meaning that the CNN can make use of processing units tailored to it. These processing units can be connected more efficiently than on a GPU or CPU, since the channel between them is also purpose-built for the particular CNN. And because the CNN can work with any desired data width, accuracy can be tuned to the desired level.
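The data-width point can be made concrete with a small sketch. The Python below is an illustration only and is not taken from Ferianc's work: the helper names `quantize_fixed_point` and `max_abs_error`, the signed fixed-point format, and the sample weights are all assumptions. It rounds a handful of made-up values to a chosen bit width and reports the worst-case rounding error, which shrinks as bits are added.

```python
def quantize_fixed_point(value, total_bits, frac_bits):
    """Round value to a signed fixed-point number, saturating at the limits.

    Hypothetical helper: total_bits includes the sign bit, frac_bits is the
    number of bits after the binary point.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))       # most negative integer code
    highest = (1 << (total_bits - 1)) - 1   # most positive integer code
    code = round(value * scale)
    code = max(lowest, min(highest, code))  # saturate instead of wrapping
    return code / scale


def max_abs_error(values, total_bits, frac_bits):
    """Largest absolute difference between the values and their quantized form."""
    return max(abs(v - quantize_fixed_point(v, total_bits, frac_bits))
               for v in values)


# Made-up example weights in the range (-1, 1); not measured data.
weights = [0.7312, -0.2048, 0.0157, -0.9871, 0.4443]

errors = {}
for bits in (4, 8, 16):
    # One sign bit, no integer bits, and the rest fractional.
    errors[bits] = max_abs_error(weights, bits, bits - 1)
    print(f"{bits:2d}-bit fixed point: max abs error = {errors[bits]:.6f}")
```

On an FPGA the chosen width would also set the size of each multiplier and the memory traffic per weight, so a sweep like this is one rough way to think about the trade between accuracy and resources.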
“FPGAs are a powerful platform for accelerating machine learning algorithms and especially neural networks. Nevertheless, their configurability and resource dedication needs to be carefully considered given the application and the underlying neural network architecture,” Ferianc says. “To determine the optimal configuration of an FPGA-based NN accelerator, such as the levels of parallelism, it is necessary to explore the potential FPGA configurations and an accurate performance prediction is crucial to steer the exploration towards the desired direction.”
Getting a grip on performance is the basis of the team’s work, which introduces a novel method for fast and accurate estimation of latency based on a Gaussian process parametrized by an analytic approximation and coupled with runtime data.
“In simple terms, we introduced a reliable model for performance prediction that uses the existing method as the base of the prediction and any collected data as refining points to improve the prediction. We tested the method on estimating latency, however, it can also be used to estimate energy consumption.”
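As a rough illustration of that idea, the sketch below uses an analytic latency estimate as the prior mean of a Gaussian process and lets a few measurements correct it. This is one common way to combine the two and is an assumption here, not a reproduction of the team's method. The analytic formula, the kernel hyperparameters, the function names, and the "measured" points are all hypothetical placeholders.

```python
import math


def analytic_latency(parallelism):
    """Hypothetical analytic estimate (ms): fixed work divided across units."""
    return 1200.0 / parallelism


def rbf(a, b, length=8.0, variance=25.0):
    """Squared-exponential kernel; hyperparameters are illustrative guesses."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit(xs, ys, noise=0.01):
    """Compute GP weights for the residuals between data and the analytic mean."""
    residuals = [y - analytic_latency(x) for x, y in zip(xs, ys)]
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    return solve(gram, residuals)


def predict(x_new, xs, weights):
    """Posterior mean: analytic estimate plus a data-driven correction."""
    correction = sum(w * rbf(x_new, x) for w, x in zip(weights, xs))
    return analytic_latency(x_new) + correction


# Synthetic placeholder "measurements": parallelism level and latency in ms.
measured_x = [4, 16, 64]
measured_y = [310.0, 82.0, 25.0]

weights = fit(measured_x, measured_y)
for p in (4, 8, 32, 512):
    print(f"p={p:3d}  analytic={analytic_latency(p):7.2f}  "
          f"refined={predict(p, measured_x, weights):7.2f}")
```

Near the measured configurations the prediction follows the data, and far from them it falls back to the analytic estimate, which is the behavior the quote describes in simple terms.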
This is of interest because it is still difficult to get a handle on the performance potential of FPGAs for CNNs, not to mention other frameworks and methods. Community benchmarking efforts like MLPerf have shown results for some devices, but fine-grained performance evaluations of FPGAs for CNNs are harder to find. The work Ferianc and his team have put together highlights the difficulty of building such tooling and, as a side effect, exposes some of the difficulty in getting an FPGA primed for CNNs.
Still, Ferianc is bullish about the future of FPGAs for CNNs (over other neural networks, including RNNs, GANs, etc.). “Applications areas such as autonomous vehicles or medical imaging present an interesting use-case for CNNs for object tracking, instance segmentation, and even depth approximation,” he says.
Ferianc adds a caveat, however: “Despite this impressive progress in practicality, CNN itself is a computationally intensive model, which is also the biggest drawback. Extracting the features and processing them requires an enormous amount of hardware and electric power. State-of-the-art CNN research often demonstrates GPU usage and favorable processing times, however, it’s the power consumption which is not suitable for real-time application. Taking this into account, FPGA is a better fit as it is a low power device, which provides a similar degree of acceleration to a GPU.”
The authors include:
Martin Ferianc, PhD student, martin.ferianc.19@ucl.ac.uk, United Kingdom, University College London, http://www.ferianc.eu
Hongxiang Fan, PhD student, h.fan17@imperial.ac.uk, United Kingdom, Imperial College London, https://os-hxfan.github.io/
Ringo S. W. Chu, Undergraduate student, ringo.chu.16@ucl.ac.uk, United Kingdom, University College London, http://ringoswchu.com/
Jakub Stano, Masters student, jstano@ethz.ch, Switzerland, ETH Zurich
Wayne Luk, Professor, w.luk@imperial.ac.uk, United Kingdom, Imperial College London, https://www.imperial.ac.uk/people/w.luk
A piece of custom hardware is always better than a CPU/GPU. Processors must run some kind of software. Because of that, system response is sometimes sluggish. Years ago, many software experts thought digital cameras could be implemented with a high-power general-purpose processor. After a long struggle, they gave up, and the industry quietly adopted hardware solutions like ASICs. There is a sickness in the industry: the belief that a hardware solution is never any good. Here we are again today: the AI industry has realized that without hardware, it will go nowhere.
It’s interesting that Nvidia GPUs are able to beat Google TPUs on a performance-per-watt basis.
Generally one would expect FPGA/ASIC hardware to be more efficient.
My hypothesis is that the efficiency advantage of FPGA/ASIC hardware dissipates with NNs due to the memory-intensive nature of the problems (modern GPUs are extremely well tuned to use the available bandwidth to RAM).
Dear YAUHO CHOI,
I fully agree with you. And even more: not only is general-purpose hardware inefficient, so is DIGITAL. The biological plausibility of current AI is faint. The riddle of how biology really does the job is still largely unsolved. But for energy efficiency, it’s inevitable that we will see more ANALOG. The industry, however, is even more averse to analog than to digital hardware. So there is a long thinking process ahead of us …
Won’t an ASIC provide far more performance than an FPGA?
Quote from GTC2020: “AI/ML models evolve every 6 months”. The ASIC design cycle is more than 6 months. By the time an AI/ML ASIC is done, it is obsolete. This is the main reason to use FPGAs rather than ASICs. There are no difficulties in programming CNN, LSTM, GRU, etc. on FPGAs. @jimmy: The memory management problem for AI/ML is equally difficult for CPUs, GPUs, ASICs, or FPGAs. @yuaho and @oliver: there is no sickness in the industry, but there is an overwhelming disproportion of software to hardware engineers: 1000:1.