At The Next FPGA Platform event in January, there were several conversations about the role reconfigurable hardware will play in the future of deep learning. While inference was the focus of most of the discussion, there is ample opportunity for acceleration across the spectrum, although that opportunity varies with the type of neural network.
Martin Ferianc, a machine learning researcher at University College London, says that FPGAs have three main advantages. First, their low power consumption makes them suitable for applications with energy constraints. Second, their reconfigurability makes them an ideal hardware platform when the algorithm changes frequently, as it often does in deep learning. Third, compared with SIMD hardware such as GPUs, FPGAs usually offer lower latency because they allow custom architecture design.
These advantages shift depending on the type of neural network, but to Ferianc and his team, convolutional neural networks (CNNs) are a perfect fit for FPGAs. His team, which is exploring performance estimation techniques for FPGA-based CNN acceleration, has given extensive thought to the advantages and drawbacks of using FPGAs for deep learning, especially in computer vision, where CNNs dominate. These are high-value application areas, spanning everything from advanced video security and autonomous cars to medical imaging and defense.
When running on CPUs, CNNs use hardware designed for general-purpose applications, which, relatively speaking, offers a few very fast processing cores. A GPU offers many more processing units, each somewhat slower. Both are designed around fixed data formats, which may not be optimal for a given CNN. FPGAs, by contrast, allow designers to implement exactly the hardware a network needs, Ferianc says, so the CNN can run on processing units tailored to it. These units can also be interconnected more efficiently than on a GPU or CPU, since the communication channels between them are purpose-built for the particular CNN. And because the CNN can work with any desired data width, its numerical precision, and therefore its accuracy, can be tuned to the level the application requires.
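To make the data-width point concrete, the sketch below rounds a few example weights to signed fixed-point numbers of different widths and reports the worst-case rounding error. It is a generic illustration, not code from Ferianc's team; the function names, the sample weights, and the split of one sign bit plus one integer bit are assumptions made for the example.

```python
def quantize(x: float, total_bits: int, frac_bits: int) -> float:
    """Round x to a signed fixed-point value, saturating at the representable range."""
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    q = max(q_min, min(q_max, round(x * scale)))  # round, then clamp to the integer range
    return q / scale


def max_abs_error(values, total_bits, frac_bits):
    """Worst-case absolute rounding error over a list of values."""
    return max(abs(v - quantize(v, total_bits, frac_bits)) for v in values)


# Hypothetical weights; each width keeps 1 sign bit and 1 integer bit (an assumption).
weights = [0.7312, -0.2451, 0.0589, -0.9987, 0.4120]
for bits in (4, 8, 16):
    print(bits, max_abs_error(weights, bits, bits - 2))
```

Wider words shrink the rounding error but cost more logic and memory per value, which is the trade-off a custom FPGA datapath lets designers pick per application.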
“FPGAs are a powerful platform for accelerating machine learning algorithms and especially neural networks. Nevertheless, their configurability and resource dedication need to be carefully considered given the application and the underlying neural network architecture,” Ferianc says. “To determine the optimal configuration of an FPGA-based NN accelerator, such as the levels of parallelism, it is necessary to explore the potential FPGA configurations, and an accurate performance prediction is crucial to steer the exploration in the desired direction.”
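The kind of exploration Ferianc describes can be sketched with a toy, compute-bound latency model for a single convolution layer: enumerate candidate parallelism factors over input and output channels, discard those that exceed a DSP budget, and keep the fastest. The model, the one-DSP-per-multiplier assumption, the layer shape, and the 200 MHz clock are all hypothetical; a real estimator would also have to account for memory bandwidth and on-chip buffering.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ConvLayer:
    in_ch: int
    out_ch: int
    height: int  # output feature-map height
    width: int   # output feature-map width
    kernel: int  # square kernel size


def analytic_latency_ms(layer, p_in, p_out, clock_mhz=200.0):
    """Hypothetical model: p_in * p_out multipliers each do one MAC per cycle."""
    cycles = (math.ceil(layer.in_ch / p_in) * math.ceil(layer.out_ch / p_out)
              * layer.height * layer.width * layer.kernel ** 2)
    return cycles / (clock_mhz * 1e3)  # cycles / (MHz * 1e6) seconds, expressed in ms


def explore(layer, dsp_budget, choices=(1, 2, 4, 8, 16, 32, 64)):
    """Exhaustively search parallelism pairs that fit the budget; return (latency, p_in, p_out)."""
    best = None
    for p_in, p_out in product(choices, repeat=2):
        if p_in * p_out > dsp_budget:  # assume one DSP slice per multiplier
            continue
        lat = analytic_latency_ms(layer, p_in, p_out)
        if best is None or lat < best[0]:
            best = (lat, p_in, p_out)
    return best


layer = ConvLayer(in_ch=64, out_ch=128, height=56, width=56, kernel=3)
lat, p_in, p_out = explore(layer, dsp_budget=512)
print(f"p_in={p_in}, p_out={p_out}, ~{lat:.2f} ms")
```

In this toy model several parallelism splits tie at the same estimated latency, and the loop simply keeps the first one it finds; the quality of the chosen configuration can only be as good as the latency model behind it.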
Getting a grip on performance is the basis of the team’s work, which introduces a novel method for fast and accurate estimation of latency based on a Gaussian process parametrized by an analytic approximation and coupled with runtime data.
“In simple terms, we introduced a reliable model for performance prediction that uses the existing method as the base of the prediction and any collected data as refining points to improve the prediction. We tested the method on estimating latency; however, it can also be used to estimate energy consumption.”
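One way to read this description in Gaussian process terms is that the analytic model serves as the prior mean and the GP models the residual between measured and analytic latency: with no measurements the prediction falls back to the analytic estimate, and each measurement pulls nearby predictions toward the data. The plain-Python sketch below implements that reading. It is an illustration under assumptions, not the team's implementation; the RBF kernel, its fixed hyperparameters (normally fitted by maximizing the marginal likelihood), the single input feature, the toy analytic model, and the synthetic "measurements" are all hypothetical.

```python
import math


def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential kernel between two feature tuples."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq / length ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class ResidualGP:
    """GP posterior mean with an analytic latency model as the prior mean."""

    def __init__(self, analytic, length=1.0, variance=1.0, noise=1e-4):
        self.analytic = analytic
        self.length, self.variance, self.noise = length, variance, noise
        self.X, self.alpha = [], []

    def fit(self, X, y):
        self.X = list(X)
        K = [[rbf(a, b, self.length, self.variance) + (self.noise if i == j else 0.0)
              for j, b in enumerate(self.X)] for i, a in enumerate(self.X)]
        residuals = [yi - self.analytic(x) for x, yi in zip(self.X, y)]
        self.alpha = solve_cholesky(cholesky(K), residuals)  # (K + noise*I)^-1 (y - m(X))
        return self

    def predict(self, x):
        correction = sum(a * rbf(x, xi, self.length, self.variance)
                         for a, xi in zip(self.alpha, self.X))
        return self.analytic(x) + correction  # analytic estimate refined by the data


def analytic_ms(x):
    """Toy analytic model: x[0] is log2(parallelism); latency halves per doubling."""
    return 100.0 / 2 ** x[0]


def measured_ms(x):
    """Synthetic 'runtime data': analytic estimate plus a hypothetical 30% overhead."""
    return 1.3 * analytic_ms(x)


train_x = [(0.0,), (2.0,), (4.0,), (6.0,)]
gp = ResidualGP(analytic_ms).fit(train_x, [measured_ms(x) for x in train_x])
query = (3.0,)
print(analytic_ms(query), gp.predict(query), measured_ms(query))
```

At the unmeasured configuration, the refined prediction moves from the analytic 12.5 ms toward the synthetic 16.25 ms, while an unfitted model returns the analytic estimate unchanged.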
This is of interest because it is still difficult to get a handle on the performance potential of FPGAs for CNNs, not to mention for other frameworks and methods. Community benchmarking efforts like MLPerf have shown results for some devices, but one has to look harder for more fine-grained performance evaluations of FPGAs running CNNs. The work Ferianc and his team have done highlights the difficulty of building such tooling but, as a side effect, also exposes some of the difficulty of getting an FPGA primed for CNNs.
Still, Ferianc is bullish about the future of FPGAs for CNNs, as opposed to other network types such as RNNs and GANs. “Application areas such as autonomous vehicles or medical imaging present an interesting use case for CNNs for object tracking, instance segmentation, and even depth approximation,” he says.
At the same time, Ferianc adds a caveat: “Despite this impressive progress in practicality, the CNN itself is a computationally intensive model, which is also its biggest drawback. Extracting the features and processing them requires an enormous amount of hardware and electric power. State-of-the-art CNN research often demonstrates GPU usage and favorable processing times; however, it’s the power consumption which is not suitable for real-time applications. Taking this into account, the FPGA is a better fit, as it is a low-power device that provides a similar degree of acceleration to a GPU.”