Stratix 10 SX At The Heart Of Intel’s Most Powerful FPGA Accelerator

Intel has started shipping a new FPGA accelerator card based on the high-end Stratix 10 SX FPGA. Known as the Programmable Accelerator Card D5005, the device is the sequel to the chipmaker’s first PAC, which was based on the Arria 10 and introduced almost two years ago. The new PAC will make its premiere in Hewlett Packard Enterprise’s ProLiant DL380 Gen10 server.

The D5005 is built mainly for 2U servers, such as the aforementioned ProLiant DL380, and requires two PCI-Express 3.0 x16 slots. The top-of-the-line Stratix 10 SX FPGA offers 2.8 million logic elements, and the card that it sits on supports 32 GB of DDR4 memory and two 100 Gb/sec Ethernet network interfaces. Fully outfitted, the D5005 can draw up to 215 watts of power, which puts it in GPU territory, although that is 10 watts less than Intel’s original estimate when it previewed the card last September.

In any case, 200-plus watts relegates the D5005 to special-purpose servers, namely those that run high-value, data-intensive workloads such as streaming analytics, financial risk and regulatory analytics, video transcoding, network security, and speech-to-text translation. Intel is also targeting the broader category of AI, although likely only for inferencing, and only in those cases where the low latency of the FPGA platform offers a clear advantage.

By contrast, the less powerful Arria 10 card uses just 66 watts and plugs into a single-slot PCIe interface on a 1U server. By all measures, the older PAC is a less performant device, offering just 1.1 million logic elements on the Arria 10 GX FPGA, a single 40 Gb/sec Ethernet port, and a maximum of 8 GB of DDR4 memory. The Arria 10 PAC is mainly aimed at database acceleration, financial trading, genomics, and image processing.

However, according to Patrick Dorsey, vice president and general manager of the newly constituted Networking Custom Logic Group (NCLG) at Intel, there is some application overlap with the D5005, especially in the areas of streaming analytics, image processing, and network security. He says it all depends on the specific application requirements and power constraints. In general, the Arria 10 PAC will tend to be deployed more broadly and closer to the edge.

Dorsey notes, however, that on a performance per watt basis, the D5005 is the clear winner, since the higher power requirements are more than offset by the gain in computational horsepower. “So even though it looks like 3.5X the power, the performance gain is on the order of 10X,” he says. And, of course, if absolute performance is the driving criterion, the D5005 is the obvious choice.
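
As a rough check on that claim, the arithmetic can be sketched in a few lines of Python. The wattages are the maximum card power figures cited earlier (215 watts and 66 watts), and the 10X speedup is Dorsey's order-of-magnitude figure rather than a measured benchmark; the function and variable names are purely illustrative.

```python
# Back-of-the-envelope performance-per-watt comparison.
# Wattages are the maximum card power figures cited in the article;
# SPEEDUP is Dorsey's rough "order of 10X" estimate, not a benchmark result.
D5005_WATTS = 215.0
ARRIA10_WATTS = 66.0
SPEEDUP = 10.0


def perf_per_watt_gain(speedup: float, new_watts: float, old_watts: float) -> float:
    """Return how much more work per watt the new card does than the old one."""
    power_ratio = new_watts / old_watts
    return speedup / power_ratio


power_ratio = D5005_WATTS / ARRIA10_WATTS
gain = perf_per_watt_gain(SPEEDUP, D5005_WATTS, ARRIA10_WATTS)

print(f"power ratio:    {power_ratio:.2f}x")
print(f"perf/watt gain: {gain:.2f}x")
```

With those inputs, the power ratio comes out near 3.3X and the performance per watt gain near 3X. Plugging in Dorsey's rounder 3.5X power figure instead gives about 2.9X, so the conclusion holds either way, provided the 10X speedup applies to the workload in question.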

For example, performing Lepton image compression is two-and-a-half times faster on the D5005 than on the Arria 10 PAC. So if you can deal with the higher power envelope, it would be a more economical choice for such applications (more energy efficient and fewer servers needed). On the other hand, if power is an issue or if the server is also being employed for other (non-accelerated) purposes, an Arria 10 PAC is probably the better option.

An important advantage of the D5005 is the larger amount of memory available, both on the FPGA and as external DDR4 RAM. For speech-to-text, the more capacious on-chip memory is important in reducing the response time, which is a key requirement for such applications. For video transcoding, by contrast, access to larger amounts of DDR4 RAM turns out to be the critical factor. For financial risk analysis, both the computational power of the larger FPGA, in the form of lots of matrix math performance, and access to on-chip memory are critical factors.

The D5005 comes with the same software acceleration stack that is offered with the Arria 10 PAC, providing a common interface for developers and enabling code reuse across PAC products. The stack also includes acceleration libraries, drivers, an FPGA interface manager, an SDK for OpenCL, and Quartus Prime. The latter is a set of Intel-developed software tools for designing programmable logic for FPGAs. Although there are only two PAC products today and use cases are limited to what has been installed over the last couple of years, this common stack will become a lot more important if and when the PAC portfolio broadens and the products become more widely deployed.

Clearly though, that’s exactly what Intel has in mind. According to Dorsey, although the D5005 will initially be available only on the HPE ProLiant DL380, more OEM support is in the offing. Given that the Arria 10 PAC is currently being sold on servers from Dell, Supermicro, Inspur, and Fujitsu, we have a pretty good idea of which OEMs will eventually be certified for the new D5005.

Dorsey also notes that PAC products supporting PCI-Express 4.0 and 5.0 slots are already in the pipeline. In addition, future PAC offerings will use the Compute Express Link (CXL), Intel’s own interconnect for cache-coherent accelerators. “Clearly, we’ve got a roadmap to expand as the capabilities of both the Xeon and the FPGA continues to grow,” says Dorsey.
