When Intel purchased Altera in 2015 for $16.7 billion, company officials predicted that up to a third of servers would be equipped with FPGAs by 2020. While that’s unlikely to happen, it hasn’t quelled Intel’s ambitions for its FPGAs in the datacenter and elsewhere.
Nor should it. Intel has racked up some notable success stories with its FPGA offerings over the last four years, with perhaps the most noteworthy being Microsoft’s embrace of the technology with Project Catapult. That effort laid the groundwork for Intel FPGAs being deployed across the entire Azure cloud, where the devices are being used to accelerate everything from Bing searches to networking services.
More recently, Intel revealed Japanese internet service provider Rakuten is employing Intel FPGAs to speed voice and video data through its mobile network. And in the edge computing space, NEC is tapping Arria 10 FPGAs to power NeoFace, its facial recognition engine that is deployed in a variety of devices and appliances in the field.
It is noteworthy that all of these use cases pair Intel FPGAs with Xeon processors, a combo that the chipmaker has talked up ever since it acquired Altera. Intel has tantalized us with an initial Xeon-FPGA hybrid product, leading to plenty of speculation on our part, but no commercial product line ever emerged.
Until this week.
However, the new product family, known as Agilex, is both more and less than a straightforward Xeon-FPGA hybrid. In fact, the Xeon processor makes its appearance as an attachment to the platform, albeit a cache-coherent one, rather than as an integrated component. But we will get to that in a moment. Before we break down the individual pieces of the platform, the graphic at the top of this story will give you an overall sense of the various technologies that Intel has inserted into the architecture.
Essentially, Agilex is a heterogeneous package of logic, memory, and interfaces that can connect an FPGA core (which includes a configurable DSP and an optional Arm SoC) with a Xeon processor, custom chips (actually chiplets), and I/O devices. The glue between the FPGA and the other components is accomplished with Intel’s Embedded Multi-die Interconnect Bridge (EMIB), a technology designed to hook together disparate chips within a single package.
The FPGA part will be manufactured on Intel’s 10 nanometer process node, which undoubtedly will help deliver the 40 percent higher performance and 40 percent lower power Intel is claiming when compared to its current Stratix 10 FPGAs. It’s not clear how much of that better performance and efficiency is attributable to the hardened DSP, but it appears this component was upgraded fairly aggressively.
According to Intel, the Agilex DSP provides up to 40 peak teraflops at 16 bits, and supports FP32, bfloat16, FP16 and INT8 numerical formats. It can also be configured to support lower precision integers – anything from INT7 down to INT2. Intel says this is the first, and so far only, FPGA to support hardened bfloat16 and FP16. Given these formats, it looks like Intel is aiming pretty squarely at inferencing trained neural networks, an application FPGAs have proven to be particularly good at.
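For readers who have not worked with these 16-bit formats, the short Python sketch below illustrates how bfloat16 and FP16 trade precision against range. It is not Intel code and says nothing about how the Agilex DSP is implemented: the helper names are our own, and the bfloat16 conversion uses simple truncation of the low 16 bits of an FP32 value, whereas real hardware typically rounds.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Keep only the top 16 bits of the FP32 pattern.

    Assumption: truncation (round toward zero) for simplicity; hardware
    implementations usually round to nearest even instead.
    """
    return float32_bits(x) >> 16


def from_bfloat16_bits(bits: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]


def fp16_roundtrip(x: float):
    """Round x to IEEE 754 binary16 and back; return None if it overflows."""
    try:
        return struct.unpack(">e", struct.pack(">e", x))[0]
    except (OverflowError, struct.error):
        return None


if __name__ == "__main__":
    for value in (1.001, 1e5):
        bf16 = from_bfloat16_bits(to_bfloat16_bits(value))
        fp16 = fp16_roundtrip(value)
        print(f"{value!r}: bfloat16 -> {bf16!r}, FP16 -> {fp16!r}")
```

The bfloat16 format keeps the 8-bit exponent of FP32 but only 7 mantissa bits, so it covers the same range with coarser steps. FP16 has 10 mantissa bits but a 5-bit exponent, so it is finer-grained near 1.0 but cannot represent values above 65504.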
As we mentioned, an Agilex device can connect to a Xeon processor, with the hookup implemented via a cache-coherent UltraPath Interconnect (UPI) link, allowing the FPGA and Xeon memories to share the same address space. For those of you who may have missed it, UPI is a more efficient and slightly faster (10.4 GT/sec) replacement for QPI that was introduced in the Skylake Xeon SP processors in the summer of 2017. Its presence here in Agilex reflects Intel’s thinking that FPGAs can act as true peers to the CPU, rather than just as coprocessors hanging off a PCI-Express bus.
For more specialized processing, Agilex devices can also connect custom chiplets, both from Intel and third-party providers. A key technology Intel has brought to the table here is eASIC, obtained from the company of the same name it purchased in 2018.
eASIC technology can transform a configurable logic block into something intermediate between an FPGA and an ASIC. According to Intel, eASIC “offers performance and power-efficiency closer to a standard-cell ASIC, but with the faster design time and at a fraction of the non-recurring engineering costs associated with ASICs.” The claim is that it can deliver a tested prototype in as little as five weeks. Given all that, we have a feeling that eASIC technology will be making an appearance in other Intel products in the not-too-distant future.
Heterogeneity also extends to the Agilex memory and I/O. The platform supports DDR4, DDR5, and High Bandwidth Memory (HBM), plus Intel’s own Optane DC Persistent Memory, while device connectivity is provided by PCI-Express 4.0 or 5.0. A 112G SerDes transceiver interface is also available, four lanes of which will provide enough bandwidth for a 400 Gb/sec network link.
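The four-lane figure is easy to check with back-of-the-envelope arithmetic. The sketch below assumes that "112G" means a nominal raw line rate of 112 Gb/sec per lane and treats anything above the 400 Gb/sec payload as headroom for encoding and error correction; the exact overhead depends on the link standard in use and is not specified here.

```python
# Assumption: "112G" is a nominal raw line rate of 112 Gb/sec per lane.
LANE_RATE_GBPS = 112
LANES = 4
TARGET_LINK_GBPS = 400


def aggregate_raw_rate(lanes: int, lane_rate_gbps: int) -> int:
    """Total raw bandwidth across all lanes, in Gb/sec."""
    return lanes * lane_rate_gbps


raw_gbps = aggregate_raw_rate(LANES, LANE_RATE_GBPS)

# Raw bandwidth left over after carrying the target payload rate.
headroom_gbps = raw_gbps - TARGET_LINK_GBPS
headroom_fraction = headroom_gbps / raw_gbps

if __name__ == "__main__":
    print(f"raw aggregate: {raw_gbps} Gb/sec")
    print(f"headroom over {TARGET_LINK_GBPS} Gb/sec: "
          f"{headroom_gbps} Gb/sec ({headroom_fraction:.1%} of raw)")
```

Under that assumption, four lanes carry 448 Gb/sec raw, which leaves roughly a tenth of the raw rate for overhead above a 400 Gb/sec link.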
The Agilex product family is broken down into the F-series, I-series, and M-series. The graphic below shows the various interfaces and options available in each series, with capability increasing as you ascend the alphabet.
The breakdown also reflects the wide array of environments Agilex is targeting: everything from hyperscale clouds and enterprise datacenters, to the edge and embedded space. Note that the application set overlaps a lot of areas where GPUs are playing today, which makes us wonder how Intel will position its upcoming Xe discrete GPUs that are slated to debut in 2020. Perhaps an Xe chiplet option is in the works.
Agilex will inevitably invite comparisons to Xilinx’s adaptive compute acceleration platform (ACAP) that was unveiled last year. Like Agilex, ACAP is implemented as a heterogeneous package, anchored by an FPGA, and intended to offer a data-centric platform for a wide array of workloads in datacenters and edge environments. But Agilex brings in a lot of home-grown Intel technologies that Xilinx would be hard-pressed to reproduce. That should make for an interesting rivalry in the years ahead, as both companies refine their products and press their respective advantages in technologies and expertise.
In the meantime, Intel has to draw customers and third-party IP providers into the Agilex fold. That begins in April, when the company plans to give select users early access to Agilex hardware and development tools. General availability is planned for the third quarter of the year.
Agilex gets cache coherence via the CXL interconnect riding on top of PCIe and not via UPI/QPI as mentioned.