The FPGA Accelerated Cloud Push Just Got Stronger
November 30, 2016 Nicole Hemsoth
FPGAs have been an emerging topic on the application acceleration front over the last couple of years, but despite increased attention around use cases in machine learning and other hot areas, adoption has been held back by the difficulty of simply getting started with both the hardware and the software.
As we have covered here, this is changing, especially with the addition of OpenCL and other higher-level interfaces that let developers program FPGAs for double duty on both network and application work. For that matter, getting systems that integrate FPGAs the way they already integrate GPUs (over PCIe) takes extra footwork as well. The solution to the adoption problem, aside from the programmability step, has been to make FPGAs available in a cloud environment, which also helps those looking to experiment route around the cost. As obvious a move as this seems to be, the large cloud providers have been slow to make it happen. That is changing too, but there is still some way to go before both the hardware and the tooling are on the same cloud platform for a broader range of developers and users.
FPGA makers are certainly seeing the writing on the wall when it comes to pairing their devices with big public cloud instances. When Intel acquired Altera last year, we wondered what it might mean for the swiftly expanding market for reconfigurable computing and, more narrowly, what it could signal for the other leading FPGA company, Xilinx. It was clear well in advance of the Intel acquisition that FPGAs were poised to make greater inroads in the datacenter, a view Intel reinforced with its projection that by 2020, up to one-third of cloud providers would be using hybrid FPGA server nodes. Intel used that figure to prop up the absurdly high $16.7 billion acquisition price (for a company that saw itself playing in a future market worth around $1 billion).
While Intel may have been banking on hyperscale and cloud companies to support its FPGA investments, standalone rival Xilinx bolstered its efforts to reach larger markets on both the application and the networking/storage sides of its FPGA business. In the meantime, a great many new efforts have cropped up showing how FPGAs can snap into an ever-widening array of workloads on the compute side, particularly as machine learning, IoT, and other trends continue to ramp.
In short, it has been a good time to be the underdog, if we can call Xilinx that for being on its own. Today, Amazon Web Services announced the very thing Intel had been banking on: a forthcoming set of FPGA-enabled nodes on its EC2 cloud. The offering will use Xilinx devices, which, as the FPGA maker's SVP of Corporate Strategy, Steve Glaser, tells The Next Platform, shows FPGAs going mainstream in hyperscale datacenters. "We recently introduced the Xilinx Reconfigurable Acceleration Stack to speed up this type of adoption and the AWS announcement today is further evidence this is happening right now and the momentum is building."
As the company's Jeff Barr describes, a single F1 instance can include up to eight 16nm Xilinx UltraScale+ VU9P FPGAs, each with a dedicated PCIe interface to the host's Broadwell E5-2686 CPU (2.3 GHz base speed) and four DDR4 memory channels.
Barr says that "in instances with more than one FPGA, dedicated PCIe fabric allows the FPGAs to share the same memory address space and to communicate with each other across a PCIe fabric at up to 12 Gbps in each direction. The FPGAs within an instance share access to the 400 Gbps bidirectional ring for low-latency, high-bandwidth communication." As one can imagine, though, taking advantage of that will require users to write their own communication protocols.
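To put the link rates Barr quotes in perspective, here is a back-of-envelope sketch of how long a bulk transfer would take at each rate. It assumes ideal line rate with no PCIe or protocol overhead and treats 1 Gbps as 10^9 bits per second; the function name is hypothetical, and because the ring is shared among the FPGAs in an instance, any one FPGA's effective share could be lower than the full figure.

```python
# Back-of-envelope transfer times at the F1 link rates quoted by AWS.
# Assumptions (ours, not AWS's): ideal line rate, no PCIe/protocol overhead,
# 1 Gbps = 1e9 bits/s. transfer_seconds is a hypothetical helper.

PEER_LINK_GBPS = 12   # FPGA-to-FPGA over the PCIe fabric, per direction (quoted)
RING_GBPS = 400       # bidirectional ring shared by the FPGAs (quoted)


def transfer_seconds(payload_bytes: int, link_gbps: float) -> float:
    """Seconds to move payload_bytes over a link running at link_gbps."""
    if link_gbps <= 0:
        raise ValueError("link rate must be positive")
    bits = payload_bytes * 8
    return bits / (link_gbps * 1e9)


if __name__ == "__main__":
    one_gib = 2**30
    for name, rate in [("PCIe peer link", PEER_LINK_GBPS), ("shared ring", RING_GBPS)]:
        t = transfer_seconds(one_gib, rate)
        print(f"1 GiB over {name} at {rate} Gbps: {t:.3f} s")
```

Under these assumptions, 1 GiB takes roughly 0.72 seconds over the 12 Gbps peer link versus about 0.02 seconds at the ring's full 400 Gbps, which is why the ring is the path for latency- and bandwidth-sensitive traffic.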
Users can write their code in either VHDL or Verilog and then use Xilinx tools, including the Vivado design suite, or other compilers and verification tools. OpenCL tools were not described; we will follow up on that when we can get comment from AWS. So far, the tooling is the lower-level stuff for experienced users, and, as a side note, the service is only open in the AWS US East region for now. On the tooling point, we have to temper the idea that this availability will propel FPGAs into the real mainstream, because so far the tooling AWS is providing will really only appeal to folks who already have experience working with FPGAs, even if Jeff Barr says it is simplified. For this reason, those calling this "FPGA as a Service" might be a bit off track, since that would imply the tooling is in place to make it a true out-of-the-(virtual)-box option.
It is difficult to tell just how many nodes inside AWS datacenters will be FPGA-enabled for users, but the beauty of these devices is that AWS might already have had them installed in EC2 servers to support its so-called "smart NICs." These use reprogrammable logic for network activity but leave a good part of the FPGA idle for other purposes, and they can come with algorithms preloaded to chew on whatever part of the workload is desired.
This part is just speculation, since AWS has not responded to our requests for more information, but it is something Microsoft does with its FPGA "personalities," which can handle SDN activity and then switch over to handle Bing and the other workloads Microsoft proved out on its Catapult servers. These Catapult machines, by the way, have given way to the Olympus servers (a rack version with full-bore PCIe cards that can handle GPUs, FPGAs, or a mix), which we could see Microsoft offering in Azure by the time AWS F1 (FPGA) instances are made widely available (the announcement today was a developer preview of the offering).
"Large-scale financial clearing risk management is essential to delivering value for our customers," said Kevin Kometer, Chief Information Officer, CME Group. "CME Group has long been an innovator in the use of accelerated computing, for the clearing risk management of increasingly complex instruments, including extensive research into FPGAs. Amazon EC2 F1 instances will allow us to substantially accelerate the rate of innovation of risk analysis for our customers, while delivering greater cost efficiency relative to using traditional IT infrastructure."
While Amazon may be the first major cloud to offer FPGAs (with Azure right on its heels, we imagine), it is not the first overall. Some of the smaller high performance computing cloud companies have been early to bat with FPGAs, including supercomputing cloud company Nimbix. What Nimbix has done that AWS hasn't, at least at this point, is put in place the higher-level developer tools and environments that will actually open FPGA adoption to a much larger potential market. It could be that AWS is testing the waters to see how interested users are before investing in a richer set of interfaces and tools for talking to FPGAs while leveraging the same parts it uses for other datacenter functions, but if interest in FPGAs on the application side alone is any indicator, there is a rich opportunity here.
While the focus has been on how important a boost this is for Xilinx, the fact that AWS didn't pick Altera for this particular deal doesn't mean the now Intel-owned company has been left on the cutting room floor. Xilinx will likely continue picking up deals like this one (which is good, since otherwise Altera/Intel would have a monopoly), and that competition puts the two FPGA makers to the test and keeps them on their development toes. Although here we tend to cover the application and, to a lesser extent, the network/storage functions and futures of FPGAs, there is a wide world of FPGA device applications in embedded, military, IoT, and other markets that is also growing.