The field programmable gate array (FPGA) space is heating up with new use cases driven by emerging network, IoT, and application acceleration trends. Keeping ahead of the curve means expanding on devices that have quite steady improvement cycles, so the few companies at the top need to get creative to stay competitive.
Xilinx and Altera – which was bought by Intel in 2015 for $16.7 billion – have been the top vendors of FPGAs, which can be programmed and reprogrammed, enabling organizations to adapt the processors to the varying workloads running on their systems. The high price Intel paid for Altera and the persistent rumors throughout the industry of one chip maker or another being interested in acquiring Xilinx underscore the growing importance of FPGAs in the datacenter. Most recently there has been speculation that acquisition-hungry Broadcom, after losing out on its $117 billion bid for Qualcomm, could turn its attention to other chip makers, including Xilinx.
But Xilinx has other plans. In January the FPGA maker named Victor Peng, a ten-year veteran of the company who previously held positions at AMD, as president and CEO, succeeding Moshe Gavrielov. Less than two months later, Peng is laying out a future for Xilinx that will enable it to build on and expand beyond its core FPGA business to better address the emerging trends that are roiling the industry, including the exponential growth of data, the rise of artificial intelligence (AI) and machine learning, and the need for more compute capabilities.
According to Peng, such trends require compute architectures that are heterogeneous, fast and adaptable. The company today is unveiling a platform that the CEO says will form the foundation of such architectures, one that will go beyond what today’s CPUs and GPUs can do in terms of performance and performance-per-watt, particularly in the areas of big data and AI, including database, data compression, AI inference, search, video transcoding, machine vision and genomics.
Xilinx is introducing a new product category, the adaptive compute acceleration platform (ACAP), and the first products in that category, codenamed “Everest.” The Everest program, built on Taiwan Semiconductor Manufacturing Co.’s 7 nanometer process, has been in the works for more than four years at an R&D cost of about $1 billion, Peng said during an online briefing with journalists days before the announcement.
“This is a product that is both hardware and software programmable, so you get this very broad adaptability to multiple workloads and multiple use cases,” Peng said. “You get the benefit of acceleration and you still get some of the other flexibility in terms of the I/O, but overall, while it can get programmed at the hardware level, it has enough … architectural features that it can be programmed purely from a software perspective. That is fundamentally different from what we and our traditional competitor and other competitors in the FPGA space have done.
“You can change things not only from the software level but down to the hardware level, so really optimize and accelerate a very, very broad set of applications. And you can do that dynamically, meaning that while it’s running, you can actually be making these changes, and you will still get in many cases an exponential increase in performance relative to our CPUs. And unlike our tradition where we created the FPGA, rather than it being a very low-level hardware-programmable device, it’s something that can be programmed at the software level.”
The CEO declined to go into deep details of the ACAP, saying more information would be laid out in the coming months. Essentially, an ACAP is an FPGA logic fabric that includes multiple levels of distributed memory, hardware-programmable digital signal processor (DSP) blocks, a multi-core system-on-a-chip (SoC), and one or more compute engines that are software programmable and hardware adaptable. Everything is connected via a low-latency, high-frequency on-chip network that has arbitration and flow control. The platform also includes integrated RF-ADCs/DACs and will in some cases support a High-Bandwidth Memory (HBM) stack, various generations of DDR, and advanced SerDes technology, Peng said.
Xilinx is also aiming to create a platform that will attract software developers to write applications for it. They will be able to use C/C++, OpenCL and Python to build applications for ACAP, which also will be reprogrammable at the RTL level using FPGA tools, Peng said. The company has been courting developers for several years through steps taken in conjunction with its FPGAs: offering its SDAccel and SDSoC development environments, where people can code in C, C++ and OpenCL; working with third parties to provide libraries targeting particular applications and domains; and developing interfaces for industry-standard machine learning frameworks such as Caffe and TensorFlow. Right now there are many more developers writing for CPUs and GPUs, and Xilinx is hoping to change that, Peng said.
With ACAP, Xilinx will be introducing a new architecture that is “really aligned to a lot of these throughput computing, still-want-some-good-latency kind of applications, and it will be at a level of software programmability – in a lot of cases embedded software programmability – but it will also be hardware programmable,” Peng said. “The message here being we’re not just hardening IP blocks that someone else is developing or that your standard SoC would do. We are always looking at how we can leverage our expertise in hardware programmability to get something that provides another level of optimization beyond just software programmable blocks, where it makes sense.”
Xilinx put many of its chief architects and lead technologists onto the Everest project initially; now it has about 1,500 engineers working on it. The plan is to roll out multiple products, including devices that will scale to 50 billion transistors. The company is making early software tools available to some customers now, and Everest will tape out later this year. Customer shipments will begin in 2019. The company said Everest will deliver 20 times the performance on deep neural networks of Xilinx’s current 16 nanometer Virtex VU9P FPGAs, and that Everest-based 5G remote radio heads will have four times the bandwidth of the latest 16 nanometer radios.
Peng said the rapid innovation and changes in the industry and the demands from AI, big data analytics and other emerging workloads will overwhelm traditional CPUs and GPUs and will demand greater adaptability from compute architectures.
“We’re in the era of exponential change and the speed of innovation is really outpacing silicon design cycles, so even while Moore’s Law is stretching out and being more difficult, even if we’re staying the same, people need to get to market with new innovations [and] new business models leveraging the data that’s being generated on an exponential basis is really going to outpace what you can do with silicon anyway,” he said.
“The world of CPU-centric computing is over. It’s well understood that the future of computing is going to be more of a heterogeneous architecture where there will be accelerators somewhere in the system or the datacenter. It’s as much driven by the nature of the workload and the exponential increase in that, as well as the fact that Moore’s Law really has been slowing down if not completely stopped working. It certainly has stopped working from an economic perspective, even though we know how to scale to the next process node, from an economic perspective that everything is getting better, faster, cheaper, that’s really not working anymore, and actually hasn’t been working for a few nodes. What that means is that in this new era, architecture will be heterogeneous with accelerators.”