Why Hyperscalers And Clouds Are Pushing Intel Into FPGAs
July 29, 2015 Timothy Prickett Morgan
It has been almost two months since Intel announced its blockbuster $16.7 billion deal to acquire FPGA maker Altera, which will allow the world’s largest chip maker to move from fixed function into programmable devices and potentially shake up the entire spectrum of computing, from handhelds all the way to datacenters. The deal, which is the biggest in Intel’s history, has a lot of people still scratching their heads, wondering what Intel sees in the future that was not already obvious to the many companies who already use FPGAs in myriad applications.
Our contention from the beginning, when rumors first appeared back in March about a possible acquisition of Altera, was that Intel saw some big sea change coming to the datacenter, one so large that it was willing to spend an enormous sum of money, both by Intel’s historical standards and compared to the capital expense budget it needs to keep Moore’s Law moving ahead in its fabs.
To our thinking, when the deal was announced in early June, Intel was hedging its bets on the future of compute and possibly saw a flowering of accelerated computing based on FPGAs, much as Nvidia has cultivated with nearly a decade of hard work for GPU acceleration. Many people thought that FPGAs would be either compute engines or accelerators in the datacenter a decade ago, and while they have their niches, they are far from commonplace compute elements. They are still mostly used for specific function acceleration or to implement circuits where the volumes are insufficient to justify the cost of designing and etching a full-on, static ASIC.
Intel sees a much larger opportunity than this, and CEO Brian Krzanich said when the deal was announced that up to a third of cloud service providers could be using hybrid CPU-FPGA server nodes for their workloads by 2020. This is an astounding statement, given that Altera itself pegged the FPGA opportunity in the datacenter at something around $1 billion in its own forecasts from late 2014. That’s about three times the current revenue run rate for Nvidia’s Tesla compute engines. Intel showed a prototype Xeon-FPGA chip that put the two devices on the same package back in early 2014, and the plan was to get it out the door by the end of 2016 with a ramp through 2017; the idea was to get a Xeon with FPGA circuits on the die “shortly after that,” as Data Center Group general manager Diane Bryant put it at the time. On the call announcing the Altera deal, Krzanich did not say anything about the timing of this Xeon-FPGA device, but did say that Intel would create a hybrid Atom-FPGA device aimed at the Internet of Things market that would be a monolithic die; Intel is examining if it needs to do a single-package hybrid in the interim based on Atoms and Altera FPGAs.
Not surprisingly, FPGAs were the hot topic of conversation when Jason Waxman, general manager of Intel’s Cloud Infrastructure Group, participated in a conference call to talk about Intel’s datacenter business with the research analysts at Pacific Crest Securities. First off, Waxman confirmed that Intel is already sampling that hybrid Xeon-FPGA compute engine to key cloud service providers, although he did not name names or give out any of the specs on the device.
Importantly, Waxman spoke at length and clarified what is driving Intel to acquire Altera and jump into programmable computing devices. And Intel clearly wants to make FPGAs more mainstream, even if that might cannibalize some of its Xeon business in the datacenter. (We think because Intel believes that such cannibalization is inevitable, and the best way to control it is to make FPGAs part of the Xeon lineup.)
“I think there are a number of things that can go into the acquisition, and a number of them are beyond the Data Center Group,” Waxman said. “One is that there is an underlying core business that tends to be driven by manufacturing lead advantage, and we seem to have a pretty good handle on that, so there seems to be some good synergy there. There is also the Internet of Things Group that has a strong interest as well. But for us, one of the things that we started to see is that with the expansion of workloads that are done at massive scale – something like machine learning, certain network functions – there is increasing interest in seeing how you, if you are doing it at scale, get a higher degree of performance. So we are on the cusp of realizing that if we can get some breakthroughs in performance, we can potentially take an FPGA from something that is a niche today in datacenter applications to something that is much more broad, and we see this as a great opportunity. In the Data Center Group, the synergy we see is taking the FPGA and making it a companion to the CPU and expanding our silicon footprint and being able to solve problems for cloud service providers and other types of large-scale applications.”
The key applications where Intel thinks there is initial and presumably large demand for FPGA acceleration include machine learning, search engine indexing, encryption, and data compression. These tend to be very targeted use cases, not general purpose ones, as Waxman pointed out. These are the workloads that Krzanich was no doubt referring to when he said that a third of cloud service providers would be using FPGA acceleration within five years.
Everyone has been lamenting how difficult it is to program FPGAs, but Intel is not daunted by this. Without revealing too much about Intel’s plans, Waxman did offer some insight into why, and into what actions the company might take to make FPGAs more accessible.
“I think the thing that we have that is unique, that other people would not be able to go deliver, is the ability to understand those workloads and to be able to drive acceleration,” said Waxman. “We see a path to accelerating machine learning, to accelerate storage encryption, to accelerate network functions. We know because we are very deep into those workloads and we now see the opportunity to do it. Now, FPGAs have traditionally been kind of difficult, limited to the far-out expertise, because you are writing RTL. We are a company that writes RTL all the time, so we can solve that problem. We can make it performant and we can lower that barrier to entry. The third piece is really the volume economics, and that is all about integration and manufacturing prowess. So we look at the barriers that have kept it a niche, and we have a path to overcome those barriers. We have some interesting plans and if things go well, we can talk about those at another time.”
For those of us who think that Intel is conceptualizing this as FPGAs replacing Xeons, Waxman put the kibosh on that idea entirely.
“The workload or application is going to run on a CPU, and there will be an algorithm or a piece of it that will go on the FPGA,” he said. “So you are not going to run the entire application on an FPGA, and that is one of the things that I think sometimes people are wondering about. Is this thing a replacement for a Xeon CPU? It is really not, it is a companion to the CPU. Take image recognition. How does a computer identify the picture of a cat on Facebook – that’s the funny example. There is a lot of compute that goes behind that, but the actual application for machine learning runs on a Xeon CPU but there are certain algorithms that you are going to want to offload to an FPGA.”
Any algorithms that need to be done repetitively and at high rates are a natural for FPGAs, said Waxman, and we would add that any data manipulation or transformation that needs to be done at extremely low latency or on the wire is also a candidate.
Considering that Altera already makes system-on-chips that incorporate ARM processors and FPGAs, it is natural to think that Intel might be tempted to globally replace ARM cores with X86 cores and do similar devices. But it doesn’t look like this will happen. On a call last week going over Intel’s financial results for the second quarter, Krzanich said that Intel was committed to supporting and enhancing these ARM-FPGA hybrids for Altera’s existing customers.
“I think the way that we view it is that we would actually be integrating some form of FPGA into a Xeon,” Waxman clarified even further. “We have talked publicly about doing a first generation in one package, but the way we will look at it going forward, depending on how things progress, would be on the same die. So we will be looking at what is the right combination based on the customer feedback. And by the way, I would still expect to see there will be some systems where there won’t be integration, they will still do a system-level companion. We are not going to integrate every possible combination of Xeon with FPGAs. That would be prohibitive and we will find the right targets and balance in the market.”
While Altera’s toolset makes use of the OpenCL programming model to get application code converted down to RTL, the native language of the FPGA, interestingly Intel does not think that the future success of FPGAs in the datacenter is predicated on improvements in OpenCL integration with RTL tools or more widespread adoption of OpenCL.
“It is not predicated on OpenCL,” Waxman said emphatically. “We do see OpenCL as a potential avenue that further broadens the applicability of FPGAs, but right now initial cloud deployments of FPGAs will probably be done by the more capable companies and none of them are asking us for OpenCL.”
Intel has plans to make it easier to program FPGAs, but Waxman was not at liberty to talk about them. He did hint, however, that Intel could make an RTL library available to programmers so they could call routines deployed on FPGAs, with the library pushing the RTL down to form the gates that implement those routines, rather than having programmers create the routines themselves. This makes a certain amount of sense, and this is precisely what Convey, which is part of Micron Technology now, did with its FPGA-accelerated systems.
What About GPU Accelerators?
While Intel does not sell discrete GPU accelerators, it does sell GPUs in conjunction with low-end Xeon E3 processors, and these can be and are used as parallel accelerators for certain workloads, although Intel doesn’t talk about it much.
And, to a certain way of looking at it, the massively parallel Xeon Phi family of coprocessors – and soon to be discrete processors with the “Knights Landing” generation – have their heritage in Intel’s efforts to get into the discrete GPU business using an X86 architecture as the engine for that GPU. In effect, Intel dropped the GeForce part because of the difficulty of taking on AMD and Nvidia in discrete GPUs and went straight to the accelerated computing portion with the Knights family of chips.
Waxman had an interesting thesis on how GPU acceleration was different from FPGA acceleration, and one that IBM and Nvidia no doubt will argue with. But it is interesting all the same:
“I think there is a continuum of acceleration,” he said. “And what happens is, in the beginning, you may not know exactly what you are trying to accelerate and you are experimenting a little bit, and in that phase of acceleration, you want something that is a little more general purpose. As you start to really home in on what you are trying to accelerate, you are going to want something that is more efficient, that has lower power and takes less space, and that is when you are going to move into an FPGA.”
Waxman then cited the work that Microsoft has done with FPGA acceleration on its “Catapult” system, which takes its Open Cloud Server and adds FPGA mezzanine cards as accelerators. We went over this research back in March, which shows how an FPGA device at 25 watts delivers better performance/watt than a set of servers using Nvidia Tesla K20 GPU accelerators at 235 watts that were tested by Google running the same image recognition training algorithms.
As we have pointed out, we have no doubts about the performance numbers that Microsoft and Google posted, but applying performance to the discrete GPU or FPGA and gauging that against its own thermal profile is not fair. You have to look at this at the server node level, and if you do that, the FPGA-assisted Microsoft server at the system level is only moderately ahead of the servers Google tested using Tesla K20s. (Those were our estimates, based on images processed per second per watt.) And this Microsoft comparison does not take cost into account, and it should. What can be honestly said is that Microsoft’s Open Cloud Server does not have the juice or the cooling to use full-on Tesla GPUs. A real bakeoff would use GPU mezzanine cards somehow and include thermals, performance, and price.
But Waxman’s larger point in the discussion remains the same.
“At some point, you are really going to want that thing to scream, and you are going to want to do that in a much lower power envelope. That is what we are banking on – that more optimized approach is where an FPGA is going to pay off.”
The last thing to consider is the cloud business at Intel. These customers now represent about 25 percent of Data Center Group’s revenue and in the aggregate their purchases are growing at about 25 percent per year. The overall Data Center Group business is projected to grow at 15 percent this year and into the next couple of years. Let’s do some math. Intel should post $16.6 billion in revenues in the Data Center Group this year if its plan works out. That’s around $4.1 billion for the cloud service providers (which includes cloud builders and hyperscalers using our language here at The Next Platform), and around $12.5 billion for the rest of Intel’s datacenter sales. So outside of the cloud, Intel’s business is growing at about 12 percent, or half the cloud rate. Intel needs to feed that cloud growth any way it can, and apparently FPGA capacity, even if it does cannibalize Xeon capacity a bit, is a better option for Intel than having GPU acceleration continue to grow as it has.
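For readers who want to check the arithmetic, here is a minimal sketch of the back-of-the-envelope math above. It assumes the 25 percent cloud share applies to this year's projected $16.6 billion and that both growth rates are year over year; the function and variable names are ours, for illustration only, and are not anything Intel published.

```python
# Back-of-the-envelope check of the Data Center Group split quoted above.
# Assumptions (ours): the 25 percent cloud share applies to this year's
# projected revenue, and both growth rates are year over year.

def implied_non_cloud_growth(total, cloud_share, cloud_growth, total_growth):
    """Return (cloud revenue, non-cloud revenue, implied non-cloud growth)."""
    cloud_now = total * cloud_share
    rest_now = total - cloud_now
    # Back out last year's figures from this year's and the growth rates.
    cloud_prev = cloud_now / (1 + cloud_growth)
    total_prev = total / (1 + total_growth)
    rest_prev = total_prev - cloud_prev
    return cloud_now, rest_now, rest_now / rest_prev - 1

cloud_now, rest_now, rest_growth = implied_non_cloud_growth(
    total=16.6,         # projected DCG revenue this year, in $ billions
    cloud_share=0.25,   # cloud service providers' share of DCG revenue
    cloud_growth=0.25,  # growth rate of cloud purchases
    total_growth=0.15,  # projected growth rate of DCG overall
)

print(f"cloud: ${cloud_now:.2f}B, non-cloud: ${rest_now:.2f}B, "
      f"non-cloud growth: {rest_growth:.1%}")
```

Under those assumptions the split comes out at roughly $4.15 billion and $12.45 billion, with the non-cloud business growing at about 12 percent, which matches the rounded figures in the text.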