Intel Declares War on GPUs at Disputed HPC, AI Border
November 20, 2016 Nicole Hemsoth
In Supercomputing Conference (SC) years past, chipmaker Intel has always come forth with a strong story, either as an enabling processor or co-processor force, or more recently, as a prime contractor for a leading-class national lab supercomputer.
But outside of a few announcements at this year’s SC related to beefed-up SKUs for high performance computing and Skylake plans, the real emphasis back in Portland seemed to ring far fainter for HPC and much louder for the newest server tech darlings, deep learning and machine learning. Far from the HPC crowd last week was Intel’s AI Day, an event in San Francisco chock full of announcements on both the hardware and software fronts during a week that has historically emphasized Intel’s revolving efforts in supercomputing.
As we have noted before, there is a great deal of overlap between these two segments, so it is not fair to suggest that Intel is ditching one community for the other. In fact, it is quite the opposite—or more specifically, these areas are merging to a greater degree (and far faster) than most could have anticipated. At first, and this was all just a year or so ago, deep learning communities were making HPC sexy again because model training platforms relied on GPU computing and other tricks honed by the supercomputing set. In just six months, however, that story evolved as it became clear HPC centers were thinking about ways their compute-centric simulations could get a machine learning boost, even on the same system (versus the web scale deep learning centers that require separate training and inference clusters). In short, investments in HPC are tied to AI platforms and vice versa, with just enough general purpose Xeon and specialized Knights Landing/Knights Hill thrown in to keep the CPU-only supers humming along.
On this note, just as Intel has had to go head-to-head with GPU maker, Nvidia, for HPC accelerator share with its first generation Xeon Phi product (a co-processor, which has given way to the self-hosted/non-offload Knights Landing appearing on its first wave of supercomputers this year), a new war with the same enemy is brewing. This is for machine learning share, an area where Nvidia’s GPUs dominate for the training portion of the growing workload set. While Intel has often proclaimed that they “power 97 percent of the datacenter servers running AI workloads” we have to take issue with that statement—not because it isn’t true, but because that means a CPU is always part of the mix in all of these workloads (training and inference alike) and given the lack of diversity in the CPU ecosystem, it’s natural that 97 percent is X86 from Intel. However, for training, the GPU is the real workhorse. And breaking out that percentage of value for that specific workload is more important, albeit a more difficult analysis.
To overcome its GPU rival, Intel is casting a wide net for the machine learning market. In addition to bolstering its current Knights line with Knights Mill, an inference-driven engine, and drawing on its FPGA assets from the bazillion dollars of IP it acquired from Altera last year, its existing general purpose Xeons, including the forthcoming Skylake, and its HPC-oriented Knights Landing/future Knights Hill parts, Intel has a secret weapon, and it is one that doesn’t fit the traditional CPU bill, either. That is the Nervana Systems purchase, one that we detailed in terms of architecture and value to Intel earlier this year. And if that isn’t convincing enough, look closely at how the tech leads at Nervana talked about GPUs in advance of the acquisition.
The point is, Intel has big plans to pull the mighty GPU down a few pegs in deep learning, saying they expect that a future mesh between a Xeon and the Nervana Engine—the startup’s own ASIC and Neon software stack for AI—will offer a 100X reduction in training time compared to GPUs, with those targeted GPUs being the future “Volta” GV100 graphics processors that we will see on supercomputers starting next year, including the Summit nodes we talked about today.
Intel loosely laid down plans to eventually integrate the unique Nervana architecture with Xeons this week, but was hesitant to put a year stamp on when that might happen. As it stands now, the chip giant will rely on Knights Mill for a stated 4X improvement in machine learning compared to current generation Xeon Phi, but the real jump will come with the integration of Nervana’s ASIC and Intel’s own unspecified Xeons. As Intel’s VP and GM of the Xeon and Cloud Platforms Group, Jason Waxman, tells The Next Platform, “the roadmap for future products will be competitive with the roadmaps being projected for companies like Nvidia” and that even Knights Mill will “represent a significant gain over Pascal.”
“Knights Mill has direct access to large memory footprints with its DDR4 interface, which other platforms don’t have because they rely on add-in cards. This allows us to keep many active models in memory and do inference against them with very low latency. You can also do training for very large models on this platform, which is bootable so there is [no] tension between the add-in cards and host memory and having to manage that interface,” Waxman says.
The future integration of Knights Mill (or another Xeon product) with the Nervana Engine is the subject of a more in-depth piece this week based on a conversation we had with former CEO of Nervana Systems (and now VP and GM of Intel’s AI Solutions Group), Naveen Rao. This eventual effort, called “Knights Crest,” will snap into the “rack scale” architecture Intel keeps describing these days, where custom architectures for specific workloads are pulled in as needed to create a more flexible cluster for a wide range of same-system workloads. Heck, for some users, this could mean looping in GPUs as needed, in addition to FPGAs and a variety of Xeons as required by workloads.
Ultimately, Intel is keeping on its toes after its late start in the AI game. While indeed, most datacenters running these workloads use Xeons for host processors and on the larger clusters, GPUs stole the show early on with big, vocal reference customers, including Baidu. The company also fell behind early on in the supercomputing world in terms of accelerators, with GPUs coming onto the scene in 2010, then finding a home on the number one system, before Intel rounded up the efforts around its first Xeon Phi product, which, like a GPU, sported an offload model. Now, however, with the self-hosted Knights Landing part that promises easier programming in a familiar X86 framework, the stakes are high again for both companies as they try to carve off a slice of two important markets, and yes, HPC is certainly still one of them, even if it’s swiftly meshing with some of the deep learning system and software mindsets.
Meanwhile, Nvidia is taking the view that deep learning supercomputers will be the new face of HPC. We saw this with the company’s placement of one of its own internal supercomputers on the Top 500, which is composed of InfiniBand-connected DGX-1 appliances. And if last year’s GTC was any indication (as well as its previous efforts to get CUDA into as many universities and national labs as possible), the GPU maker will continue its mad dash to push deep learning capabilities into existing applications with the addition of new CUDA hooks, libraries, and of course, investments in its training and outreach centers, as well as highly visible efforts that highlight the combined force of HPC and AI.
On a note of speculative balance, there are two things to keep in mind. Well, three if you want to talk about just how big Nvidia and Intel think the market for deep learning (versus more general purpose analytics/machine learning) really is, which we really don’t, at least not yet. First, it is wise and fortunate that deep learning came along to revive the GPU computing segment, at least at the high end. As noted in our Top 500 supercomputer analysis, the number of systems with GPU acceleration isn’t climbing much. The Tesla business at Nvidia counts on the HPC set significantly for both profits and mind share, and with that flat (on this list, at least; the Top 500 only scratches the surface in terms of the real status of HPC systems out there), the company needs a refresh for its accelerators. As it turns out, that is certainly deep learning, and from what we can tell, they own the acceleration share in that area hands down. For now, of course.
Second, with the above notes in mind, remember that Intel’s strength, even with its own accelerator, is that there is no funky programming model: it’s CUDA versus good old X86. And with Nvidia showing the way in terms of pushing down the precision (we will explore in greater detail the question of how important double precision really is for most applications in the near future) and upping the memory, Intel is a definite threat to both deep learning and HPC business for Nvidia. GPUs opened the door to accelerated computing as a concept and while Intel wasn’t first, when they struck with their own offering, it was compelling, and still is. With Volta on the horizon from Nvidia to fit the HPC and AI bills, and with Knights Mill and the Nervana architecture coming from Intel, both as individual products and eventually as a combined force, the space will heat up further.
And so, really, there are now four things. Sorry about that. Remember that thing we keep writing about here at The Next Platform about the future of specialization? That general purpose processors will really only be used for legacy and run-of-the-mill cloud workloads (which, by the way, will be significant for years to come)? We cannot discount that yet another wave will hit, offering a tsunami of unique, ultra-tailored architectures from those funded enough to get the $20-$50 million effort into real silicon. One real contender to upset the GPU/Intel war we’re talking about here, at least for deep learning, was Nervana Systems, which has been swiftly plucked from the market and retailored to fit the Intel bill. Others, including Wave Computing and countless startups with similar takes on where the market is heading, are still floating freely, not a threat yet, but with potential to capture share if it takes too long for Intel to get a suitable product to market.
The Nervana Systems asset that Intel is sitting on (with 28 nm standalone/non-integrated products expected in mid-2017) is important, but it is a very specialized architecture. While it is an important piece of Intel’s AI strategy, we won’t be able to see how important it really is until we watch how Intel rolls it out and what adoption looks like (same with its FPGA strategy, for that matter). Pascal and Volta are general purpose architectures for the most part, and have a clear roadmap associated, and this is a plus in Nvidia’s direction. It’s too early to tell on the product front, and truth be told, still early to tell on just how big this market will get.
Tomorrow we will share highlights from a chat we had with Naveen Rao about what an integrated AI product from Nervana and Intel might look like and where its advantages are. And true to understanding what’s happening on both sides of the border, we’ll also share insights from some questions we put to the CEO of Nvidia, Jen-Hsun Huang, during SC16 week.