Intel Declares War on GPUs at Disputed HPC, AI Border

In Supercomputing Conference (SC) years past, chipmaker Intel has always come forth with a strong story, either as an enabling processor or co-processor force, or more recently, as a prime contractor for a leading-class national lab supercomputer.

But outside of a few announcements at this year’s SC related to beefed-up SKUs for high performance computing and Skylake plans, the real emphasis back in Salt Lake City seemed to ring far fainter for HPC and much louder for the newest server tech darlings, deep learning and machine learning. Far from the HPC crowd last week was Intel’s AI Day, an event in San Francisco chock full of announcements on both the hardware and software fronts during a week that has historically emphasized Intel’s evolving efforts in supercomputing.

As we have noted before, there is a great deal of overlap between these two segments, so it is not fair to suggest that Intel is ditching one community for the other. In fact, it is quite the opposite—or more specifically, these areas are merging to a greater degree (and far faster) than most could have anticipated. At first, and this was all just a year or so ago, deep learning communities were making HPC sexy again because model training platforms relied on GPU computing and other tricks honed by the supercomputing set. In just six months, however, that story evolved as it became clear HPC centers were thinking about ways their compute-centric simulations could get a machine learning boost, even on the same system (versus the web scale deep learning centers that require separate training and inference clusters). In short, investments in HPC are tied to AI platforms and vice versa, with just enough general purpose Xeon and specialized Knights Landing/Knights Hill thrown in to keep the CPU-only supers humming along.

On this note, just as Intel has had to go head-to-head with GPU maker Nvidia for HPC accelerator share with its first generation Xeon Phi product (a co-processor, which has given way to the self-hosted, non-offload Knights Landing appearing on its first wave of supercomputers this year), a new war with the same rival is brewing, this time over machine learning share, an area where Nvidia’s GPUs dominate the training portion of the growing workload set. Intel has often proclaimed that they “power 97 percent of the datacenter servers running AI workloads,” and we have to take issue with that statement—not because it isn’t true, but because a CPU is part of the mix in all of these workloads (training and inference alike), and given the lack of diversity in the CPU ecosystem, it is natural that 97 percent of those hosts are X86 parts from Intel. For training, however, the GPU is the real workhorse, and breaking out the percentage of value for that specific workload is more important, albeit a more difficult analysis.

To overcome its GPU rival, Intel is casting a wide net for the machine learning market. In addition to bolstering its current Knights line with Knights Mill, a deep learning-focused engine; its FPGA assets from the bazillion dollars of IP it acquired from Altera last year; its existing general purpose Xeons, including the forthcoming Skylake; and its HPC-oriented Knights Landing and future Knights Hill parts, Intel has a secret weapon—and it is one that doesn’t fit the traditional CPU bill, either. That is the Nervana Systems purchase—one that we detailed in terms of architecture and value to Intel earlier this year. And if that isn’t convincing enough, look closely at how the tech leads at Nervana talked about GPUs in advance of the acquisition.

The point is, Intel has big plans to pull the mighty GPU down a few pegs in deep learning. The company says it expects that a future mesh between a Xeon and the Nervana Engine—the startup’s own ASIC and Neon software stack for AI—will offer a 100X reduction in training time compared to GPUs, with those targeted GPUs being the future “Volta” GV100 graphics processors that we will see on supercomputers starting next year, including the Summit nodes we talked about today.

Intel loosely laid down plans this week to eventually integrate the unique Nervana architecture with Xeons, but was hesitant to put a year stamp on when that might happen. As it stands now, the chip giant will rely on Knights Mill for a stated 4X improvement in machine learning compared to the current generation Xeon Phi, but the real jump will come with the integration of Nervana’s ASIC and Intel’s own, as yet unspecified, Xeons. As Intel’s VP and GM of the Xeon and Cloud Platforms Group, Jason Waxman, tells The Next Platform, “the roadmap for future products will be competitive with the roadmaps being projected for companies like Nvidia” and that even Knights Mill will “represent a significant gain over Pascal.”

“Knights Mill has direct access to large memory footprints with its DDR4 interface, which other platforms don’t have because they rely on add-in cards. This allows us to keep many active models in memory and do inference against them with very low latency. You can also do training for very large models on this platform, which is bootable so there is tension between the add-in cards and host memory and having to manage that interface,” Waxman says.

The future integration between Knights Mill (or another Xeon product) and the Nervana Engine is the subject of a more in-depth piece this week based on a conversation we had with the former CEO of Nervana Systems (and now VP and GM of Intel’s AI Solutions Group), Naveen Rao. This eventual effort, called “Knights Crest,” will snap into the “rack scale” architecture Intel keeps describing these days, where custom architectures for specific workloads are pulled in as needed to create a more flexible cluster for a wide range of same-system workloads. Heck, for some users, this could mean looping in GPUs as needed, in addition to FPGAs and a variety of Xeons as required by workloads.

Ultimately, Intel is keeping on its toes after its late start in the AI game. While most datacenters running these workloads do indeed use Xeons as host processors, on the larger clusters GPUs stole the show early on with big, vocal reference customers, including Baidu. The company also fell behind early in the supercomputing world in terms of accelerators, with GPUs coming onto the scene in 2010, then finding a home on the number one system, before Intel rounded up its efforts around the first Xeon Phi product, which, like a GPU, sported an offload model. Now, however, with the self-hosted Knights Landing part that promises easier programming in a familiar X86 framework, the stakes are high again for both companies as they try to carve off a slice of two important markets—and yes, HPC is certainly still one of them, even if it is swiftly meshing with some of the deep learning system and software mindsets.

Meanwhile, Nvidia is taking the view that deep learning supercomputers will be the new face of HPC. We saw this with the company’s placement of one of its own internal supercomputers, composed of InfiniBand-connected DGX-1 appliances, on the Top 500. And if last year’s GTC was any indication (as well as its previous efforts to get CUDA into as many universities and national labs as possible), the GPU maker will continue its mad dash to push deep learning capabilities into existing applications with the addition of new CUDA hooks and libraries, and of course, investments in its training and outreach centers, as well as highly visible efforts that highlight the combined force of HPC and AI.

On a note of speculative balance, there are two things to keep in mind. Well, three if you want to talk about just how big Nvidia and Intel think the market for deep learning (versus more general purpose analytics/machine learning) really is, which we really don’t—at least not yet. First, it is fortunate for Nvidia that deep learning came along to revive the GPU computing segment, at least at the high end. As noted in our Top 500 supercomputer analysis, the number of systems with GPU acceleration isn’t climbing much. The Tesla business at Nvidia counts on the HPC set significantly for both profits and mind share, and with that flat (on this list, at least—the Top 500 only scratches the surface in terms of the real status of HPC systems out there), the company needs a fresh source of demand for its accelerators. As it turns out, that source is certainly deep learning, and from what we can tell, Nvidia owns the acceleration share in that area hands down. For now, of course.

Second, with the above notes in mind, remember that Intel’s strength, even with its own accelerator, is that there is no funky programming model—it’s CUDA versus good old X86. And with Nvidia showing the way in terms of pushing down the precision (we will explore in greater detail in the near future the question of how important double precision really is for most applications) and upping the memory, Intel is a definite threat to both the deep learning and HPC businesses at Nvidia. GPUs opened the door to accelerated computing as a concept, and while Intel wasn’t first, when it struck with its own offering, it was compelling—and still is. With Volta on the horizon from Nvidia to fit the HPC and AI bills, and with Knights Mill and the Nervana architecture coming from Intel, both as individual products and eventually as a combined force, the space will heat up further.
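
To put the precision point in concrete terms, here is a minimal sketch in Python with NumPy, using a hypothetical 4,096 by 4,096 weight matrix as a stand-in for a single network layer (the size is our choice, not any vendor’s). It simply shows why dropping from 32-bit to 16-bit floating point matters for deep learning hardware: the same parameters take half the memory and bandwidth, at the cost of precision that training tolerates far better than most double precision HPC codes do.

```python
import numpy as np

# Hypothetical layer size, chosen only for illustration.
rows, cols = 4096, 4096

weights_fp32 = np.random.randn(rows, cols).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)  # half the bytes per value

print(f"FP32 weights: {weights_fp32.nbytes / 2**20:.0f} MiB")  # ~64 MiB
print(f"FP16 weights: {weights_fp16.nbytes / 2**20:.0f} MiB")  # ~32 MiB

# The trade-off: fewer mantissa bits means coarser values, an error budget
# that neural network training absorbs far more gracefully than, say, a
# double precision CFD solver would.
max_err = np.abs(weights_fp32 - weights_fp16.astype(np.float32)).max()
print(f"largest rounding error introduced: {max_err:.2e}")
```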

And so, really, there are now four things. Sorry about that. Remember that thing we keep writing about here at The Next Platform about the future of specialization? That general purpose processors will really only be used for legacy and run-of-the-mill cloud workloads (which, by the way, will be significant for years to come)? We cannot discount that yet another wave will hit, offering a tsunami of unique, ultra-tailored architectures from those funded well enough to get the $20 million to $50 million effort into real silicon. One real contender to upset the GPU/Intel war we’re talking about here, at least for deep learning, was Nervana Systems, which has been swiftly plucked from the market and retailored to fit the Intel bill. Others, including Wave Computing and countless startups with similar takes on where the market is heading, are still floating freely; they are not a threat yet, but they have the potential to capture share if Intel takes too long to get a suitable product to market.

The Nervana Systems asset that Intel is sitting on (with 28 nm standalone, non-integrated products expected in mid-2017) is important, but it is a very specialized architecture. While it is an important piece of Intel’s AI strategy, we won’t be able to see how important it really is until we watch how Intel rolls it out and what adoption looks like (the same goes for its FPGA strategy, for that matter). Pascal and Volta are general purpose architectures for the most part and have a clear roadmap associated with them, which is a plus in Nvidia’s column. It’s too early to tell on the product front, and truth be told, still early to tell just how big this market will get.

Tomorrow we will share highlights from a chat we had with Naveen Rao about what an integrated AI product from Nervana and Intel might look like and where its advantages are. And in the interest of understanding what’s happening on both sides of the border, we’ll also share insights from some questions we put to Nvidia CEO Jen-Hsun Huang during SC16 week.


7 Comments

  1. Well. Then make a graphics card that I can buy in a box and that at least wins over AMD. If Intel can’t do that then it needs to just shut up and let the big boy adults do their job.

    • Intel isn’t interested in graphics. They never were, and they know there is no money to be made anymore as the market is saturated. nVidia is just as desperate to extend their market, and they have two things in sight: automotive and deep learning. While they got there early (not on their own, by the way), they will eventually lose that fight the same way Intel did in the past, simply because they have the same problem: they need an architecture that fills many gaps. Graphics, HPC-style parallel processing, and now the matrix crunching in deep learning (which is more or less a by-product of the former) all have to be covered by what is still a very general architecture that tries to tailor to all of these, and eventually something has to give. Deep learning alone doesn’t provide enough revenue to keep such a massive GPU architecture going, but if nVidia changes it too much to tailor to that sector, they will lose the flexibility for graphics. So they are in a very tricky hole. So far they haven’t done anything very specific to deep learning only (faster 16-bit processing is good for graphics as well), but eventually they will have to if they want to squeeze more performance out.

      But these changes in architecture are very unlikely to benefit anything outside of that application realm.

      Intel sits in a better position, as you already need a CPU for your machine anyway.

    • The consumer graphics cards that you can buy from Nvidia have much less FP compute, dollar for dollar and pound for pound, relative to AMD’s lower priced Polaris offerings. One need only look at what the bitcoin miners are using to do their crunching: the RX 480 has plenty of customers. AMD cards are very popular for their extra FP compute, and 2 RX 480s (at around $475 for two cards) have about the same single precision floating point compute as a Titan X (Pascal, at $1,600). The RX 480 has more compute at lower clocks relative to any of Nvidia’s comparably priced GPU SKUs, so the miners will be using plenty of RX 480s in their rigs to crunch any of the newer coin algorithms not yet implemented in ASIC form. Simply throwing up a bunch of gaming frames per second benchmarks and declaring a winner does not relate to or indicate any non-gaming compute metrics for a GPU. The floating point metrics are what indicate more of a GPU’s potential for number crunching and other non-gaming workloads.

      This is not a game for GPUs used in this manner; this is real, productive computational workload usage that can soak up all of those gigaflops of FP compute, and it’s the gigaflops-per-dollar metric that makes AMD’s consumer SKUs so popular for some non-graphics workloads.

      Let’s not let the gaming market mentality carry too much weight in the GPU accelerator market, and let’s look at other GPU usage metrics before declaring a winner. Expect that AMD’s Radeon Technologies Group will have its competing professional GPU accelerator SKUs to market alongside AMD’s Zen server/HPC/workstation CPU SKUs, bringing more competition to the non-consumer markets as well.

      AMD/RTG will also offer friendlier, open sourced software and middleware based solutions that will not lock in any future customers on the software side of the compute portion of that CPU/GPU professional market.
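
For rough context on the gigaflops-per-dollar point in the comment above, here is a back-of-the-envelope sketch in Python. The prices are the ones the commenter quotes, and the throughput numbers are approximate published single precision peaks for the cards involved, so treat the output as ballpark arithmetic rather than a benchmark.

```python
# Approximate published peak single precision throughput, in GFLOPS.
rx_480_gflops = 5800     # one RX 480, roughly 5.8 TFLOPS SP
titan_x_gflops = 11000   # Titan X (Pascal), roughly 11 TFLOPS SP

# Prices as quoted in the comment above.
two_rx_480_price = 475   # two RX 480 cards
titan_x_price = 1600     # one Titan X (Pascal)

print(f"2x RX 480: {2 * rx_480_gflops} GFLOPS, "
      f"{2 * rx_480_gflops / two_rx_480_price:.0f} GFLOPS per dollar")
print(f"Titan X  : {titan_x_gflops} GFLOPS, "
      f"{titan_x_gflops / titan_x_price:.0f} GFLOPS per dollar")
```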

  2. > is that there is no funky programming model—it’s CUDA versus good old X86

    Can you ask some developers that have worked with Intel MIC if that is actually true? I know Intel says that, but I think I’ve seen some developers say it’s not actually true, and that you still need some custom programming.

    • When you want maximum performance, you need to write some hand-tailored code for Xeon Phi, as it has wider SIMD and different latencies, and there is also threading and memory management you need to control.

      But if you are not after maximum or optimal performance, any general x86 code runs out of the box on KNL.

      With CUDA, on the other hand, you really need to rewrite the whole thing to get it even working, and that includes the boilerplate code alongside your application to actually invoke the GPU code and get the results back into your application.

    • @witeken: The combo prototype of Xeon plus Arria was presented at the Intel Developer Forum in Shenzhen, China, in April. The link you provided shows the new GPU prototype based on an FPGA. Inspur already has some, so let’s wait to see the results.
