The long wait for volume shipments of Intel’s “Knights Landing” parallel X86 processors is over, and at the International Supercomputing Conference in Frankfurt, Germany, Intel is unveiling the official lineup of the Xeon Phi chips that are aimed at high performance computing and machine learning workloads alike.
The lineup is uncharacteristically simple for a Xeon product line, which tends to have a lot of different options turned on and off to meet the myriad feature and price point requirements that a diverse customer base usually compels Intel to support. Over time, the Xeon Phi lineup will become more complex, with the addition of versions of the processor that have integrated Omni-Path networking ports on the Knights Landing package, and eventually Intel will deliver versions of the chip that slide into PCI-Express peripheral slots, as the previous “Knights Corner” generation of Xeon Phi coprocessors did. It remains to be seen how much demand there will be for these coprocessors compared to the directly bootable versions that Intel is currently shipping.
Intel revealed the basic architecture of the Knights Landing chips back in March 2015, and lifted the veil a bit further on the chip’s specs in November last year as it started shipping pre-production silicon to Cray and early adopter customers Sandia National Laboratories in the United States and CEA in France. We will give you a brief review here, but drill down into those stories for a deeper look at the feeds and speeds.
The Knights Landing chip is etched with a 14 nanometer manufacturing process, like the current “Broadwell” Xeon E5 and E7 processors used in servers, and with over 8 billion transistors, it is the largest chip that Intel has ever made.
The Knights Landing die actually has 76 cores on it, but to increase the yield of the chip manufacturing process, Intel is only shooting for chips that have 72 cores as the top-bin part and, as it turns out, will be offering versions with lower clock speeds and core counts to have some variation in the SKU stack and thereby provide different price points. The cores on the Knights Landing chip are based on a heavily modified “Silvermont” Atom core that supports four threads. The cores are laid out in pairs on tiles, with each core having two 512-bit AVX-512 vector processing units and each tile sharing 1 MB of L2 cache. The tiles are linked to each other using a 2D mesh interconnect, which also hooks into the two DDR4 memory controllers that feed the so-called far memory, which scales up to 384 GB of capacity and delivers around 90 GB/sec of bandwidth on the STREAM Triad memory benchmark test. That 2D mesh also hooks the cores into eight chunks of high bandwidth stacked MCDRAM memory, which scales up to 16 GB of capacity; this is the “near memory” in the Knights Landing chip, and it offers more than 400 GB/sec of bandwidth to keep those cores well fed. Intel supports a number of different memory addressing modes on the Knights Landing processor, including exposing the combined memory as a single address space or using the MCDRAM as an L3 cache for the DRAM memory. Add it all up, and Intel has promised that a Knights Landing processor would deliver more than 6 teraflops of single precision and more than 3 teraflops of double precision floating point performance.
The Knights Landing chip embodies a number of firsts for Intel, including the first integrated high speed main memory on any class of Xeon processor, the first on-package integrated interconnect, and the first bootable implementation of what has been, up until now, a coprocessor that had to be tied to a regular Xeon chip to do any useful work.
As far as we know, Intel intended to offer Knights Landing processor packages and coprocessor cards with either 8 GB or 16 GB of MCDRAM, and we learned last fall from Charles Wuischpard, general manager of the HPC Platform Group within Intel’s Data Center Group, that Intel had planned to offer a version of Knights Landing that did not include any MCDRAM at all, with the express intent of offering a processor that would scream on certain benchmarks, including the Linpack Fortran matrix math test that is used to rank the world’s most powerful HPC systems.
In a briefing ahead of the ISC16 event, Wuischpard said that Intel had reconsidered its SKU plans with Knights Landing.
“All of the memory is 16 GB across the board, and we had originally talked about having a much richer matrix of SKUs ranging from no MCDRAM memory to 16 GB,” Wuischpard explained in a briefing with The Next Platform. “It was just too busy and too complex, and in the end, everyone wanted the on-package memory for the bandwidth and performance benefits. So we decided to shrink the SKU stack and make it easier to understand. It also makes it easier from a manufacturing standpoint to just populate everything with 16 GB.”
Obviously, if customers want other Knights Landing variants, Intel has shown itself to be perfectly happy to make accommodations with custom versions of its regular Xeon server chips, and it would likely do so with Xeon Phi processors, too.
There are, as it turns out, four different versions of the Knights Landing processor that Intel is launching this week, and four more will be coming out in October with integrated dual-port, 100 Gb/sec Omni-Path ports on the Xeon Phi package. Intel will eventually offer coprocessor versions of the Knights Landing chip that plug into PCI-Express 3.0 slots inside servers, but the precise configurations of these have not been revealed and, as far as we know, their timing has been pushed out. Here is the Knights Landing lineup and a comparison between the new chips and the older Knights Corner coprocessors:
This, quite frankly, is the moment that we have been waiting for. As you can see from the table above, there are four different SKUs of the Knights Landing standalone processor that Intel is shipping now. Intel is varying the core counts, clock speeds, and memory transfer rates into and out of the MCDRAM a bit to yield different performance points. The Xeon Phi 7290 is what Wuischpard called the “Formula 1” version of the chip, which has all 72 cores on the die running at 1.5 GHz and delivering 3.46 teraflops of peak theoretical double precision performance. This chip has a thermal design point of 245 watts and carries a single unit price of $6,254 when bought in 1,000-unit quantities.
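That 3.46 teraflops figure falls straight out of the core specs. As a back-of-the-envelope sketch (assuming, as vendors do when quoting peak numbers, that each 512-bit vector unit retires one fused multiply-add per cycle, which counts as two flops per lane):

```python
# Back-of-the-envelope peak flops for the top-bin Xeon Phi 7290.
# Assumption: one FMA per AVX-512 unit per cycle = 2 flops per vector lane.
cores = 72          # top-bin core count
vpus_per_core = 2   # two 512-bit vector units per core
clock_ghz = 1.5     # top-bin clock speed

dp_lanes = 512 // 64   # 8 double precision lanes per unit
sp_lanes = 512 // 32   # 16 single precision lanes per unit
flops_per_fma = 2      # multiply plus add

dp_peak_tf = cores * vpus_per_core * dp_lanes * flops_per_fma * clock_ghz / 1000
sp_peak_tf = cores * vpus_per_core * sp_lanes * flops_per_fma * clock_ghz / 1000

print(f"{dp_peak_tf:.2f} TF double precision")  # 3.46 TF
print(f"{sp_peak_tf:.2f} TF single precision")  # 6.91 TF
```

The single precision result also squares with the “more than 6 teraflops” figure Intel promised for the chip.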
“There will be some buyers for the Xeon Phi 7290, but it is a premium product and it is relatively low yielding so there will not be great supplies,” Wuischpard said. “Most of our early customers, including the large HPC research labs, are really focused on the 7230 and 7250, and in the HPC world, they tend to take top bin minus one to get the best price/performance. We actually think that the 7210 will be the more general purpose, high volume part, and you will notice that we priced it accordingly. You will get about 80 percent of the performance of the high-end Knights Landing at about 40 percent of the cost.”
In October, Intel will ship the variants of these four Knights Landing parts with on-package Omni-Path 100 series ports. Adding the Omni-Path links to the chip package boosts the price by $278 and raises the thermal design point by another 15 watts. Intel could eventually offer lower bin Knights Landing chips with lower core counts at even lower prices, but it may not, because the overlap with the Xeon CPUs might be too great.
The performance jump from the Knights Corner coprocessors to the Knights Landing processors for the most similar SKUs between the two lines (realizing we are comparing a bootable chip with a coprocessor) ranges from somewhere between 2.6X and 2.9X, with the price only rising by 1.4X to 1.5X. Hence, the bang for the buck on the Knights Landing has increased by a factor of two, almost precisely and not coincidentally, we think. The cost per teraflops (at peak theoretical double precision) with the Xeon Phi 7290 Knights Landing processor, for example, is $1,810, compared to $3,412 for the high end Xeon Phi 7120P or 7120X PCI-Express cards based on Knights Corner. This is about what you would expect for a product that does not have a tick-tock cadence but rather does an architecture change and a process shrink every three or four years.
The performance customers get with a Knights Landing processor is going to vary depending on the nature of the application and the dataset. As we have previously detailed, the Knights Landing chip has respectable single threaded performance compared to the prior Knights Corner. Here are some early results that Intel is showing on various benchmark tests against GPU accelerated systems:
We will be gathering up information on the benchmarks above and system pricing for the machines used to run the tests to compare Xeon Phi and CPU-GPU hybrids on various workloads to get a system-level price/performance metric. While Nvidia’s “Pascal” GPU accelerators offer higher performance, they also carry higher prices, and at the system level the difference may not be as big as many think. (We shall see.)
The thing is that Intel is now apparently able to ship Knights Landing processors in volume and will be shipping accelerators (presumably with lower prices still) at some point in the future.
Wuischpard said that Intel expects to ship more than 100,000 Xeon Phi units this year into the HPC market, and there is a good chance that more than a few hyperscalers are going to buy a bunch, too, for machine learning and possibly other workloads. More than 30 software vendors have ported their code to the new chip, and others will no doubt follow. And more than 30 system makers are bending metal around the Knights Landing processors.
We also hear that Intel may do something special with Knights Landing for the machine learning crowd in a keynote address at the ISC 2016 conference today, so stay tuned for that. [Editor’s note: This rumor did not pan out.]
The single-socket Knights Landing processor is compatible with both the Linux and Windows Server operating systems that dominate datacenters today, and indeed any application that has been certified to run on either can run on a Knights Landing. That opens up the market for them pretty wide, too.
Now we find out how customers will use it.