Like many system architects the world over, we had high hopes for the 3D XPoint variant of phase change memory (PCM) when it launched with much fanfare back in July 2015 after being developed jointly by Intel and Micron Technology for many, many years. This represented a new kind of memory in the system hierarchy, sitting somewhere between main memory and Tier 1 storage, with the promise of making main memory bigger and cheaper and Tier 1 storage faster.
The system memory hierarchy has some pretty big gaps, and the one between the main memory attached to the CPUs and Tier 1 storage is the biggest, which dramatically affects both overall system performance and the cost of a system. But perhaps the most important gap that 3D XPoint had to jump is the one from the lab and foundry in Lehi, Utah, which Intel and Micron jointly paid for, to the datacenters of the world.
That gap has proved much harder to bridge, and that is one reason why Micron has announced that it is shutting down its 3D XPoint business and selling off the Lehi fab. Because Intel is the only company selling 3D XPoint products, and because Micron took sole ownership of the Lehi fab a few years ago after the second generation of this PCM memory variant was perfected, we presume that Intel will buy the fab from Micron. But maybe not. Micron is not saying.
And with Intel on the verge of getting “Ice Lake” Xeon SP processors out the door with eight memory channels instead of the six available on Xeon processors for many, many years, some of the memory capacity and memory bandwidth heat that has been on Intel – and that has helped drive adoption of Optane DIMMs on the past two generations of Xeon SP servers – will abate. This timing is perhaps not a coincidence. But Micron’s sudden and complete reaction is a bit of a surprise.
In a conference call with Wall Street, Micron president and chief executive officer, Sanjay Mehrotra, said that Micron would cease development of 3D XPoint immediately and focus on bringing to market memory products that utilize the Compute Express Link (CXL) protocol to link CPUs to physically external (but logically connected) storage and accelerator devices.
“We believe this shift will better address our customers’ needs and, importantly, improve returns for our shareholders,” Mehrotra explained. “Our decision was driven by our assessment of the 3D XPoint market opportunity in light of the expected impact of CXL and our new emerging memory products on the future datacenter.”
That 3D XPoint has not taken off as expected is no state secret, and we have said many times that Intel’s Optane flavor came to market later than expected and was slower, less capacious, and more expensive than expected. Which is not a good thing. That said, Optane SSDs are significantly faster than flash SSDs, and Optane PMEM sticks are persistent, unlike DRAM DIMMs, which need to have power on to remember things.
One of the limiting factors for Micron, which owns the Lehi fab where 3D XPoint memory is made, is that the underutilization of the fab was cutting into its non-GAAP profits to the tune of $400 million a year. To give you some perspective, Micron had $21.44 billion in sales in fiscal 2020, which ended in September of that year, and in the fourth quarter of that fiscal year it had operating income of $1.16 billion and net income of $988 million. So this shortfall in 3D XPoint sales is a big hit to profits at the middle and bottom line at Micron.
But we think the situation is even more complex. Intel had hoped to have 3D XPoint out with its “Skylake” processors and “Purley” server platforms, which we told you all about back in May 2015 shortly after The Next Platform itself launched. We did not know what Intel was planning precisely back then, but the documents we got our hands on did say that the Purley systems would have up to 4X the capacity and lower cost than DRAM memory and have persistent data on devices that were 500X faster than NAND flash of the time. These were the “Apache Pass” Optane DIMMs, which we know as the Optane 100 Series and which were delayed until the “Cascade Lake” Xeon SPs were delivered in April 2019. That was almost four years of waiting between the debut of 3D XPoint and its delivery in the crucial DDR DIMM form factor, and even then, only Intel servers, and only Cascade Lake and then follow-on “Cooper Lake” four-socket and eight-socket servers, could use them.
System architects and their CIO customers (or bosses, as the case may be) are leery of any technology that is single sourced and cannot be deployed across a wide variety of CPUs and motherboards. So the lack of enthusiasm for Optane DIMMs is, we think, all Intel’s own fault. Intel needed some bullet points for the Cascade Lake and Cooper Lake processor launches, because those chips had less memory capacity and bandwidth than the AMD Epyc 7002, IBM Power9, Ampere Altra and Altra Max, and Marvell ThunderX2 alternatives in the mainstream server market, and the “Ice Lake” Xeon SPs were years late thanks to delays in Intel’s 10 nanometer chip making processes. (So here is another unintended impact of that 10 nanometer delay: a cascading failure, if you will. . . . )
Imagine if Intel (and Micron) had taken the same attitude with 3D XPoint memory back in 2015 that Intel took with the CXL protocol in 2019. All of the companies that had competing protocols (IBM with CAPI and OpenCAPI, Xilinx and AMD and the Arm collective with CCIX, Nvidia with NVLink, and Hewlett Packard Enterprise and Dell with Gen-Z) have gotten behind CXL because Intel let them participate in what will be a volume market and have some input.
There is no technical or economic reason why this 3D XPoint variant of PCM memory – and specifically the DDR DIMM form factors that are so good at extending main memory with very little or no impact to performance – should not have been pervasive across the entire server industry, driving up volumes and driving down prices in such a way that Intel and Micron would not be playing a game of hot potato with it. In this final move, it is Intel who is solely going to be holding the hot potato, and it could have been working with Micron, and maybe even Samsung, to make French fries for the entire world.
That’s the theory. In practice, adding another layer into the memory hierarchy is difficult, and Intel did not do a very good job of making 3D XPoint transparent to applications and operating systems, any more than it made the near MCDRAM and far DDR4 main memory blocks on the “Knights Landing” Xeon Phi parallel processors easy to address. Yet it is indeed possible to make a mix of DRAM and Optane memory easy to use, and we need look no further than MemVerge and the “memory hypervisor” it has created to see that. We profiled MemVerge when it dropped out of stealth in April 2019, as those Cascade Lake servers started shipping, and talked about it again recently as the Era of Big Memory is upon us. We never had much use for the Optane SSDs, but if their volumes had risen by 10X or 100X, then maybe they would have been more interesting for system designers.
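To make the idea of a transparent DRAM plus Optane layer a bit more concrete, here is a minimal sketch of the kind of decision such software has to make over and over: which pages are hot enough to earn a place in scarce DRAM, and which can live in the bigger, slower persistent memory tier. This is a generic frequency-based placement policy written for illustration, not MemVerge’s memory hypervisor or Intel’s Memory Mode; the function names, page names, access counts, and DRAM budget are all hypothetical.

```python
from collections import Counter
from typing import Set, Tuple


def place_pages(access_counts: Counter, dram_pages: int) -> Tuple[Set[str], Set[str]]:
    """Hypothetical policy: the most frequently accessed pages go to DRAM,
    everything else goes to the larger, slower persistent memory tier."""
    # most_common() sorts by count, highest first; ties keep insertion order.
    ranked = [page for page, _ in access_counts.most_common()]
    dram = set(ranked[:dram_pages])
    pmem = set(ranked[dram_pages:])
    return dram, pmem


def migrations(old_dram: Set[str], new_dram: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Pages to promote into DRAM and to demote out of it between two epochs."""
    promote = new_dram - old_dram
    demote = old_dram - new_dram
    return promote, demote


# Two sampling epochs of hypothetical per-page access counts.
epoch1 = Counter({"A": 120, "B": 3, "C": 57, "D": 0, "E": 9})
epoch2 = Counter({"A": 4, "B": 88, "C": 61, "D": 0, "E": 12})

dram1, _ = place_pages(epoch1, dram_pages=2)  # hottest two pages: A and C
dram2, _ = place_pages(epoch2, dram_pages=2)  # hottest two pages: B and C
promote, demote = migrations(dram1, dram2)
print("promote:", sorted(promote), "demote:", sorted(demote))
```

A real tiering layer also has to decide how often to sample, how to age old counts, and how to move pages without stalling the application, which is where the hard engineering lives.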
As if this weren’t enough, Micron, after talking to users and system builders, came to a realization that readers of The Next Platform recognize full well: in many cases, memory bandwidth is more of a limiting factor for applications than memory capacity is. While Intel and Micron were working together on 3D stacked memory with through silicon vias (TSVs) and used it in the Knights Landing CPUs, Samsung won that bandwidth battle with the graphics and compute card businesses of Nvidia and AMD and their use of HBM (and now HBM2 and HBM2E and someday HBM3) stacked memory. And now Micron is going to try to figure out how to use CXL as a means of linking external memory and flash to servers over the PCI-Express bus, and to sell devices that exploit this capability, instead of selling 3D XPoint in DIMMs (which in November 2020 Micron said were still on the roadmap) or hanging 3D XPoint off CXL in SSD form factors like the X100 drive it was already selling.
This memory bandwidth issue is real, and we are not exactly sure how using memory devices over the CXL bus helps all that much, whether or not they use main memory semantics (that is, whether they are byte addressable or block addressable, in the parlance). The PCI-Express buses of most systems are being hammered, too, and we will say once again that IBM’s idea of having all I/O on a chip (memory access, NUMA interconnect, and peripheral interconnect alike) use the same high speed signaling SerDes makes more and more sense in the world that Micron and IBM both see. IBM’s CPU designers get this and are implementing it in Power10 with DDR5 buffered memory, but we don’t see other processor makers getting on board with this approach. Yet.
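The difference between block addressable and byte addressable access is easy to quantify in a toy case. The sketch below counts how many bytes actually cross the link to read one 8-byte value when a device can only be accessed in fixed-size units. The 4 KB block and 64-byte cache line sizes are assumptions chosen as common defaults, not the characteristics of any particular Optane or CXL device, and the function name is hypothetical.

```python
BLOCK_BYTES = 4096      # assumed storage block size (block addressable access)
CACHE_LINE_BYTES = 64   # assumed CPU cache line size (memory semantics)


def bytes_moved(offset: int, length: int, granularity: int) -> int:
    """Bytes actually transferred to read `length` bytes starting at `offset`
    when the device can only be accessed in `granularity`-sized units."""
    if length <= 0:
        return 0
    first_unit = offset // granularity
    last_unit = (offset + length - 1) // granularity
    return (last_unit - first_unit + 1) * granularity


# Reading one 8-byte value that straddles a 4 KB boundary.
for label, unit in [("block", BLOCK_BYTES), ("cache line", CACHE_LINE_BYTES)]:
    moved = bytes_moved(4092, 8, unit)
    print(f"{label}: {moved} bytes moved for 8 useful bytes")
```

The 64X gap in this toy case (8,192 bytes versus 128 bytes) is why memory semantics matter, but it says nothing about whether the link itself, PCI-Express or CXL, has enough bandwidth, which is the concern here.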
Sumit Sadana, executive vice president and chief business officer in charge of the four divisions of Micron, summed the situation up thus: “The value proposition of 3D XPoint was to operate as persistent memory at a lower cost to DRAM or as storage that is significantly faster than NAND. In the years since 3D XPoint was first announced, datacenter workloads and customer requirements have continued to evolve. As data intensive workloads proliferate and AI ramps in data centric applications, the CPU-DRAM bandwidth has become an increasingly limiting factor of overall system performance. In addition, as CPU architectures evolve to dramatically increase CPU core count, more DRAM is needed to ensure adequate memory bandwidth per CPU core. This trend has driven ever-increasing server DRAM content.”
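Sadana’s point about bandwidth per core comes down to simple arithmetic: theoretical peak DRAM bandwidth scales with the number of channels and the transfer rate, not with capacity, so adding cores without adding channels shrinks each core’s share. Here is a minimal sketch, assuming 64-bit (8-byte) DDR4 channels and ignoring ECC bits. The three configurations are illustrative, loosely modeled on a 28-core, six-channel socket and on 40-core and 64-core, eight-channel sockets; they are not measurements, and real sustained bandwidth is well below these peaks.

```python
from dataclasses import dataclass

BYTES_PER_TRANSFER = 8  # a DDR4 channel is 64 bits wide, excluding ECC bits


@dataclass
class SocketConfig:
    name: str
    channels: int        # memory channels per socket
    mega_transfers: int  # DIMM speed in MT/s, e.g. 3200 for DDR4-3200
    cores: int           # cores per socket

    def peak_bandwidth_gbs(self) -> float:
        """Theoretical peak DRAM bandwidth per socket in GB/s (decimal)."""
        return self.channels * self.mega_transfers * BYTES_PER_TRANSFER / 1000

    def bandwidth_per_core_gbs(self) -> float:
        """Peak bandwidth divided evenly across all cores on the socket."""
        return self.peak_bandwidth_gbs() / self.cores


configs = [
    SocketConfig("6 x DDR4-2933, 28 cores", 6, 2933, 28),
    SocketConfig("8 x DDR4-3200, 40 cores", 8, 3200, 40),
    SocketConfig("8 x DDR4-3200, 64 cores", 8, 3200, 64),
]

for c in configs:
    print(f"{c.name}: {c.peak_bandwidth_gbs():.1f} GB/s peak, "
          f"{c.bandwidth_per_core_gbs():.2f} GB/s per core")
```

With these inputs, moving from six to eight channels (and from DDR4-2933 to DDR4-3200) raises peak bandwidth by about 45 percent, but per-core bandwidth barely moves at 40 cores (about 5.0 versus 5.1 GB/s) and drops to 3.2 GB/s at 64 cores. Capacity appears nowhere in the formula, which is why adding Optane DIMMs grows memory without adding bandwidth.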
To be fair, the way Intel and MemVerge are using Optane, they are increasing memory capacity but not bandwidth, because the number of memory controllers and memory slots in the system is fixed. If you want more memory bandwidth, you have to use a different kind of memory, as NEC does with its Aurora vector accelerators or Fujitsu does with its A64FX reasonably vectored, HPC-style CPU. In those cases, they use the same HBM2 memory as high end graphics cards and GPU compute accelerators do. If memory bandwidth is going to increase in the CPU complex, the CPU makers have to do that, and Micron has to take what it gets unless it wants to dust off MCDRAM or become a player in HBM2 memory. Micron has very little control over memory bandwidth beyond driving down watts and driving up memory clocks. And CXL won’t change that, although it will create a second class of slower memory that is external to the CPU complex and rides on the PCI-Express bus instead of the memory bus. A byte addressable Optane SSD, for instance, could be such a device, and through some clever hacks sometimes already is.
The other issue with 3D XPoint is that NAND flash just keeps getting cheaper and cheaper, and Optane SSD costs are not falling fast enough to keep up. As far as Sadana is concerned, persistent memory “was always the strategic long term market opportunity for 3D XPoint” and “3D XPoint-based SSD products are not expected to be anything more than a niche market over time.”
Micron fully admits the difficulty of changing the programming model in systems to add that new memory layer into the hierarchy, and says it will work on memory products that use CXL and also present less of an adoption barrier. And while 3D XPoint might be the right answer over the long term, the use of these CXL memory devices, with what we see as their own limitations and benefits, is going to hamper the adoption of 3D XPoint DIMMs, which Micron is not even making yet and, for all we know, may not have been able to make for another server generation or two.
Micron says it is stopping 3D XPoint development immediately and will cease making 3D XPoint chips after it completes its contractual commitments in the next several quarters, and adds it is engaged in talks to sell the Lehi fab, which it hopes to do before 2021 ends.
Your move, Intel.