This week Intel unveiled Compute Express Link (CXL), the chipmaker’s own cache coherent accelerator interconnect that it is grooming to become the industry standard. To do that, however, it has to convince the backers of CCIX and OpenCAPI to jump ship. To further that goal, Intel has simultaneously booted up the CXL consortium, a group that contains some of the heaviest hitters in the industry, in particular Google, Facebook, Microsoft, Hewlett Packard Enterprise, Dell EMC, Alibaba, Cisco, and Huawei.
The goal of establishing a cache coherent interconnect standard is certainly laudable. Computational accelerators, like GPUs and FPGAs, and purpose-built machine learning chips like TPUs are becoming more commonplace in server gear, especially in datacenters doing HPC and AI. To that mix, you can add memory expansion devices, which can act as storage class memory inside a server. With all these gadgets hanging off PCIe ports, vendors have been looking for ways to make the connection between the host CPU and these devices faster and more transparent to the application.
Cache coherency is key to ease of use, since it allows code running on these devices to directly access host and local memory in a shared address space. That’s the critical feature missing from PCI-Express, the interconnect of choice for most accelerators and attached memory devices these days. Cache coherency simplifies programming considerably and relieves the application of the chore of copying data back and forth between host and coprocessor memories depending on which chip needs access to it at the moment.
At the same time, system makers and customers want standardization in order to avoid vendor lock-in. Devising a standard technology means everyone can take advantage of the kind of interoperability that people take for granted in technologies like PCIe and Ethernet. It also spurs innovation, since a stable design point encourages increasingly creative implementations.
All of this reduces system complexity – both hardware and software – and can therefore lower cost, which of course is the bottom line for customers. That’s true whether you’re a hyperscale giant, a supercomputing center, or a mid-sized enterprise.
But the road to standardization is not always paved with good intentions. From a chip vendor’s perspective, interoperability is not all that attractive if it allows your competitor to easily replace your product in the next upgrade cycle. And certain technical features that favor your roadmap can be preferable to the features in a standard you have no control over.
At this point CXL appears to be functionally quite similar to the Cache Coherence Interconnect for Accelerators (CCIX), a technology we first reported on in 2016. Since then, the CCIX consortium has gathered dozens of members and is currently backed by multiple chipmakers, including AMD, Arm, Xilinx, Marvell/Cavium, and Ampere, as well as OEMs such as IBM, Lenovo, Fujitsu, and Cray. Network vendors Mellanox (soon to be part of Nvidia) and Broadcom have also signed up. Huawei is a member of both consortiums.
Intel and the CXL consortium have not released much in the way of details of what constitutes the technology, especially since the ink on the specification is barely dry yet. We’ll be speaking to these folks in the not-too-distant future to get a better sense of how they intend to position CXL in the market and when we should start to see the first products.
In the meantime, here’s what we do know: Perhaps the most significant feature of CXL is that it’s designed to run atop PCIe 5.0, the next-generation PCI-Express standard promising 32 GT/s per lane (twice as fast as PCIe 4.0). The technology will also likely be supported by future Intel GPUs, since both FPGAs and GPUs were mentioned as accelerator targets in the company’s CXL press release. As we previously reported, Intel appears to be on track to introduce these devices as discrete coprocessors in 2020.
CCIX is also designed to use PCIe as the physical transport layer, but the current specification is aimed at Gen 4 only. (Despite this apparent limitation, CCIX technology was able to demonstrate extended speeds of 25 Gbps between two Xilinx FPGAs.) In both cases, using the PCIe physical infrastructure, while creating some limitations on speed, will probably help to propel these technologies into the market.
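To put those lane rates in perspective, here is a rough back-of-the-envelope calculation of per-direction bandwidth for a x16 link at each generation’s signaling rate. This is our own approximation, not a vendor-quoted figure; it assumes the 128b/130b line encoding PCIe has used since Gen 3 and ignores protocol overheads above the physical layer:

```python
# Rough per-direction bandwidth of a x16 PCIe link. PCIe 3.0 and later
# use 128b/130b encoding, so the payload rate is 128/130 of the raw
# transfer rate; protocol overhead above the PHY is ignored here.
ENCODING = 128 / 130  # 128b/130b line-code efficiency

def x16_bandwidth_gbps(gt_per_s, lanes=16):
    """Approximate usable bandwidth in GB/s for one direction."""
    return gt_per_s * lanes * ENCODING / 8  # 8 bits per byte

for gen, rate in [("PCIe 4.0 (CCIX today)", 16), ("PCIe 5.0 (CXL)", 32)]:
    print(f"{gen}: {x16_bandwidth_gbps(rate):.1f} GB/s per direction")
```

That works out to roughly 31.5 GB/s per direction for a Gen 4 x16 link versus roughly 63 GB/s for Gen 5, which is the doubling CXL is counting on.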
A third standard devised by IBM, known as OpenCAPI, has the added feature of supporting virtual addressing (not just virtual address translation). Significantly, it has garnered support from Nvidia (which has also implemented its own proprietary accelerator interconnect in NVLink). OpenCAPI is also backed by some of the same members of the CXL and CCIX consortiums, including AMD, Xilinx, and Mellanox. HPE and Dell EMC are also OpenCAPI members at something called the “observer level.”
Given that heterogeneous computing is not only here to stay but will most likely expand with the emergence of future purpose-built accelerators for AI and machine learning, it would behoove the industry to whittle these three interconnect standards down to a single one. (Imagine if there were three PCIe or Ethernet standards.) That said, it’s conceivable that all three will move forward in parallel, at least for a while, as the market works its Darwinian magic.
Until that whittling occurs, it’s going to be a contentious and confusing time for vendors and customers alike. Companies aligned with Intel and CXL in many cases also need to partner with Nvidia/Mellanox, Xilinx and Arm on the CCIX side. Ten years ago, it might have been the OEMs and chipmakers determining which technologies to adopt, but these days, it’s the hyperscale companies that are driving these kinds of standards. Which is another way of saying the old rules don’t apply anymore.