Intel doesn’t want to just create a rival to the CUDA programming model and library stack so it can better compete against Nvidia in the GPU compute market. With oneAPI, it wants to create an open ecosystem that includes a programming framework, called oneAPI and largely based on the SYCL framework that is maintained by the Khronos Group; a data parallel C++ compiler; and a diverse set of high performance libraries to accelerate AI and HPC applications that can compete with the hundreds of libraries that Nvidia and its partners have created for CUDA.
To do that, Intel is going to need some help, and to that end the company is snapping up the 80-strong team at Codeplay, one of the stewards of the SYCL programming model. SYCL was created in 2014, is at the heart of Intel’s oneAPI cross-platform, cross-device programming effort, and is a derivative of (or better still an integral of) the OpenCL programming framework that was created by Apple in 2009. Both SYCL and OpenCL are steered by the Khronos Group.
Financial terms of the deal were not disclosed, but what we can say is that Codeplay is a lot more valuable to Intel today than it was a few years ago when the oneAPI effort was first launched. Intel is further down the oneAPI road and closer to releasing its “Ponte Vecchio” Xe HPC GPU compute engine, and it needs a software story to tell for its GPUs as well as for its CPUs, FPGAs, and custom ASICs. That is particularly true with AMD getting all of the attention in the HPC space thanks to its exascale-class supercomputer wins with its “Aldebaran” Instinct MI250X GPU motors and its ROCm 5.0 development environment. The ROCm programming environment is open source and includes the HIP converter, which can create GPU code that runs on Nvidia GPUs in addition to the native mode running on AMD GPUs.
Intel wants to be the most open of the platform suppliers, and that is because it has to be. Nvidia, as the undisputed GPU compute leader (excepting a few big supercomputers in the United States and Europe), can build a moat around the CUDA platform and its libraries and just continue to make money by giving this software away for “free.” (Nothing is free, particularly when 75 percent of Nvidia’s employees are writing software. The cost of the software is embedded in the hardware – no question about it.) Intel wants the oneAPI stack to be not only free but open, because this will spur adoption of its software and de-risk the choice of Intel hardware for developing applications. AMD wants the same thing for ROCm. Code developed in DPC++ atop SYCL and accessing the oneAPI libraries can run on GPUs from Intel, AMD, or Nvidia.
Codeplay is one of the organizations that has been proving that you can balance the three Ps of programming for high performance systems – that would be productivity, performance, and portability – in such a way that you can achieve portability and still get performance and have reasonable productivity. To prove this point, the team at Codeplay created oneAPI SYCL compilers for AMD and Nvidia GPUs for three major US Department of Energy facilities – namely, Lawrence Berkeley National Laboratory, Argonne National Laboratory, and Oak Ridge National Laboratory. Codeplay has also written its own SYCL DNN neural network and SYCL BLAS linear algebra acceleration libraries that can run on AMD, Intel, and Nvidia GPUs, and it has been involved in making the cuDNN and cuBLAS libraries that Nvidia has created for the very heart of CUDA run within the oneAPI environment.
“We do a lot of work on performance and portability, not just portability,” Andrew Richards, co-founder and chief executive officer at Codeplay, tells The Next Platform. “SYCL enables portability, but the team at Codeplay proves that you can actually build performance portable libraries on top. And our SYCL DNN and SYCL BLAS libraries achieve really competitive performance despite being able to run on Nvidia, AMD, and Intel GPUs as well as various other bits of hardware.”
We have often wondered why there are not open source (or at least open) libraries that have the smartest minds on Earth creating them, and then having some of the other smartest minds on Earth tweaking them to provide tuned performance on specific bits of hardware given their architecture expertise. Call it a write once, tune many times approach. This would stand in stark contrast to having compilers and libraries highly tuned by their hardware vendors or keenly interested participants – Cray for CPU compilers that span architectures or Nvidia for GPU compilers and libraries are but two examples.
“What Codeplay does as a company is to show people how to do that,” says Richards. “We show people how they can write once and run everywhere. But we don’t actually write the whole library because we are more of a compiler company. But I think with this Intel deal, we will be able to be more ambitious in what we do, and be able to do stuff at larger scale. But I would add that a lot of these libraries are written by domain experts, and that is why we, as compiler domain experts, can show people how to write these libraries and achieve that high level of performance and portability.”
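To make the write once, tune many times idea a little more concrete, here is a minimal sketch in plain standard C++. It is not SYCL and it is not Codeplay's code: the target structs (`GenericTarget`, `WideTarget`), their tile sizes, and the `matmul` function are all hypothetical names and values invented for illustration. The point is only that a single kernel source can take its tuning parameters from a small per-target description, so the algorithm is written once and the tuning lives elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-target tuning knobs. The names and tile sizes are
// illustrative assumptions, not values taken from any real library.
struct GenericTarget { static constexpr std::size_t tile = 1; };
struct WideTarget    { static constexpr std::size_t tile = 4; };

// One portable kernel source: C = A * B for n x n row-major matrices.
// Only the tile size, supplied by Target, differs between targets.
template <typename Target>
std::vector<float> matmul(const std::vector<float>& a,
                          const std::vector<float>& b,
                          std::size_t n) {
  assert(a.size() == n * n && b.size() == n * n);
  constexpr std::size_t T = Target::tile;
  std::vector<float> c(n * n, 0.0f);
  // Blocked loops: walk the matrices tile by tile, clamping at the edges.
  for (std::size_t ii = 0; ii < n; ii += T)
    for (std::size_t kk = 0; kk < n; kk += T)
      for (std::size_t jj = 0; jj < n; jj += T)
        for (std::size_t i = ii; i < std::min(ii + T, n); ++i)
          for (std::size_t k = kk; k < std::min(kk + T, n); ++k)
            for (std::size_t j = jj; j < std::min(jj + T, n); ++j)
              c[i * n + j] += a[i * n + k] * b[k * n + j];
  return c;
}
```

Calling `matmul<GenericTarget>(a, b, n)` and `matmul<WideTarget>(a, b, n)` runs the same source with different blocking. In a real performance portable library the per-target parameters would be far richer (work-group sizes, vector widths, memory layouts), and the kernel would be expressed in something like SYCL rather than host-side loops, but the division of labor is the same: domain experts write the algorithm, and hardware experts tune the knobs.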
It is absolutely clear why Intel wants to buy Codeplay, but it is not clear why Codeplay did not want to remain a kind of Switzerland. Clearly, money is involved here, and Codeplay was no doubt rewarded for its work and taken care of by Intel.
But, nonetheless, it is very tough to be Switzerland. As an example, let’s consider IBM’s “Bluelink” OpenCAPI accelerator interface. It had all of the right technical details, but what Intel wanted was to drive the CXL standard, and as the dominant supplier of CPUs in the world, not only did Intel win, but it got the companies behind Gen-Z, CCIX, Infinity Fabric, and OpenCAPI to all follow suit and pay their respects to CXL. And now, at the very least, we have a single standard that has emerged for how we can link accelerators and soon memory to compute engines, converging PCI-Express and DDR memory controllers down to one protocol in the not-too-distant future.
There is no question that Intel needs Codeplay to increase the odds of oneAPI being adopted outside of its own compute engines, but it is also true that Codeplay needed the might of Intel to expand its operations and make SYCL and DPC++ more pervasive than it could alone.
Editor’s Note: You wait a long time for a title like that.