Intel has a lot at stake with its oneAPI software stack. A cross-platform parallel programming model, oneAPI is designed to fit into a highly diverse, heterogeneous – and increasingly cloud-based – infrastructure environment by enabling developers to stretch a single code base over multiple and varied architectures.
In a world where the number of chip platforms is rapidly expanding and accelerators from GPUs to FPGAs to DPUs are becoming the norm, being able to use the same tools when programming for the myriad chip architectures has a utopian vibe for many developers.
It’s one of the reasons James Reinders returned to Intel just over two years ago after spending more than 27 years at the chip maker before leaving in 2016. It was a chance to help create a technology that could bring benefits to the IT industry, from enterprises out to HPC organizations.
“I spend a lot of time talking about how heterogeneous computing is changing the landscape because the devices we compute on don’t come from one vendor,” Reinders tells The Next Platform. “One of the most common configurations these days is a Xeon with a Nvidia GPU when you’re trying to do a lot of compute. The reality is, you’ve got tools from Intel to help you with the CPU and then you’ve got tools from Nvidia that try to just use the GPU and it gets more complex as time goes on. You’ve got AMD with their GPUs, Intel FPGAs and lots of startups doing things. The problem is, developers are faced with a lot of different tool chains and the objectives of tool chains traditionally are only to support the vendor who’s producing the tool chain. The more supported they are, the more proprietary they are, the more true that is.”
LLVM and GNU are successful in the open-source world because people have no love of managing different tool chains and oneAPI is designed to address that. OneAPI is seeing some momentum among early adopters – as of a year ago, more than 100 national laboratories, research organizations, educational institutions, and enterprises were using the platform, with Intel pulling in community input and contributions to the oneAPI spec through the open project. There also are now 30 oneAPI Centers of Excellence around the world.
The release of the oneAPI 2022 toolkits brought with it more than 900 new and enhanced features, including a unified compiler implementing C++, Fortran, and SYCL, the royalty-free, cross-architecture programming abstraction layer that, through Intel’s Data Parallel C++ compiler, underpins oneAPI.
Now Intel is releasing the oneAPI 2023 toolkits, which include many improvements. Most significantly, the toolkits will include a plug-in model that supports Intel products but is open, so developers can more easily use it for other accelerator architectures. It was developed with Codeplay, a company that has helped shepherd SYCL since its release in 2014 and that Intel bought in June after a few years of partnering with it.
The plug-ins enable developers to write SYCL code in high-performance software that runs on non-Intel GPUs and other chips.
“What will happen when these tools come out is you can download the tools from Intel, but then Codeplay will have a download that … plugs in and adds support for Nvidia GPUs and can plug in and support AMD GPUs,” Reinders says. “To the user, once those are all installed, you just run the compiler and it’ll take advantage of all of them and it can produce a binary – this is what really distinguishes it – that when you run it, if it turns out you have a system with, say, AVX-512 on your CPU, maybe an integrated graphics from Intel, a plug-in graphics from Nvidia, plug-in graphics from AMD, a plug-in from Intel, your program can come up and use all five of them in one run.”
The plug-ins also come with support that organizations can buy: Intel will support its GPUs and Codeplay will sell support for Nvidia and AMD.
Intel has been working for a long time on delivering the capabilities available via the plug-in modules and Codeplay is playing the key role in marshaling it forward, according to Reinders. Intel wants to have a development framework similar to what Nvidia offers with CUDA and AMD offers with ROCm, which spans a range of programming models, including Heterogeneous Interface for Portability (HIP), OpenMP, and OpenCL. Intel’s stack likewise includes support for Fortran, C, C++, MPI, OpenMP, Python, and SYCL, but it is designed to support more than just Intel devices.
The developments with oneAPI – and now the introduction of the plug-ins – come as Intel works to get its legs back under it following several years of missed deadlines and other challenges. The company is preparing for the release early next year of its delayed “Sapphire Rapids” Xeon Scalable processors and is building out its FPGA portfolio. At the same time, Intel wants to push into the GPU accelerator business to compete with Nvidia and AMD with its upcoming datacenter GPUs, including “Ponte Vecchio” high-density Max Series chips and Flex Series Xe GPUs. (oneAPI 2023 will support these upcoming chips.)
Reinders admits that Intel is running behind Nvidia and AMD in the GPU space, but adds that oneAPI can become an equalizer of sorts. That will be important going forward; citing a June report from Evans Data, Intel noted that 48 percent of developers want to use heterogeneous systems that include more than one kind of processor.
“People recognize that we’ll do a great job taking advantage of Intel and the latest and greatest with Sapphire Rapids and Ponte Vecchio and High Bandwidth Memory, but this interesting twist now is opening it up so that those tools can support any device that might be in the system and we’re willing to have anything plug in,” he says. “It could be stuff from open source, like the SYCL compiler modules, and the PTX back end that addresses Nvidia and the AMD back end are in open source. They weren’t created by us. Codeplay plugs them in and we will support them. Whatever makes any piece of hardware run the best, you really have to be honest at engineering something that can get access to that so the developers aren’t left stranded and that’s what we’re after here.”
Getting the plug-ins into the hands of developers is an important first step. Now the challenge is to mature the model, he says. It works well with compilers and a few libraries. Now Intel engineers want to make the libraries and frameworks more modular to make it easier to support different devices and to run accelerators in a way that delivers performance without extra call overheads. They also understand that as the model matures, innovation in accelerators will continue, not only among Intel, Nvidia, and AMD, but also through other architectures like Arm and RISC-V as well as the myriad AI chips being developed.
There also are other challenges. Finding APIs that can hold up when spanning devices such as GPUs and FPGAs is one, Reinders says. If an API is complicated enough, it can span everything, but then it may not be as easy to use as most developers need. The reverse is an API that is easier to use but more limited. Getting the APIs right is going to take work.
Another is building library support. Nvidia has a massive number of libraries within CUDA to support its GPUs. With oneAPI and the plug-in model, Intel and Codeplay need to build out libraries that touch on multiple architectures.
“There’s a lot of effort being put into libraries because, to some extent, the compiler stuff is a little easier,” he says. “The libraries are very diverse and very focused on different areas. You’ll see Nvidia has grown an enormous number of libraries around their ecosystem. Obviously, Intel with x86 had a huge number of libraries and when new players show up, even AMD with GPUs, libraries has been a big challenge. They’re trying to catch up because developers view the libraries as a key part of the ecosystem. Getting the APIs right and getting the library support implemented are two of the obvious heaviest lifting.”
At the Intel Innovation 2022 show in September, the company showed that with open source, oneAPI could get competitive performance by converting a CUDA program to SYCL and running it on an Nvidia device or converting an AMD HIP program and running it on an AMD chip. The level of performance varied by case, but most of it was within 5 percent to 20 percent, Reinders says. That said, what Intel didn’t emphasize to those watching the demonstrations was that to get these results, “you had to know what you’re doing and go build the open source, install this SDK and do this and this,” he says.
Bringing the plug-in model into products from Intel and Codeplay is important because it makes much of the work turnkey and delivers support.
“That’s absolutely critical for large-scale use, developers being able to take advantage of heterogeneous hardware,” Reinders says. “We can all spend our time if we want learning these things, building from open source, pulling all the different sticks together, running a bunch of things. But if someone else solves it for us, it’s a lot more likely to take off. It’s a model others can plug into that also gives them the same advantage. That’s why it’s a big step, and it sets a new model in place that will catch fire over time. The idea that proprietary tools are prebuilt tools that are supported should be open to having other architecture things plugged into them.”
Organizations can get the 2023 release of oneAPI in Intel’s Developer Cloud, and the company also is beginning to roll it out through its regular distribution channels.
Given the examples of MKL, the Intel Math Kernel Library, intentionally running slower on AMD processors, it does not seem to me that oneAPI is likely to receive much trust as a cross platform API.
Given how well established the CUDA system is from NVidia, AMD’s HIP in ROCm also bills itself as cross platform. As far as I can tell the main reason less-established GPU manufacturers say cross platform is to get market share. After that there is simply too much conflict of interest.
In my opinion, a software API provided by a non-hardware focused company such as Julia might have a better chance of prioritizing cross platform compatibility above competing interests.
Quite right! Unless oneAPI is an open source, multi-contributor, project, it is likely to work best on Sapphire/Meteor+Ponte-Vecchio, and not so well on EPYC+MI300, or Grace+Hopper. Julia is a nice PL project but I find the syntax of the MATLAB/GNU-Octave language easier to read (a nice development of Cleve Moler, cited in Jack Dongarra’s Turing Lecture). The back-end libraries (esp. UMFPACK) do the hardware-specific heavy-lifting if I understand correctly.
Let’s see how long the oneAPI Toolkits will be free of charge, as they are now (https://www.intel.com/content/www/us/en/developer/articles/news/free-intel-software-developer-tools.html).
…btw: Have to admit, that I totally missed the point when Reinders came back…thought he left and perhaps was that disappointed of the XeonPhi/ManyCore project being stopped, that he would never come back…