Arm Says Neoverse Is A More Universal Compute Substrate Than X86

Back at the end of July, we had a discussion with CPU maker AMD, and the topic of conversation was hybrid cloud. In the course of that conversation, one of the top server brass at the X86 server chip maker took a few swipes at Arm, contending that AMD Epyc CPUs were a better substrate for spanning the clouds and datacenters of the world than a collection of Arm processors based on Neoverse CPU designs.

Arm understandably wants to have a chance to address some of the issues that AMD raised. So we did a Q&A interview with Mohamed Awad, senior vice president and general manager of the company’s infrastructure business, to give Arm that chance.

Timothy Prickett Morgan: First of all, you had me dead to rights: in the title, and in a few references in the small amount of text wrapped around the AMD video interview, I did not make it clear that the opinions expressed in that story were AMD’s and not necessarily my own. I updated the story immediately.

Mohamed Awad: To be fair, I don’t think everyone quite understands it, so I thought it would be good to actually just clarify some of these issues.

Let me jump into it, because I do think it is important to set the record straight. There is subtlety in what was said that matters here.

So if you step back and think about the CPU world of five or six years ago, when Arm didn’t exist in a meaningful way within the datacenter, what you had was mostly X86 on servers – primarily Intel X86, and broadly established. And then you had an emerging AMD, and to their credit they have done an incredible job.

But one of the headwinds that they had to take on was this idea that they shared an architecture with Intel, but did not share a microarchitecture. And so what does that mean? Specifically, it means that they reference the same document to go build their CPUs, but the actual CPU implementation – the core – is designed at a microarchitectural level by the individual companies. So AMD Epyc had its microarchitecture and Intel Xeon had its microarchitecture, and they built CPUs around those. So customers had to worry about whether they were optimizing for the AMD implementation or for the Intel implementation.
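To make that distinction concrete, here is a minimal C sketch of the kind of per-implementation tuning being described – illustrative only, with hypothetical TUNE_* build macros and a hypothetical blocked_sum routine; the cache sizes are the published L1 data capacities of AMD’s Zen 4 and Intel’s Golden Cove cores. The ISA-level code is the same on both vendors’ chips, but the tuning constant has to be chosen per microarchitecture:

```c
#include <stddef.h>

/* Same architecture, different microarchitecture: this loop compiles to
 * the same X86 instructions on an Epyc or a Xeon, but the blocking factor
 * that keeps a working set resident in the L1 data cache differs per core
 * design. The TUNE_* macros are hypothetical build flags for this sketch. */
#if defined(TUNE_ZEN4)            /* AMD Zen 4 (Epyc "Genoa"): 32 KB L1D */
  #define BLOCK_BYTES (32 * 1024)
#elif defined(TUNE_GOLDEN_COVE)   /* Intel Golden Cove (Xeon "Sapphire Rapids"): 48 KB L1D */
  #define BLOCK_BYTES (48 * 1024)
#else
  #define BLOCK_BYTES (16 * 1024) /* conservative default for unknown cores */
#endif

/* Walk an array in cache-sized blocks so each block stays L1-resident. */
double blocked_sum(const double *data, size_t n) {
    const size_t block = BLOCK_BYTES / sizeof(double);
    double total = 0.0;
    for (size_t i = 0; i < n; i += block) {
        size_t end = (i + block < n) ? (i + block) : n;
        for (size_t j = i; j < end; j++)
            total += data[j];
    }
    return total;
}
```

Multiply that one constant by every blocking, prefetch, and unrolling decision in a math library and the per-implementation optimization burden being described becomes clear.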

Arm Neoverse wasn’t launched until 2019 – we talked about it in 2018, but it wasn’t widely available until a year later. Prior to Neoverse, all of the Arm implementations that were out there – and there were a bunch of them – tried and didn’t quite make it.

There was a good reason for this. Before Neoverse, we handed people an architecture specification, they built a CPU implementation for the datacenter, and then they tried to get the world to go off and adopt that implementation and optimize for it. But each one was very different – in the same way that, within the X86 ecosystem, the two implementations were very different.

What happened when we launched Neoverse is that the hyperscalers and other top consumers of datacenter compute started to coalesce around Neoverse. So today AWS, Microsoft, Google, Nvidia, and others have adopted Neoverse as their CPU core – their microarchitecture. And what that means is we actually build the CPU implementation, the core implementation. We hand them that IP, and then they build their SoCs around it. So there is a level of built-in optimization and compatibility across all those implementations.

TPM: Is there any cloud builder or hyperscaler to date that doesn’t use a Neoverse core and that uses a customized core in any way? To my knowledge, there isn’t, although there might be something happening with Nvidia’s future “Vera” server CPU. I strongly suspect that the Nvidia slide that said “custom Arm core” meant to say Compute Subsystem.

Mohamed Awad: There is no hyperscaler or cloud builder today that is building an SoC based on a custom microarchitecture. Their SoCs are all built on Neoverse cores.

TPM: Ampere Computing was the only company that was even trying to do a custom Armv9 core, and it has been pretty quiet since SoftBank announced its pending acquisition of the company.

Mohamed Awad: Ampere did go after a different performance point with a custom microarchitecture, and they have gotten some traction with folks like Oracle, to their credit. But they are the only one.

TPM: I just want to level set and be precise: Is there anyone doing custom Arm cores in China? The hyperscalers and cloud builders there are adopting Arm and doing SoCs, and some of them bought Ampere chips, too.

Mohamed Awad: The Chinese hyperscalers and cloud builders use Neoverse cores, too. There may be a few exceptions in China outside of the datacenter where people are making custom Arm cores, but from a hyperscaler and cloud perspective, there are no exceptions. They all use Neoverse designs.

And that is the huge distinction that I think was lost in your previous piece, because it was portrayed as if there were different microarchitectures and completely different implementations across the board. And that’s actually not the case.

TPM: Well, I can see what you are saying in hindsight, but the way I heard what AMD was saying was that the differences between, say, Neoverse N1 cores and V2 cores are bigger than those between Zen and Zen c cores, where all AMD did was change the cache sizes and the layouts of the cores. But I see the distinction you are making, and I think this is fair.

Mohamed Awad: The other thing that I wanted to add is that the world has moved forward from 2019. And if you look at what we’ve done since then, we’ve moved even further with Compute Subsystems, or CSS. This creates a further level of consistency across Arm server CPU designs because not only are many hyperscalers and clouds using the same CPU cores, they are now using the same interconnect IP between those cores and the other elements of the SoC. They are putting the elements of the CSS stack together in different ways, but without a whole lot of technological differentiation in the pieces themselves. And now they have a level of consistency across the chip, not just in the core.

Now, to be fair, these hyperscalers and clouds still differentiate – in what I/O they choose, how many cores they put down, whether or not they want to add some acceleration. So there are still areas for them to differentiate. I don’t mean to say that it is all homogeneous across them.

But you know, one of the huge success stories in the Neoverse ecosystem has really been around software. I remember talking to you six years ago at an event in Santa Clara, walking out, and I was getting into my car, and you asked: When is this Arm thing going to take off?

TPM: We had been waiting for Arm server chips to take off in the datacenter for almost a decade at that point. . . .

Mohamed Awad: One of the reasons it has taken off recently is that those hyperscalers and cloud builders have stepped up, and it’s no longer Arm carrying that software ecosystem on its back. All of these folks who are building software for their solutions and moving the ecosystem over are doing a lot of the work. So this flywheel effect has happened. And they are all competing vigorously for those customers who are buying compute on the cloud. But the software that they are building is actually leveraged across each of them. For instance, Google openly talks about how there is an easy lift from one cloud to the other, and AWS does the same thing. So saying it is hard to move between Neoverse chips is not true.
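That portability is easy to picture in code. Here is a minimal sketch, assuming a Linux aarch64 target: the same binary runs unchanged on Graviton, Axion, or Cobalt instances because all of them implement Neoverse cores, and optional ISA features are probed at runtime through the kernel’s hwcap interface rather than by sniffing for any one vendor’s chip.

```c
/* Build once for generic Armv8-A, run the same binary across Arm cloud
 * instances; optional features are discovered at runtime. Linux/aarch64. */
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval() */
#include <asm/hwcap.h>  /* HWCAP_* feature bits on aarch64 */

int main(void) {
    unsigned long caps = getauxval(AT_HWCAP);

#ifdef HWCAP_ATOMICS
    if (caps & HWCAP_ATOMICS)
        puts("LSE atomics present: take the lock-free fast paths");
#endif
#ifdef HWCAP_SVE
    if (caps & HWCAP_SVE)
        puts("SVE present: dispatch to vector-length-agnostic kernels");
#endif
    printf("raw hwcaps: 0x%lx\n", caps);
    return 0;
}
```

This is the same dispatch pattern the clouds’ own runtimes and math libraries lean on, which is why a lift from one Neoverse-based cloud to another is mostly a redeploy rather than a port.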

TPM: All right, that’s fair.

Mohamed Awad: I think the other thing that’s important to point out is that, beyond general purpose compute, AI has created a whole different dynamic in the datacenter, which I personally find fascinating, and I am interested in your take on it.

If you look at all of the major AI technology customers and developers out there – Nvidia, AWS, Google, Microsoft, and others – they all have to deploy Nvidia hardware because Nvidia has done such a fantastic job building what it is calling AI factories, these full rackscale solutions. Those rackscale solutions consist of general purpose CPUs based on Arm plus accelerators, and the entire rack is built together. So they are all deploying Nvidia. But they are also all working on their own fully integrated rackscale solutions to get the most efficiency and performance as well, whether it is Google with its Axion and TPU, or Amazon Web Services with its Graviton and Trainium, or Microsoft with its Cobalt and Maia. And why do I bring that up? They are doing it because of the efficiency and performance gains of building those completely vertically integrated solutions.

TPM: Well, if you are a cloud, you have to sell X86 server capacity and you have to sell Nvidia GPUs because that is what customers have their own applications and underlying software stacks running on. And as I am fond of pointing out, anyone deploying on X86 or Nvidia GPUs will pay a premium, which the cloud builders are happy to charge. But they will also create homegrown stuff: first for their own hyperscale applications, which they either provide for free with advertising support or sell at a discount compared to X86 CPUs and Nvidia GPUs through their cloud businesses. The clouds have to do both, and the market will figure out what the balance is between X86, Arm, Nvidia GPUs, and other XPUs in the long run. But if this were all just SaaS, we wouldn’t talk about the hardware, and we wouldn’t know.

Mohamed Awad: I agree with that, but I think the point that I’m trying to make is that they all have to deploy Nvidia, and that means there are a bunch of software resources going to that. Whatever software resources are not going to that are going to their internal developments, which are also Arm-based. So the point that I’m trying to make is that the entire control plane associated with AI – whether you’re talking networking, general purpose compute, or otherwise – is all being optimized around Arm today.

TPM: Let me ask a tangential question, since it relates to the commercialization of rackscale systems, which we are finally seeing with systems like the GB200 NVL72 and GB300 NVL72 from Nvidia. What’s your plan for a CSS rackscale design? Is that something that you will do eventually, or will you leave that level of integration up to the OEMs and ODMs and companies like AMD and Nvidia?

Mohamed Awad: We’re exploring beyond our current platform into additional compute subsystems, chiplets, and even full end-to-end solutions. That is something that we do look at, as we have said before.

TPM: But it is the next logical thing to do as you move out from core to system to the rack. . . .

Mohamed Awad: The way that I think about this: It is our objective to accelerate the adoption of Arm across the board. So there are going to be places where more integrated solutions make sense, and you have seen us embrace that as we move from architecture to IP to compute subsystems. We then created Arm Total Design to make chiplets broadly available. So we will continue to find areas where there is unnecessary friction in adopting Arm and address them – I think it is natural for us to go off and do that.

TPM: The lack of NVLink ports has been a problem, but now we have NVLink Fusion for those who can afford it or who want to add it to a custom Arm processor in their own AI systems. So hyperscalers and cloud builders can do the same things with their Arm CPUs that Nvidia does with its current “Grace” and future “Vera” Arm CPUs.

Mohamed Awad: There are lots of ways to enable such interconnects. Nvidia is an incredibly important partner of ours, and we are supportive of what they’re doing around NVLink. And we are also respectful and supportive of where the hyperscalers and cloud builders are going, and we are just looking to ensure that we can enable them as they build out. Because at the end of the day, those four or five companies are the largest consumers of compute, and so that is what we are focused on.

I would add that there is a next tier of customers and partners who are looking to adopt Arm in various ways because they see some of the advantages that the large hyperscalers and cloud builders who have built their own silicon are enjoying, whether that is performance per watt or overall TCO or what have you. And so we are exploring how we can support them.


2 Comments

  1. I’m wondering about the “you have to sell X86 server capacity” comment. For almost all applications these days, x86 vs. Arm vs. whatever should be irrelevant – especially if the business applications are written in Java / Javascript / Python / etc. I’ve had no issues taking C++ Python extensions that I wrote on a Mac (Arm CPU) and recompiling them to run in the cloud on x86 – zero changes needed. That said, I understand it would be a problem if you’re using 3rd-party libraries where the vendor hasn’t built an Arm version.

  2. My understanding is that a huge amount of architecture-specific optimisation went into creating well-performing Java and Javascript environments. I guess it’s already been done for ARM, and it is likely similar in quality to the x86 runtime environments. Weirdly, the Raspberry Pi is responsible for a lot of less visible development tools working well on ARM.

    My understanding of a universal compute substrate is one where the hardware and software environments available on premises are nearly the same as in the cloud. This is what IBM named hybrid cloud, and at the moment even Power fills this role better than ARM.
