RISC-V In The Datacenter Is No Risky Proposition

It was only a matter of time, perhaps, but the skyrocketing costs of designing chips are colliding with the ever-increasing need for performance, price/performance, and performance per watt. Something has got to give, and that thing might be the architectures currently used in the datacenter.

What the world needs, perhaps, is something akin to the Linux operating system when it comes to chip instruction sets – something that is shared and extended by the community but which nonetheless has governance that is informed both by technology and the economic realities of the tech sector.

For RISC-V International, the non-profit business association that controls the ISA and other intellectual property related to the architecture, that job falls to Calista Redmond, who among other things ran the OpenPower Foundation, which opened up IBM’s Power chip architecture, for a number of years.

There is quite a bit of activity going on in the RISC-V space, of course, and we are keeping an eye on it with regard to servers, storage, and networking in the datacenter. (Our contention, as we expressed last fall, is that Arm is the new RISC/Unix, and RISC-V is the new Arm.) We had a chat with Redmond about what is going on and what the prospects are for RISC-V to attack the datacenter – and to do so much more quickly than the X86 or Arm architectures did.

Timothy Prickett Morgan: As you well know, The Next Platform does not care one bit about RISC-V chips in embedded devices or smartphones or whatever – except inasmuch as it helps drive the architecture forward and, specifically, drive it into the datacenter in CPUs, DPUs, or various kinds of accelerators. It has been nearly three years since I asked you this question, so it is probably a good time to get an update: Where are we at with RISC-V in the datacenter?

Calista Redmond: As you well know, hardware takes time, right? But obviously there is a lot of interest among larger multinationals, and others have been using RISC-V in microcontrollers but now are starting to contemplate – now that they have a toe in the water – what their future generations of technology look like. I think chiplets and other ways of composing an SoC are changing the game quite a bit. And I think that you’ll see a lot more datacenter-scale operators along with HPC centers starting to more seriously consider RISC-V. And there are a few early adopters. A lot of them are smaller organizations that are pivoting strategy or leveraging RISC-V as their main investment; MIPS is an example of this.

Increasingly, companies are realizing that if their future, their destiny, is dependent on somebody else, that’s a higher risk in their business model and in their strategic model. And so they are more seriously contemplating going to a custom chip – and many are looking at that, taking it more in house or leveraging IP houses and others around them – and starting to create their own designs. No one does custom chips lightly because that is in and of itself a tangle of questions and strategic variables, but they’re trying to meet not just price and performance, but other types of capabilities that differentiate them, such as power consumption or any number of things.

TPM: I can’t tell you the number of times I have heard about a custom Arm server chip design that was killed years earlier, and the big complaint was that they wanted to add certain things to the Arm ISA – things that the startup’s key early adopter customers really wanted – but could not do so because it would break compatibility in some fashion. With a new architecture, sometimes you need to bend the compatibility a bit.

Calista Redmond: Or worse yet, you’ve got a couple of years of lawyers working on contracts and millions of dollars spent on that. . . .

There are two things at work here, and this is where RISC-V helps. Chip makers want absolute control over their designs. But no one wants to start with a blank sheet of paper. Everyone wants to have that running start, and to have a running start, you have to have base building blocks, which is obviously what RISC-V International works on. We get our base ISA and the ratified extensions, and then you have your menu of extensions: you can pick and choose what is going to be most effective – price, performance, power, whatever your variable is – for the implementation you’re trying to do. And it’s only when you get to hyperscale or enterprise-class OEMs that volumes start to become meaningful. We are not at volumes that are meaningful just yet, but what you are going to be seeing this year is a lot more production-ready, server-class chips coming to market.
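
A quick aside for readers who want to see how that menu of extensions shows up in practice: the choices land directly in the toolchain. The target string handed to GCC or Clang names the base ISA plus whichever ratified extensions the implementation supports, and the compilers define matching test macros that code can check. A minimal sketch in C, assuming a riscv64 GCC or Clang toolchain and an -march string such as rv64gcv_zba_zbb:

    /* extensions.c -- print which RISC-V extensions this binary was built for.
     * Illustrative only; the test macros below are the ones recent GCC and
     * Clang define (per the RISC-V C API spec) when the matching extension
     * is enabled in the -march string. */
    #include <stdio.h>

    int main(void)
    {
    #if defined(__riscv)
        printf("RISC-V target, XLEN = %d\n", __riscv_xlen);
    #  if defined(__riscv_vector)
        printf("built with the V vector extension\n");
    #  endif
    #  if defined(__riscv_zba)
        printf("built with Zba address generation\n");
    #  endif
    #  if defined(__riscv_zbb)
        printf("built with Zbb basic bit manipulation\n");
    #  endif
    #else
        printf("not a RISC-V target\n");
    #endif
        return 0;
    }

Compile it with something like riscv64-linux-gnu-gcc -march=rv64gcv_zba_zbb -O2 extensions.c (assuming the common Linux cross toolchain) and the resulting binary reports exactly the menu that was chosen; drop an extension from the -march string and the corresponding line disappears.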

TPM: I assume these are coming from the usual suspects. Let’s rattle them off.

We have recently written about Ventana, whose team comes from the Applied Micro Arm server chip business of more than a decade ago. We know about SiFive, obviously. We know about the European Processor Initiative to get a custom RISC-V accelerator out the door and the related effort by SiPearl to get a family of Arm CPUs out the door beside it. HiSilicon, the chip development arm of Huawei Technologies, has created RISC-V controllers, and server CPUs can’t be far behind. Not with Alibaba and Tencent joining a slew of other companies backing the RISC-V architecture last December, which was a year after the China RISC-V Alliance, spearheaded by the Chinese Academy of Sciences, first talked about the XiangShan processor. Alibaba has the XuanTie RISC-V chips for handhelds and embedded stuff, and that can be grown into a server chip. Tencent might be big enough to do a custom variant of XiangShan, or work with HiSilicon to make one. I’m just making that up. Baidu has just backed Chinese startup StarFive, which has its U Series and Dubhe Series processors aimed at various datacenter workloads. I would not be surprised if the HPC centers in the United States and Europe are taking a hard look at RISC-V – their job is to be on the cutting edge and test out new ideas.

I still need to chase down Esperanto Technologies to find out if it will do more than inference with its 1,088 ET-Minion and four ET-Maxion cores. . . .

Calista Redmond: Those are all of the right names.

There is obviously a lot of activity going on in China. And I will add that China has been one of the strongest contributors in the RISC-V ecosystem. This is not a case of consuming only and not contributing. The RISC-V members all realize that if they are going to strategically pin their future here, they need to influence our strategy and those base building blocks.

Today, we have more than 3,300 members and we count more than 10,000 people who participate in our member activities. We are continuing to see deeper and broader investment from the multinationals who had their foot in the water with the microcontroller. They are now contemplating and taking a hard look at their processor plans. If they are going to do this at scale, and don’t want to be beholden to whatever strategic twists and turns another architecture is going to take, then RISC-V really provides them that freedom and that flexibility.

TPM: RISC-V is not risk free, but it is royalty free and open source and being developed cooperatively.

Which is why I think there’s a non-zero chance – even though no one’s talked about it, and I am going to say something that is potentially heretical – that the “Fugaku-Next” system coming out of Fujitsu and RIKEN Lab in Japan in maybe 2030 or so might be based on RISC-V, not Arm. RIKEN and Fujitsu learned how to do all of this neat vector acceleration on Arm chips, but there is no reason that can’t be ported over to RISC-V.
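
An aside from me, not from RIKEN or Fujitsu, to make that porting claim concrete: a kernel written as plain, vectorizable C can be retargeted from Arm’s SVE to the RISC-V “V” vector extension just by switching the compiler’s target flags and letting the auto-vectorizer do the work. The flags in the comment below are examples, and real HPC code is obviously far hairier, but this is a minimal sketch of why the vector know-how is portable:

    /* daxpy.c -- the classic y = a*x + y kernel, written so a compiler's
     * auto-vectorizer can target either Arm SVE or the RISC-V vector extension.
     * Example flags (toolchain-dependent): gcc -O3 -march=armv8-a+sve for SVE,
     * or gcc -O3 -march=rv64gcv for RVV. */
    #include <stddef.h>

    void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }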

Processors are so expensive to develop that cutting checks for an Arm architecture license and paying royalty fees is going to be something everyone making Arm chips is going to be looking very hard at. If Linux and Android run the same on RISC-V, no one will care. This can all shift very quickly if it starts to move. . . .

Calista Redmond: Well, you know, people are still gonna be cutting checks. [Laughter] I mean, processors aren’t free. You need building blocks, and a good chunk of those building blocks can come from us at RISC-V International, and those are free. But if you want that differentiation, if you want that great performance, you are leaning on the engineering talent in the community and going in and procuring some of that from any of the IP design houses.

TPM: I don’t know of anything at all in this world that is truly free. Linux is only free if you can self-support. Puppies and kittens, they’re free, right?

Calista Redmond: [Laughter] It’s just that I’m constantly telling people that RISC-V is not free. It has no barrier to entry, and it has no barrier to where you can go. The barriers are removed on both ends. If you want global markets, if you want to play on the global stage, you need to take an open, collaborative approach at your base.

TPM: You also need software. What is RISC-V doing to make it easier to move from X86 or Arm or – sorry to take a jab at your former boss here – IBM Power? Google’s programmers can just press a button and have its monstrous build engine – I am going to nickname it “Barf,” as a complement to the “Borg” cloud controller and job scheduler – output code tuned for whatever architecture, be it Intel X86, AMD X86, Arm, or Power. But not everyone has a magical build system powered by heaven only knows how many PhDs. [Laughter]

Calista Redmond: Luckily for us, there are market forces at work, and it makes business sense to run on more than one architecture. So if your operating system or middleware has some merit to running on multiple architectures, you are already doing that. And if you’re already running on two architectures, it’s not a huge stretch to then start running on RISC-V. You have seen that come with Android, although I know you didn’t want to talk about that. . . .

TPM: True, true, true, but to use a datacenter example, we see it happening with Ampere Computing’s Altra Arm server chips, and it has obviously happened with Graviton inside of Amazon Web Services. Google, Microsoft, and Oracle are using Ampere Computing’s chips, too, in their clouds and very likely on their internal workloads.

Calista Redmond: Those same market forces are showing that it is not in anyone’s best interest to go and support your own custom stack for anything. So where you can collaborate, that’s going to be in your best interest. So we see a lot of that going on, which by the way, helps prevent fragmentation of the RISC-V architecture because no company or country really wants to deal with that or can afford that.

Last week, in our board meeting, we just ratified the first three RISC-V profiles. Think of profiles as your base building blocks, a composition of things that every implementation must support if you want to be compatible with a profile. We already had compatibility with the base ISA. But here’s a common set of the ISA plus extensions that will be used across different forums to make things portable across RISC-V implementations, and potentially across RISC-V and other types of architectures.
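
For the curious, a rough illustration from me, not a RISC-V International deliverable: on a RISC-V Linux box today you can already see the raw material a profile bundles, because the kernel reports the ISA string an implementation provides. This sketch assumes the usual “isa” field in /proc/cpuinfo, which can vary by kernel version:

    /* isa_check.c -- print the ISA string the kernel reports for the first hart.
     * Rough sketch: assumes a RISC-V Linux /proc/cpuinfo with a line such as
     * "isa : rv64imafdc_zicsr_zifencei"; treat the field name as an assumption. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/cpuinfo", "r");
        if (!f) {
            perror("/proc/cpuinfo");
            return 1;
        }

        char line[512];
        while (fgets(line, sizeof line, f)) {
            /* Lines look like "isa\t\t: rv64imafdc..."; match on the key. */
            if (strncmp(line, "isa", 3) == 0) {
                fputs(line, stdout);
                break;
            }
        }
        fclose(f);
        return 0;
    }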

TPM: Sounds like the Arm Server Base System Architecture stuff that Jon Masters was so crazy about . . . and we love you, Jon.

Calista Redmond: Yes, yes. And hopefully, Jon Masters will get just as positively crazy about what we’re doing here, right?

TPM: Isn’t that pretty much guaranteed?

Calista Redmond: The same no-vendor-lock-in world where Linux grew up is where RISC-V is growing up, and it’s where we’re investing more and more in our ecosystem. In fact, we just hired a director of ISV ecosystem last week. And so we’re looking at how many ways we can make the on-ramp to RISC-V even easier. The things we are working on will be coming through as ratified profiles, and then the next step will be platforms.

TPM: I live in a universe now where I look out a couple of years and I see server revenues breaking down maybe Intel 40 percent, AMD 40 percent, and Arm 20 percent. That is absolutely conceivable as far as I’m concerned, given the competitive pressures that are going on right now. And Intel can save its cookies all it wants, but that might be all of the cookies it’s going to get.

If I look further out, maybe 2030 and beyond, the server CPU architecture revenue split could be 30-30-30-10, with RISC-V being the 10 percent. Or maybe it is 25 percent each for Intel, AMD, the Arm collective, and the RISC-V collective across a much, much bigger revenue base. You are piloting this thing. Are those numbers reasonable?

Calista Redmond: I think that’s within sight, I think that’s very reasonable, and I hope we over-achieve. That’s just the rate and pace of maturing of an ecosystem, maturing and hardening of technology and ensuring that everyone feels comfortable with the strategic choices that they are making. So those are choices they are making today to hit the future five to ten years out.

Within three to five years, you will see a lot more public disclosure of what’s being done now. And probably before that, you might see what’s coming through the fabs.

9 Comments

  1. I think the cloud vendors will really drive the new architectures. If we think of cloud as a “server”, then I might care about the underlying CPU – although writing code in Java / Python / Javascript / etc should make this irrelevant. If we think of “services”, then not only don’t I care about the CPU, I can’t even find out what it is. For example, if my provider has hosted MySQL / message queues / object storage / load balancer / etc, then I only care that it works.

    • I care about how everything works, not just that it works, because I need to worry about efficiency. I think we should care, particularly in a world where cash is more expensive and so is space and electricity. I get ease of use and ease of programming, and I think that this is valid as well, but somewhere someone needs to make sure Java and Python and MySQL are working optimally and designing machines that do this.

      • Thanks for invoking the Gabriel Electric Chair Principle (Garcia Marquez, not Richard P./LISP I think) whereby, without performance and efficiency, the system doesn’t scale right, and peak execution causes brownouts in surrounding communities. Still, the other Gabriel takeaway (from his 100 years of analysis, in solitude) is that computational diversity is key to robust longevity, to prevent potential pig-tailing. Well-executed RISC-V/VI could thus contribute to such sustainability in the long-term (but then again, not a 5GHz RISC-V that yields 13,000 CoreMarks, when a 3GHz M1 cranks out 30,000…).

      • I think that if ARM is smart and not being driven by short term thinking (IPO, making next quarter’s wall street expectations, etc) then it could still do really well in the server space.

        RISC-V will also do well, especially if they work with the LLVM and GCC people to make sure the compiler and toolchain support is there to make porting code from x86_64/ARM to RISC-V simple and easy to do. Less and less of the Linux kernel is based on assembler anymore, so it’s easier to port.

        But! What happens to RISC-V (or even ARM?) when the next Spectre-type security issue pops up? Are these architectures young and flexible enough that they can learn from the x86_64 mistakes and bake in solutions while at the same time scaling up in terms of core counts?

        But no matter what, the fact that there’s choice will drive innovation and new thinking, which is a good thing.

      • “somewhere someone needs to make sure Java and Python and MySQL are working optimally and designing machines that do this.”

        But does that someone have to be Timothy Prickett Morgan? Probably not. I can see choosing your vendor based on how good a job “someone somewhere” is doing and maybe being able to read up on benchmarks for general characterization but better it not be your job, unless it’s your job. My two cents.

  2. Right now there exists the Sophon SG2042 RISC-V SoC with sixty four THead C910 cores running at 2.0 GHz, with a pipelined vector unit (not RVV 1.0, but very similar). Each core is a little faster than a 1.5 GHz Arm A72 in the Raspberry Pi 4 — or 60% the speed of Amazon’s Graviton 2, at least on one quick benchmark (https://hoult.org/primes.txt).

    It’s real, it works, I have access to one.

    The C910 is a 4 year old core, announced July 2019. The same C910 core is on a $100 quad core SBC (Sipeed Lichee Pi 4A) expected to go on sale on April 15. That’s just how long things take to go into production.

    We’re going to see much faster RISC-V cores — already announced or pre-announced — going into production over the next several years. Beating current Graviton I’m sure, and perhaps near to Apple’s M1/M2.

  3. The x86 vendors are successful in the server business precisely because they are able to sell the same design in client. The NRE is amortized. In the case of AMD, they even sell the same CCD in both markets. Whatever little ARM has in the clouds — most of which is Graviton — are their own design, whose NRE is also amortized in the client (mobile) space. Amazon is cleverly licensing ARM IP, not trying to build the world from scratch. I don’t think the volumes will support such a decision even at Amazon. In fact, no OEM is doing custom ARM server CPUs anymore: you won’t have enough volume to make enough money to support the second, third, fourth revision. Not that long ago, you had Marvell (née Cavium, née Broadcom, née Netspeed), Qualcomm, Cavium’s original thunderX, plus a bunch of internal projects, later cancelled. RISC-V is going through the same, oblivious to, or deliberately ignoring the lessons of not just ARM custom CPU, but also the original RISC server manufacturers that got eaten up by x86 (SUN, DEC Alpha, PA-RISC, Itanium, MIPS/SGI…). A server-first design has not succeeded in the last three decades.
