Meta Buys Rivos To Accelerate Compute Engine Engineering

Of all of the hyperscalers and cloud builders, Meta Platforms has always been the one that we expected to design and have manufactured its own CPU and XPU accelerator compute engines. The reason is simple. Clouds have to buy X86 CPUs and Nvidia GPUs because that is the iron that enterprises and startups want to rent. These days, many also want to rent Arm CPUs based on Neoverse IP blocks – not because they love Arm software, but because they love the 30 percent to 40 percent better bang for the buck that the clouds claim these homegrown CPUs offer.

Meta Platforms is not a cloud, and therefore it does not have to balance the needs of enterprise infrastructure renters against its desire to control its own infrastructure fate and develop compute engines (and indeed, switching and storage) that are specifically tailored for the needs of the 3.5 billion users of its Facebook, WhatsApp, Instagram, Messenger, and Threads social media applications.

Some 85 percent of those users are on Facebook, so Meta Platforms is still a one product company, but one where at least some of its users are expanding out to its other applications from that base. Which does not necessarily mean that Meta Platforms has it easy. These are all different kinds of workloads, and the company is keen on developing its own AI to enhance and drive those applications, much as every other company on Earth now does.

Given the enormous amount of money that Meta Platforms spends on research and development and capital expenses – it looks like around $50 billion on R&D and somewhere between $66 billion and $72 billion for capex in 2025, against somewhere between $190 billion and $200 billion in revenues, so call it somewhere around 61 percent of revenues total at the midpoints of all of that – shaving even a few points off the cost of infrastructure can make a big difference in its profitability.
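For those keeping score at home, here is that midpoint arithmetic as a quick sketch, using only the figures cited above:

```python
# Back-of-the-envelope check on Meta Platforms' 2025 spending ratio,
# using the midpoints of the ranges cited above (all figures in $B).
r_and_d = 50.0
capex_mid = (66.0 + 72.0) / 2        # midpoint of the $66B to $72B range
revenue_mid = (190.0 + 200.0) / 2    # midpoint of the $190B to $200B range

spend_share = (r_and_d + capex_mid) / revenue_mid
print(f"R&D plus capex as a share of revenue: {spend_share:.1%}")  # ~61.0%
```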

So it is not hard to see why Meta Platforms would want to design its own CPUs and XPUs at the very least, and also ride herd on the interconnect ASIC makers (we are including switching, routing, and memory interconnects in this bucket) to do what it needs, threatening to design its own switch and memory fabric interconnects if they don't.

It is no secret that Meta Platforms wants to jump over the licensable but closed source Arm architecture and move straight to the open source but still somewhat undone and not yet mainstream RISC-V architecture with its future compute engines. It has also been no secret that Meta has had its issues in bringing custom compute engines into being, and it has yet to get a general purpose CPU or an AI training XPU out the door. This is a problem, obviously.

The company began its custom silicon efforts in 2020, and in May 2023 it launched the Meta Training and Inference Accelerator (MTIA) v1, which is not aptly named because it could do inference but not training.  And in April 2024, when the substantially improved MTIA v2 was launched, this chip could do inference better, but still no training. Both chips are built using arrays of processing elements that are based on RISC-V cores, specifically a pair of cores where one core does scalar work and the other core has vector engines that operate on integer and floating point data. The MTIA v1 was deployed moderately in the Meta datacenter server fleet, and the ramp was more intense for the much better MTIA v2.
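To make that division of labor concrete, here is a toy sketch – ours, not Meta's, with purely illustrative names and sizes – of how such a processing element pair splits work, with the scalar core handling loop control and address generation while the vector engine chews through integer and floating point data in bulk:

```python
import numpy as np

# Toy model of an MTIA-style processing element pair (illustrative only):
# a scalar core walks the control flow while a vector engine does wide math.
def pe_pair_dot(a: np.ndarray, b: np.ndarray, vlen: int = 64) -> float:
    acc = 0.0
    # "Scalar core": loop control and address generation, chunk by chunk.
    for i in range(0, len(a), vlen):
        # "Vector engine": one wide multiply-accumulate over the chunk.
        acc += np.dot(a[i:i + vlen], b[i:i + vlen])
    return acc

x = np.random.rand(1024).astype(np.float32)
y = np.random.rand(1024).astype(np.float32)
assert np.isclose(pe_pair_dot(x, y), np.dot(x, y), rtol=1e-4)
```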

The company released a paper in June 2025 at the International Symposium on Computer Architecture conference in Tokyo that called the chip the MTIA 2i, with the “i” standing for inference, and in that paper claimed that for certain kinds of AI inference workloads used in its applications, this chip provided 44 percent lower TCO than Nvidia GPUs on the deep learning recommendation models (DLRMs) that drive the Meta Platforms business. All of these models, which drive the company’s ad servers, have hundreds of gigabytes to single digit terabytes of embeddings, which makes it expensive to run them on GPUs. There is a reason why Nvidia created the Grace-Hopper and Grace-Blackwell hybrids – the Grace CPU is really a big memory controller for storing embeddings. But the biggest models at Meta Platforms have outgrown the Grace CPU’s memory by a factor of 2X to 4X, and this is a problem. (We only just found this paper and will be drilling down into it separately.)
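To see why those embedding tables blow past GPU and even Grace CPU memory, consider the sizing arithmetic. A DLRM keeps an embedding row for every categorical value – user IDs, page IDs, ad IDs, and so on – so table size scales with cardinality times vector width. Here is a rough sketch with made-up but plausible numbers (the feature names, row counts, and dimensions below are our assumptions, not Meta's; the 480 GB figure is the LPDDR5X capacity of one Grace CPU in a Grace-Hopper superchip):

```python
# Rough sizing of DLRM embedding tables. The features, row counts, and
# embedding dimensions below are illustrative assumptions, not Meta's.
tables = {
    # feature: (rows, embedding dimension)
    "user_ids": (2_000_000_000, 128),
    "page_ids": (500_000_000, 128),
    "ad_ids":   (1_000_000_000, 64),
}
bytes_per_element = 2  # fp16/bf16 embeddings

total_bytes = sum(rows * dim * bytes_per_element for rows, dim in tables.values())
print(f"Embedding footprint: {total_bytes / 1e12:.2f} TB")  # ~0.77 TB

# One Grace CPU in a Grace-Hopper superchip carries 480 GB of LPDDR5X.
grace_bytes = 480e9
print(f"Grace CPUs worth of memory needed: {total_bytes / grace_bytes:.1f}")
```

Scale those toy row counts up a bit and you are into the single digit terabyte territory the paper describes, which is 2X to 4X beyond what one Grace CPU can hold.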

The point is: Meta Platforms has a problem. It was already working with Rivos, one of several RISC-V compute engine startups, to help it design its MTIA chips and maybe even a Meta CPU to pair with them, when the company decided to make Rivos an offer it could not refuse.

Rivos, which was founded in May 2021, was pretty secretive about what it was up to, and it had a partnership with Meta Platforms under which it apparently helped design the MTIA 1i and MTIA 2i compute engines (to use the more recent and descriptive way of talking about them). The exact nature of this collaboration was unknown. Separate from this, Rivos was working on its own RISC-V CPU and GPU designs.

Yup. We said GPU there.

Rivos will be to Meta Platforms what Annapurna Labs was to Amazon Web Services: The foundation of its future processor designs. The parallels are illustrative.

Annapurna Labs was founded in 2011 by Hrvoje “Billy” Bilic, Nafea Bshara, and Ronen Boneh, and funded by Walden International (as in Lip-Bu Tan of Intel CEO fame) as well as by Avigdor Willenz (of Habana Labs and Xsight Labs fame), Manuel Alba (of Astera Labs fame), Andy Bechtolsheim (of Sun Microsystems and Arista Networks fame), Arm, and Taiwan Semiconductor Manufacturing Co. In 2012, AWS did its first Nitro DPU designs in conjunction with Cavium Networks (now part of Marvell), and on the next generation of Nitro processors, AWS started working with Annapurna Labs. In 2015, as that work was progressing, AWS went all in and just decided to buy all of Annapurna Labs because it had caught custom compute engine religion, expanding to Graviton CPUs and Trainium and Inferentia XPUs.

The rumored acquisition of Rivos by Meta Platforms was confirmed by Walden Catalyst, one of the investment arms of Walden International, in a statement from its famous founder:

Walden International founder and chief executive officer Lip-Bu Tan and Rivos co-founder Puneet Kumar.

The wonder is why Intel didn’t buy Rivos and lay the foundation for an open source architecture and design business to complement its foundry. But that is another story. . . .

Rivos was founded with the help of Tan and Amarjit Gill, a co-founder of MIPS chip maker SiByte (bought by Broadcom in 2000) and Power chip designer PA Semi (bought by Apple in 2008). Significantly, the PA Semi team is the one that created Apple’s custom Arm client chips, and Tse-Yu Yeh, one of the Rivos co-founders, spent more than 17 years at Apple, rising from senior engineer in architecture and verification to senior director of CPU design. That has been his role at Rivos.

Co-founder Puneet Kumar, shown in the photo above, hails from the glory days of Digital Equipment Corp. He was a member of the technical staff at the Digital Systems Research Center through the Compaq and Hewlett-Packard acquisitions, and was director of systems engineering software at SiByte before Broadcom ate it. After that, Kumar moved to PA Semi, where he was put in charge of software architecture, and he stayed on at Apple until 2009, when he took a vice president of engineering gig at Agnilux, a secretive chip startup staffed with many PA Semi expats that was acquired by Google in 2010. When Kumar left Google, it was to become CEO at Rivos.

Mark Hayter, another Rivos co-founder, followed a similar path from DEC to SiByte to Broadcom to PA Semi to Apple to Agnilux to Google, and was chief strategy officer at Rivos after being a system architect, in one form or another, in his prior work. The final Rivos co-founder, Belli Kuttanna, was a chip designer and architect at Texas Instruments, Motorola, Sun Microsystems, Qualcomm ever so briefly, and Intel before joining Rivos.

Agnilux might be to Google what Annapurna Labs was to AWS. Or, more precisely, vice versa, since Google got started first – and, it looks like, for client rather than server hardware.

With the backing of Walden International, and with the help of Dell Technologies Capital and Matrix Capital Management, Rivos started off with more than a hundred employees on day one, and Tan was named chairman of the board. This has, in part, given Rivos access to advanced EDA tools and to foundry expertise and capacity at Taiwan Semiconductor Manufacturing Co. Hiring nearly 50 engineers from Apple landed it in a lawsuit with Apple, and Tan negotiated a settlement. The company was working on CPU designs when this was happening, but Tan suggested that the company focus on AI acceleration, and its partnership with Meta is the result of that pivot. According to the statement from Walden, Rivos taped out a “3.1 GHz processor and built a CUDA-compatible software stack,” but we see no indication of this anywhere on the Internet.

Rivos raised $250 million in Series A funding in April 2024, and did another $120 million or so in additional funding, which works out to around $370 million invested. In August, The Information reported that Rivos was seeking $500 million in Series B funding, which would have pushed its valuation up above $2 billion. Walden’s statement said that the company was preparing to raise its next round of funding in early 2025, and along with that sniffing around process, Rivos had some offers for an outright acquisition. Meta Platforms made an offer Rivos and its investors did not refuse. It is hard to guess where the price landed, but with $370 million already invested – and total investment north of $850 million had the additional $500 million been kicked in – and a post-money valuation north of $2 billion, it would not have come cheap.

There has been some banter back and forth as to whether or not Rivos was working on a GPU. From the limited information available out there, it was working on both a CPU and a GPU. Here is a rudimentary block diagram from the Rivos site:

And here is a comment from the company when it did its Series A funding:

“Rivos provides power optimized chips combining high performance server-class RISC-V CPUs and a Data Parallel Accelerator (a GPGPU optimized for large language models (LLMs) and data analytics) that work with today’s software programming models and rack server constraints. The tight integration of CPU and parallel computation sharing a uniform memory across DDR DRAM and HBM is ideal for today’s models and databases that need terabytes of memory.”

This seems pretty unequivocal. Rivos looks like it was creating a hybrid CPU-GPU compute system on a chip or package akin to the Grace-Hopper and Grace-Blackwell CPU-GPU “superchip” hybrids from Nvidia. And one that presumably was based on the RISC-V architecture on both sides and, importantly, one that was compatible with Nvidia’s CUDA-X software stack. CUDA-X is a parallel programming model coupled to a set of algorithms, libraries, and frameworks to handle the offloading of software from CPUs to be accelerated on GPUs. It is the moat that gives Nvidia great pricing power.
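For those who have not lived inside that moat, here is what the offload model looks like from Python, using CuPy, an open source NumPy work-alike that calls down into CUDA-X libraries such as cuBLAS under the covers (this sketch assumes a machine with a CUDA-capable GPU and CuPy installed):

```python
import numpy as np
import cupy as cp  # NumPy-compatible arrays backed by CUDA-X libraries

# Build the data on the CPU, then offload it to GPU memory.
a_host = np.random.rand(4096, 4096).astype(np.float32)
b_host = np.random.rand(4096, 4096).astype(np.float32)
a_dev, b_dev = cp.asarray(a_host), cp.asarray(b_host)

# The matmul dispatches to cuBLAS on the GPU, but the code reads like NumPy.
c_dev = a_dev @ b_dev

# Copy the result back to host memory for whatever comes next.
c_host = cp.asnumpy(c_dev)
print(c_host.shape)  # (4096, 4096)
```

The point of CUDA-X is that the heavy lifting in blocks like this lands on libraries that only run on Nvidia hardware, which is exactly why a compatible stack on RISC-V iron would be such a big deal.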

We think 3.1 GHz is a pretty high clock speed for a GPU, so maybe that was a CPU or maybe the Rivos GPU has some smarts we are not aware of that lets it run fast and hot. This is one of those unknown unknowns that can be so irritating.

What we do know is that in September 2024, Rivos chose Andes Technology, which we wrote about two weeks ago, to supply it with its NX45 RISC-V core. This is not to be confused with the “Cuzco” core from Condor Computing, the US arm of Andes, which is based in Taiwan. The NX45 is a 64-bit, in-order RISC-V core with a two-wide, eight-stage instruction pipeline. The Cuzco core does out-of-order instruction processing (like most RISC chips do these days) and has an eight-wide, twelve-stage pipeline. The NX45 core is being used as an on-package controller, much as Nvidia uses homegrown RISC-V cores as controllers on its GPU accelerators.

We strongly suspect that the Rivos CPU and GPU make use of the RVA23 profile from RISC-V International, the standard bearer for RISC-V designs – a profile that Rivos helped flesh out with vector extensions and other features for high performance.
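As an aside, RVA23 is the first of these profiles to make the vector extension mandatory, which is what makes it interesting for compute engines. Here is a minimal sketch of how software might check for vector support on a RISC-V Linux box by parsing the ISA string the kernel exposes – illustrative only, since real feature detection would use the riscv_hwprobe syscall or HWCAP auxiliary vectors:

```python
# Minimal sketch: check for the RISC-V vector extension ("V"), which the
# RVA23 profile makes mandatory, by parsing the "isa" line the Linux
# kernel exposes in /proc/cpuinfo (for example, "rv64imafdcv_zicsr...").
def has_vector_extension(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    with open(cpuinfo_path) as f:
        for line in f:
            if line.lower().startswith("isa"):
                isa = line.split(":", 1)[1].strip()
                # Single-letter extensions follow the rv64/rv32 prefix;
                # multi-letter ones (zba, zvfh, ...) trail after underscores.
                base = isa.split("_", 1)[0]
                return "v" in base[4:]
    return False  # no ISA line means we are not on RISC-V Linux

print("Vector extension present:", has_vector_extension())
```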

We can also see that memory coherence across CPUs and GPUs and both DDR and HBM memory is part of the design, just as it is with the Nvidia “superchips,” as the company calls them.

One other thing. Since 2021, the Nvidia CUDA-X software licensing agreement has prohibited the use of translation layers that allow for compiled CUDA programs (binaries) to run on non-Nvidia hardware. Nvidia does not – and cannot – restrict the use of source-to-source translators like AMD’s HIP and Intel’s SYCL, which allow you to recompile CUDA codes to run on other hardware.
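The difference is easier to see with an example. Source-to-source tools work on program text before it is compiled, which is why Nvidia cannot restrict them. Here is a deliberately crude sketch of the idea in the spirit of AMD’s hipify tools, which really do (among much else) rename CUDA API calls in source code so it can be recompiled for other hardware; the tiny rename table here is ours, not AMD’s:

```python
import re

# Crude sketch of source-to-source translation: rewrite CUDA API names in
# *source* text, then recompile for non-Nvidia hardware. A binary
# translation layer, by contrast, rewrites compiled CUDA programs, which
# is what the Nvidia license language forbids.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
}

def hipify(source: str) -> str:
    # Match longest names first so cudaMemcpyHostToDevice beats cudaMemcpy.
    names = sorted(CUDA_TO_HIP, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)

snippet = "cudaMalloc(&d_ptr, n); cudaMemcpy(d_ptr, h_ptr, n, cudaMemcpyHostToDevice);"
print(hipify(snippet))
# hipMalloc(&d_ptr, n); hipMemcpy(d_ptr, h_ptr, n, hipMemcpyHostToDevice);
```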

We do not know what approach Rivos was taking with its “CUDA-compatible software stack,” but this emulation question is a sticking point, and it may end up being a legal one. And it is also interesting to remind everyone that Nvidia’s long-ago “Project Denver” Arm server CPU was rumored to have X86 hardware emulation capabilities, which were obviously not added to the actual Grace Arm server CPU that Nvidia delivered a decade later.

Rivos could not afford such a battle with Nvidia. But, if it comes to it, Meta Platforms certainly can, and a good lawyer could make the case that Nvidia is tying its software to its hardware to maintain a monopoly. Hell, a bad lawyer could argue that at this point, with Nvidia having a clear – and unregulated – monopoly on AI processing. Moreover, if Meta Platforms uses this CUDA emulation technology for its own private use and does not sell it, is it illegal? Again, this is a gray area. Meta Platforms certainly would be deriving economic benefit from the use of said technology.

We look forward to seeing what Meta Platforms does with the Rivos team, and how fast it does it. Imagine a RISC-V clone of Grace-Hopper that was software compatible and half the price. . . . Meta Platforms could enter the systems business and clean up.


1 Comment

  1. Correct me if I’m wrong, but the Transmeta code translation was not a matter of translating Intel-copyrighted binaries to run on non-Intel hardware, but instead was to run third party binaries not created by Intel. The legal avenue Intel pursued was that Transmeta was infringing on Intel’s patents to certain x86 code operations. The case of Nvidia’s restriction on the translation of its binaries is different. Aren’t the binaries themselves Nvidia’s copyrighted work? Wouldn’t an argument that Nvidia is maintaining a monopoly by not allowing the code that it itself created to run on the hardware of other vendors come down to the government forcing Nvidia to provide its own code to its competitors? That seems pretty extreme and very risky to the future of innovation. Nvidia’s code is a key innovation to the functionality of what it offers. It spent a lot of time and money on the code. And the reason it created the code was not to lock competitors out, but simply so that the hardware was usable for purpose. In the future, who would create a hardware product that relies on self-made code if it were taken away from Nvidia? It just seems like theft resulting from no one willing or able to spend the time and money to do Nvidia’s work themselves (if they are unable, then they are unable to duplicate Nvidia’s product). A future company trying to innovate would need to find some way to intrinsically tie the hardware and software together, adding a constraint on the development process that would certainly introduce inefficiencies.
