OpenAI Declares Its Hardware Independence (Sort Of) With Stargate Project

The dependency dance between AI pioneer OpenAI on one side and the Azure cloud and application software divisions of Microsoft on the other is fascinating to watch. The GPT-3.5 model, trained on the Azure cloud and released to the world as the ChatGPT chatbot in November 2022, was the Bunker Hill of the GenAI revolution, and Microsoft has invested $13 billion in OpenAI so it could afford to train its ever-embiggening and ever-ensmartening GPT models on the Azure cloud.

This round-tripping deal was perfect for 2023 and maybe for 2024. Microsoft got access to cutting-edge AI software, which it wove into its applications and services as Copilots, and OpenAI got access to vast AI hardware capacity to train larger and larger models and to run inference for its API services.

The only trouble is that Microsoft has to make money selling AI software and services, as does OpenAI, and as it turns out, commercializing GenAI has proved to be a challenge for just about everybody except Nvidia and the cloud partners selling hope in the form of GPU capacity to the world.

This pressure to make AI training and inference less costly so it can be more widely adopted explains the Stargate Project announced by OpenAI, SoftBank, and Oracle at the White House yesterday, which seeks to give OpenAI – and, we think, eventually others – access to state-of-the-art AI datacenters crammed full of accelerators and the switching to link them together.

Everyone caught wind of the Stargate Project back in April 2024, when The Information reported the rumors it was hearing about the effort, which was spearheaded by OpenAI. At the time, Stargate seemed to be a progressively aggressive buildout of GPU compute, with millions of GPU accelerators installed and a budget of somewhere between $100 billion and $115 billion over the six years from 2023 through 2028, inclusive. Interestingly, the Stargate rumors hit about the same time that rumors were going around that OpenAI was looking to make its own AI accelerators.

It is not clear that the Stargate Project as announced yesterday includes a homegrown CPU and GPU compute complex with homegrown coherent interconnects between these devices akin to what Nvidia has delivered in its “Blackwell” NVL72 rackscale system, but if this eventually comes to light, we will not be surprised. There is every reason to believe that OpenAI will eventually break free of Microsoft and then Nvidia and have its own hardware stack – or at least try to build that hardware more cheaply than it can rent the equivalent capacity.

But in the meantime, OpenAI has to be practical about the Nvidia GPU platform today, just like it had to be practical about the Azure platform four years ago as its GPT foundation models started exhibiting what can honestly be described as emergent behavior. That means that Nvidia, which has the most complete AI platform on Earth and the lion’s share of revenues and profits from the GenAI revolution so far, has to be part of the Stargate effort at the beginning.

The Stargate Project starts with money. Masayoshi Son, founder of Japanese conglomerate SoftBank, which not coincidentally owns the lion’s share of chip designer Arm Ltd and all of AI accelerator designer Graphcore, is one of the equity investors in Stargate. It is not clear how much of the $100 billion investment in AI technologies in the United States over the next four years that Son promised to President Trump back on December 17 is being allocated to Stargate.

OpenAI, of course, is kicking in some of its own money, and so is Oracle, which will be helping to run the Stargate datacenters that were already being built in Texas ahead of this announcement at the White House. MGX, a private equity firm based in the United Arab Emirates – not to be confused with the MGX that is short for the modular server platform from Nvidia that underpins machines like the NVL72 – is kicking in money as well.

MGX is also not to be confused with Group 42 Holding, the full name of G42, which is another AI investment vehicle of the UAE that has invested heavily in AI hardware and waferscale computing upstart Cerebras Systems in recent years. But it would be easy to understand any confusion one might have between G42 and MGX.

G42 was founded back in 2018 and Peng Xiao is its chief executive officer. Xiao was chief technology officer of analytics platform provider MicroStrategy for a decade and a half. Tahnoun bin Zayed Al Nahyan, who is the son of the founder of the UAE and also its national security advisor, deputy ruler of Abu Dhabi, and brother of the president of the UAE, is chairman of G42 and also chairman of MGX. Other members of the Emirati royal family are on the MGX board as is Xiao. Ahmed Yahia Al Idrissi is MGX’s chief executive officer, and was previously in charge of direct investments at Mubadala, the investment fund set up by the UAE that, among other things, created the GlobalFoundries chip foundry from the chip factories owned by AMD, IBM, and Chartered Semiconductor.

Suffice it to say, representatives of MGX were not at the podium alongside President Trump at the White House, while Son as well as Larry Ellison, co-founder and chief technology officer of Oracle, and Sam Altman, co-founder and chief executive officer of OpenAI, were. It is not clear how much of the $100 billion of initial investment and the $500 billion expected to be invested in Stargate over the next four years will come from the UAE. It could be a lot, given the desire by Middle East oil giants to diversify their investments. A decade ago, it was chips. Now, it is AI datacenters.

Back in September 2024, MGX hooked up with Microsoft and private equity firms BlackRock and Global Infrastructure Partners to invest in datacenters and the energy infrastructure to power them, starting with a $30 billion pile of cash from investors, asset owners, and corporate partners, with another $70 billion in debt financing made available for AI investments.

The statement put out by OpenAI says that SoftBank and OpenAI are “the lead partners for Stargate, with SoftBank having financial responsibility and OpenAI having operational responsibility,” but it is not clear what that means. Son will be chairman of Stargate, but a chief executive officer was not named. Presumably it will be Altman. That statement also says that Arm, Microsoft, Nvidia, Oracle, and OpenAI are the “key initial technology partners” for the Stargate effort. But we wouldn’t count on OpenAI depending as much on Microsoft Azure going forward, and the same holds true of Nvidia hardware over the longer term.

“OpenAI will continue to increase its consumption of Azure as OpenAI continues its work with Microsoft with this additional compute to train leading models and deliver great products and services,” the statement says.

Well, yes. But with Nvidia getting somewhere on the order of 85 percent to 90 percent operating margins and then the clouds getting anywhere from 65 percent to 70 percent operating margins on top of that when they sell GPU capacity in the cloud, we don’t think OpenAI is eager to pay rent anymore. Why else would it bother with Stargate?
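
To make that stacking concrete, here is a minimal back-of-the-envelope sketch in Python of how those two layers of margin compound. The midpoint margins and the use of the standard operating-margin identity (price = cost / (1 - margin)) are our assumptions for illustration, not reported figures.

```python
# Back-of-the-envelope: how vendor margins stack up between the silicon
# and the renter. Margins and the cost basis are assumptions for
# illustration, not reported figures.

def price_at_margin(cost: float, operating_margin: float) -> float:
    """Price a seller must charge so that margin = (price - cost) / price."""
    return cost / (1.0 - operating_margin)

gpu_build_cost = 1.00                                    # normalize Nvidia's cost to $1
nvidia_price = price_at_margin(gpu_build_cost, 0.875)    # midpoint of 85% to 90%
cloud_rental = price_at_margin(nvidia_price, 0.675)      # midpoint of 65% to 70%

print(f"Nvidia sells at {nvidia_price:.2f}x its cost")        # ~8.0x
print(f"Clouds rent at {cloud_rental:.2f}x Nvidia's cost")    # ~24.6x
```

By that rough reckoning, a dollar of GPU build cost turns into roughly $25 of cloud rent by the time it reaches the renter, which is reason enough for OpenAI to want its own iron.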

Stargate could also be the equivalent of an Amdahl coffee mug. Back in the day, when IBM had an actual monopoly on mainframe processing in the datacenter, the creator of the System/360 hardware, Gene Amdahl, left Big Blue and started a company that made clone hardware that ran IBM’s MVS, VM, and VSE operating systems and related systems software. If you wanted to get a reasonable discount on your next mainframe deal, when the IBM sales rep came to visit for a renewal, all you had to do was drink coffee at the meeting with that mug.

At the very least, we think, Stargate is designed to convince Microsoft to give OpenAI better treatment for whatever capacity it buys from Azure. But in the end, with the money from the investors in Stargate and the presumption that they are willing to take somewhat lower margins than the big clouds are raking in from GPU capacity, OpenAI will gradually shift over to Stargate datacenters in Texas and then in other locations the company chooses. (North Carolina, where power is cheap and water is plentiful, would not surprise us.)

And over time, if OpenAI can work with Arm and SoftBank to design its own hardware and tune it specifically for its own GPT models, it is not hard to envision Nvidia GPU capacity being rented out to help the Stargate partners make money and thereby make it possible for OpenAI to get cheaper homegrown iron when it is ready for primetime.

There are a lot of variables in that scenario, of course.

Based on statements made by Nvidia last year when it talked about the return on investment of building GPU cloud capacity, and our tweaking of those figures to reflect what we think are more realistic costs and revenue streams, for every dollar you spend on AI hardware, it takes another 50 cents to build the AI datacenter around it and to power and cool it. (About 80 cents of that hardware dollar is spent on Nvidia GPUs and 20 cents is spent on networking and storage.) That works out to $1.5 billion for 16,000 of the “Hopper” H100 GPUs in the math that we did at the time; Blackwell systems will have slightly different math. If you rent that 16,000 GPU cluster out with a reasonable mix of on demand and reserved instances, you can charge about $5.3 billion over the course of four years, and using Nvidia technologies to drive up utilization makes the yield on that investment better, too. So, $1.50 in, $5.30 out. That’s a pretty good business.
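
For those who want to check the arithmetic, here is a quick sketch of that $1.50-in, $5.30-out math. The split between hardware and datacenter spending follows the ratios above; the exact dollar figures are our rough estimates, not vendor quotes.

```python
# Back-of-the-envelope ROI on a rented-out GPU cluster, following the
# $1.50-in / $5.30-out math above. All figures are rough estimates.

gpus = 16_000                      # "Hopper" H100 cluster size
hardware_cost = 1.0e9              # ~$1B on AI hardware; 80% GPUs, 20% networking/storage
datacenter_cost = 0.5e9            # ~50 cents per hardware dollar to build, power, cool
total_outlay = hardware_cost + datacenter_cost   # $1.5B all in

rental_revenue = 5.3e9             # mixed on demand/reserved rentals over four years

roi = rental_revenue / total_outlay
print(f"Outlay: ${total_outlay / 1e9:.2f}B, revenue: ${rental_revenue / 1e9:.2f}B")
print(f"Return: {roi:.2f}x over four years")     # ~3.5x
```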

Altman wants to build a cloud and not pay all of that profit overhead to Microsoft, but to get the cloud built, OpenAI is going to have to pay someone some margin. The same holds true, we think, for AI hardware. This is why all of the hyperscalers and big cloud builders are designing their own CPUs and AI accelerators for the datacenter rather than trying to take on Nvidia by making a GPU, which is a much more general purpose device. The premium on a GPU at the moment is somewhere around $40,000 for a Blackwell B200 versus something like $25,000 for a TPU designed by Google and made by Broadcom in conjunction with Taiwan Semiconductor Manufacturing Co. With the B300, the gap will probably get larger, maybe as high as 2:1 if the B300 sells for $50,000.
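
A trivial sketch of that premium math, using the street-price guesses above (the B300 figure is a projection on our part, not a list price):

```python
# Rough premium of a merchant GPU over a homegrown accelerator, using
# the estimated prices above. These are guesses, not quoted list prices.

tpu_price = 25_000   # Google TPU, built by Broadcom with TSMC, per the estimate above

gpus = {
    "Nvidia B200": 40_000,
    "Nvidia B300 (projected)": 50_000,
}

for name, price in gpus.items():
    print(f"{name}: {price / tpu_price:.1f}:1 premium over a TPU")
# B200 -> 1.6:1, B300 -> 2.0:1
```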

One might reasonably ask why Nvidia would be in this deal. Well, when all of your biggest customers are also trying to be your biggest competitors inside of their own datacenters, you have to keep finding new ways to get your product to market. And more importantly, you have to sell the GPUs you have to the customers who can deploy them the fastest and buy the highest number of them. OpenAI is already dependent on Nvidia GPUs, unlike Google with its TPUs and Amazon Web Services with its Trainium and Inferentia AI accelerators.

It remains to be seen how well Microsoft and Meta Platforms will do with their respective Maia and MTIA AI chips, and whatever ByteDance is cooking up through its partnership with Broadcom for an AI accelerator. ByteDance already buys Ascend AI chips from Huawei Technologies and Cambricon AI chips for AI inference, and is rumored to be working on an AI training chip of its own since it can only get crippled GPUs from Nvidia or AMD due to export controls put in place by the Biden administration.


