Talking AI Costs And Addressable Markets With SambaNova

The only way to accurately predict the future is to live it, but just the same, prognostication is one of the things that we humans love to do. It helped us stay alive all of these millennia, presumably because we are right more than wrong when we run scenarios in our heads.

A very large number of people on Earth right now are trying to figure out how generative AI and its large language models are going to change our work and home lives, and it is as easy to be optimistic about the future as it is to be pessimistic. It was the same when computing first came to the enterprise six decades ago, and again when the commercial Internet wrapped around existing back-office systems and gave them a front end linked to the world three decades ago. It has been three decades since then, and perhaps a revolution in IT was overdue.

This GenAI one is a doozie, you have to admit. And to try to get our brains wrapped around it, we are talking to as many people as possible to get their sense of what is happening on the ground today and where it will take us tomorrow. To that end, we sat down with Rodrigo Liang, co-founder and chief executive officer of SambaNova Systems, one of the several upstarts that have created AI engines and established a beachhead from which to attack the GenAI opportunity.

Timothy Prickett Morgan: We talk about technology a lot here at The Next Platform, but this time I want to start with total addressable market.

You have no doubt seen the massively upwardly revised TAM projections AMD made at the launch of the “Antares” MI300 GPU accelerators back in December, with the company nearly tripling the TAM for datacenter GPU accelerators to at least $400 billion by 2027, representing a compound annual growth rate of more than 70 percent between 2023 and 2027 and representing, we think, somewhere between $600 billion and $800 billion in server spending. Back in October, by playing around with a server revenue forecast model from IDC, we came up with a forecast that AI servers might represent half of nearly $200 billion in server spending by that same 2027. We also took a hard look at the GPU revenue forecast from Wells Fargo and added a shipment overlay to it, and we think there might be somewhere approaching 15 million in annual GPU shipments against close to $100 billion in GPU revenues by 2027.
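For what it is worth, you can sanity check AMD's projection with a couple of lines of arithmetic: a $400 billion TAM in 2027 growing at a roughly 70 percent compound rate implies a 2023 base of a little under $50 billion. A minimal sketch, with only the two AMD figures above taken as inputs and everything else back-of-the-envelope:

```python
# Back out the implied 2023 baseline from AMD's 2027 TAM and stated CAGR.
# Inputs: $400 billion TAM in 2027 and a roughly 70 percent compound annual
# growth rate, both as quoted by AMD; the rest is simple arithmetic.
tam_2027 = 400e9       # dollars
cagr = 0.70            # compound annual growth rate
years = 2027 - 2023    # four compounding periods

implied_2023_base = tam_2027 / (1 + cagr) ** years
print(f"Implied 2023 datacenter GPU accelerator TAM: ${implied_2023_base / 1e9:.0f} billion")
# Roughly $48 billion, which is the kind of base you have to assume for the
# 2027 number and the growth rate to be consistent with each other.
```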

These are all very big numbers. And my take on GenAI is that enterprises will find ways to make do with many smaller models and leave the pursuit of general artificial intelligence to the hyperscalers who can afford to play that game. We just talked to SymphonyAI, which we recently wrote about and which is qualifying its stack on SambaNova systems. It has thousands of real customers and is doing a lot of work with LLMs that have 7 billion or fewer parameters, and we think LLMs with hundreds of millions to a billion parameters will be common in the enterprise. The cost of inference has to come way down; all of these costs have to come way down.

What do you think about all of this? What’s the TAM, man?

Rodrigo Liang: Here are the trend lines that somebody way smarter than me can model. [Laughter] The relative price of AI devices is going up, and Nvidia has already proven that. Where does it stop? We shall see. It is actually something you could see coming, because the cost of innovation was rising very, very quickly. The semiconductor industry of fifteen years ago was already not sustainable, and you could see it.

TPM: Look at the volumes of datacenter GPUs we are talking about: Soon to be 10 million a year, and 15 million a year not long after that, compared to what, maybe 25 million to 30 million server CPUs.

Rodrigo Liang: So for AI chips in general – and today the GPU is the dominant one, but I think there will be other architectures out there, including ours – there is a thought, which we certainly will subscribe to, that AI computing will be the dominant computing workload in the future. Well, why?

TPM: Just like Web infrastructure started to dominate three decades ago. When I started out as a pup reporter in the IT sector, there was no freaking commercial Internet. It was just a bunch of nerds at university text messaging each other from the same room and sharing files and workloads over a primitive backbone. Most of corporate computing was transaction processing on big iron, and a shell of Web infrastructure wrapped around it that was 3X, 5X, and then 10X the capacity of the transaction processing machines. The relational database system slice is still pretty big, but the rest of the infrastructure, now including data lakes and data warehouses as well as AI infrastructure, is all wrapped around it. It is how we got from $50 billion to $250 billion in datacenter sales, to throw some round numbers at it.

But $400 billion in AI accelerators, with systems worth maybe around twice that? We have a hard time seeing that in the next four years. . . .

Rodrigo Liang: We have known each other a long time. So I will present a corollary to that. Back before the commercial Internet showed up, IBM, Sun Microsystems, Hewlett Packard and others were building big database machines. What would you have said about the value of Web-based services based on the pre-Web data?

I think we created workloads that we didn’t even imagine before, at a scale we grossly misestimated. I was at HP at the time, and we thought that what drove the need for more compute was OLTP transaction processing. And when we looked at how the Internet was going to scale this faster, we said, okay, double it. Well, we were off by 5X, and in the end by an order of magnitude.

TPM: But that only happened because the price of compute was coming down fast. X86 systems were very inexpensive by comparison to a RISC/Unix box, and Web applications were not just built to scale over them, but to be resilient thanks to them. Today, if you want to train a top-notch LLM with trillions of parameters in three months, you need $1 billion, and $850 million of that is going to Nvidia. I don’t think those kinds of numbers are sustainable. We need to find a cheaper way to get the same result. This feels like trying to build the Internet on mainframes when even RISC boxes were too expensive.

Rodrigo Liang: As with any other technology, you are going to see other players come in to help bring the costs down. We are hitting a point where there is enough adoption across the marketplace that you do need alternatives that can significantly reduce the cost in order for that penetration to continue.

SymphonyAI is a great example of a partner of ours that is thinking the way you are: new applications drive out old ones. This is why Google was so nervous about OpenAI, aside from all the fun generative things like generating documents. Doing all of these things is replacing search. Why do I need to cache all those things in different layers of a search engine application if it’s already learned in the model?

So there is a different line being drawn as far as where value is captured, and it is not value that was previously captured in the way server revenue was calculated. There are other players in software and other systems powering all of it, and that is now all getting integrated into a single LLM.

We are powering a lot of these systems, replacing all sorts of hardware systems that companies already had in place to provide services, and software systems that they no longer need because the LLM just does it for them. So it is not just a dollar-for-dollar replacement for servers. The entire solution is getting ripped out, and we are replacing it with a much, much more efficient and much, much cheaper way of doing certain things.

TPM: The cost of AI training is very high these days, obviously, but the cost of inference seems to me to be the limiting factor in widespread adoption of these technologies. How is this going to play out?

Rodrigo Liang: In the steady state sometime in the future, the cost of inference will dominate and that needs to collapse by an order of magnitude or more. And I think most people are just coming to that understanding.

TPM: When it was a much lighter grade of machine learning, we could believe that a lot of the inference could be done on a CPU with a bunch of matrix engines or on an Nvidia T4 instead of an Nvidia A100 or H100, and we said as much about a year ago.

But once we got GenAI, for latency and compute capacity reasons you need eight GPUs or even sixteen GPUs to run the inference for a chatbot, and it won’t be long before it takes 32 GPUs to do the inference. That’s crazy town. That takes everything I thought and throws it out the freaking window.
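The reason the GPU counts balloon like that is as much about memory capacity as raw compute: the weights alone set a floor on how many devices you need before latency targets force further sharding. A rough sketch, where the parameter counts, the two bytes per weight, and the 80 GB per device are illustrative assumptions rather than figures from any one vendor:

```python
import math

# Rough sketch: how many 80 GB accelerators does it take just to hold the weights?
# All the numbers below are illustrative assumptions, not vendor specifications.
def min_gpus_for_weights(params_billions: float, bytes_per_param: int = 2,
                         gpu_mem_gb: int = 80, overhead: float = 1.2) -> int:
    """Minimum devices needed to hold the weights (FP16/BF16 by default), with
    about 20 percent headroom for KV cache and activations. Ignores batching,
    interconnect, and latency targets, all of which push the count higher."""
    weight_gb = params_billions * bytes_per_param   # 1B params at 2 bytes ~= 2 GB
    return max(1, math.ceil(weight_gb * overhead / gpu_mem_gb))

for size_b in (7, 70, 175, 540):
    print(f"{size_b:>4}B parameters -> at least {min_gpus_for_weights(size_b)} GPU(s)")
# A 7B model fits comfortably on one device, while very large models need many
# devices for the weights alone, even before serving latency enters the picture.
```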

Rodrigo Liang: This is a great opportunity for innovation, and the need – the demand – is there.

TPM: Does the cost of inference, or the amount of compute needed for inference, or some mix of the two, have to come down by 10X or 50X? I don’t think it has to be 100X for LLMs to be affordable in the enterprise.

Rodrigo Liang: I think you have to look at the totality of what it takes to power these things. But in inferencing, our belief is that we have got to start with a reduction in cost of 10X. What we think is that over time, as the technology matures, you are going to see this kind of cost become the anchor.

TPM: Is there perfect or almost perfect elasticity of demand here? In other words, if the price comes down, will people find proportionately more uses for AI so that the aggregate revenue stays as high as we think it might be? Because I think the use cases are just about everything people and systems do today, which might as well be infinite.
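To put some rough numbers on what I mean by elasticity: with a constant price elasticity of demand, usage scales as a power of the price, so aggregate spend only holds up if that elasticity is at or above one. Here is a minimal back-of-the-envelope sketch, with the elasticity values purely illustrative and not anything SambaNova or anybody else has measured:

```python
# Back-of-the-envelope: how aggregate AI spend responds as inference prices fall.
# With constant price elasticity of demand e, usage scales as price**(-e), so
# revenue scales as price**(1 - e). The elasticity values below are illustrative.
def relative_revenue(price_drop_factor: float, elasticity: float) -> float:
    """Revenue relative to today if price falls by price_drop_factor (10 = 10X cheaper)."""
    new_price = 1.0 / price_drop_factor
    usage = new_price ** (-elasticity)   # demand response to the lower price
    return new_price * usage             # revenue = price * quantity

for e in (0.5, 1.0, 1.5):
    print(f"elasticity {e}: 10X cheaper inference -> revenue x{relative_revenue(10, e):.2f}")
# elasticity 0.5: revenue shrinks to ~0.32X; 1.0: revenue is flat; 1.5: it grows ~3.2X.
```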

Rodrigo Liang: We are in a competitive cycle for enterprises, and for most enterprises, it is an existential question of what happens in this GenAI phase: somebody in their industry is going to be able to use GenAI to create better products and services across this chasm. This is not evolution, it is a chasm. And in that mode, you are not thinking about costs, you are thinking about whether you can survive this chasm. That is why we see today that a lot of companies are betting very aggressively, because they believe that the next three to four years are going to be where a lot of the competitive landscape for their business gets settled. Lots of people think the order of positioning within their industry has not changed for twenty years. But this is the opportunity for some companies to really change that ordering. People are going to be very aggressive. And the people who figure out how to use GenAI in a competitive way will settle into a top position in the market.
