SambaNova Pits LLM Collective Against Monolithic AI Models

There is more than one way to get to a large language model with over 1 trillion parameters, one that can do many different things and that enterprises can use to build AI training and inference infrastructure to extend and enrich their thousands of applications.

One is to take a big bang approach and emulate what OpenAI has done with GPT-4 and presumably with GPT-5, and what Google has done with PaLM and presumably with Gemini. You create one big model with trillions of parameters, run the largest corpus of knowledge you can find through it, and train it to do all kinds of things all at once. If you are clever, you can get such a monstrous model to activate only the pathways through the LLM that it needs to answer a particular kind of question or give a certain kind of output. That is where the “pathways” in the Google Pathways Language Model, or PaLM, comes from; PaLM is the predecessor to Google’s current and somewhat controversial Gemini model.

Training models as big as GPT-4 and GPT-5 or PaLM and Gemini requires thousands to tens of thousands of GPUs or other kinds of AI accelerators, and it can easily take three months for them to chew through the massive corpus and build all of the relationships that allow the model to be prompted. While activation of the model for inference can be selective, training is all-encompassing, huge, and expensive.

The other approach, called a composition of experts by AI startup and upstart SambaNova Systems, is to take dozens to hundreds of pretrained models that are very good at very specific tasks, string them together in parallel, and present the collective as if it were a giant model with over a trillion parameters. The work is farmed out only to selected models, and then a kind of picking and choosing of the results from these experts follows, with those results cross-fed back into the selected models for verification and error correction.

This composition of experts approach is by no means new. It is a kind of mixture of experts technique that predates both deep learning and generative AI, and that was a subset of the various ensemble techniques used in classical Bayesian machine learning and even in HPC simulation and modeling applications for decades.
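To make the classic mixture of experts idea concrete, here is a minimal sketch in Python. The experts and the gating rule are invented for illustration, not anything from SambaNova or from any particular ML library: a gate scores each expert for a given input, and the final answer is a weighted blend of the experts' outputs.

```python
# Minimal sketch of the classic mixture-of-experts pattern that predates
# deep learning: a "gate" scores each expert for an input, and the final
# prediction is a weighted combination of the experts' outputs.
# The experts and gating rule below are hypothetical stand-ins.
import math

def softmax(scores):
    # Turn raw gate scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two toy "experts": each is just a function from input x to a prediction.
experts = [
    lambda x: 2.0 * x,   # expert that fits small inputs well
    lambda x: x * x,     # expert that fits large inputs well
]

def gate(x):
    # Hypothetical gating rule: favor the first expert when x is small.
    return softmax([-x, x])

def mixture_predict(x):
    # Weight each expert's answer by the gate's confidence in it.
    weights = gate(x)
    return sum(w * expert(x) for w, expert in zip(weights, experts))
```

In deep-learning variants the gate and experts are trained jointly and only the top-scoring experts are evaluated at all, which is where the compute savings come from.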

As Rodrigo Liang, co-founder and chief executive officer of SambaNova, explained to us last September when the “Cerulean-1” SN40L, the fourth generation of its reconfigurable data units (RDUs), was launched, building ever-larger models with trillions of parameters and trillions of tokens is not sustainable. A different approach would have to be adopted: have maybe 150 unique, pretrained models for different tasks in the enterprise, and then put a kind of software router in front of this LLM collective to make it look like a single LLM.

The upshot is that it takes a lot less iron to train these models in the enterprise because they are open source, already trained, and can be retrained, tuned, and pruned on the specific enterprise datasets that companies regard as absolutely proprietary. And it takes a lot less iron to support AI inference for these models because they are inherently smaller and they are not all activated at the same time, either.

We would observe that the former approach, taken by OpenAI and Google, happens when you are trying to see if you can reach the Holy Grail of artificial general intelligence by just adding more parameters and more data against which to create weights, while the latter is just trying to find anomalies in network traffic, automate customer support, write a product manual, or tackle whatever other practical task there is in the enterprise.

Frankly, this seems more akin to the way the human brain actually is designed — er, evolved. We have many different kinds of brains, which mostly work together to provide the right kind of response at the right speed – sometimes autonomously and automatically, like reflex responses, and sometimes with deep thoughts that take time.

The first iteration of SambaNova’s composition of experts does not yet span the full collection the company expects it eventually will, but the 54 models in the Samba-1 collective encompass 1.3 trillion parameters in total.

“The 54 models inside Samba-1 were actually curated by our engineers, looking for the best and most accurate experts that enterprise customers want,” Liang tells The Next Platform. “We then put them together into a single end point. These base models are, through the open source community, trained to create high quality checkpoints. When Samba-1 runs, the actual model or models that we call to answer the prompt gets published, which gives customers visibility.”

With the GenAI movement being so young, we explained to Liang, the choices that SambaNova was making for models for specific kinds of tasks were important to list, as were their relative sizes in terms of the parameters used to train those models. So SambaNova sent us a clunky spreadsheet with all 54 listed, and then a little bit later sent us this much prettier graphic that, while tall, outlines all 54 models and how they are used in the collective:

As you can see, LLaMA 2 models are heavily represented in this first iteration of Samba-1, with a smattering of Bloom, Mistral, and Falcon models. Interestingly, there are no OpenAI GPT 3.5 models. But remember, there are about 100 models to go in the collective before SambaNova reaches the 150 or so it thinks that enterprises will need, as Liang explained to us back in September. Maybe there will be pre-trained OpenAI models added at some point in the future (SambaNova did get its start on the early and open GPT models). But it is more likely that others from the Hugging Face galaxy of models, which weighs in at over 350,000 models and over 75,000 datasets at the moment, will be added. SambaNova is itself sticking to open source models, but enterprises do not have to do that. They can license other models and datasets to do their own training privately on SambaNova’s own gear or on cloud-based GPUs. You will also notice that there are often variations on a particular model that are tuned for speed or tuned for accuracy or tuned for a balance between the two.

“Here’s the thing,” says Liang. “I don’t have to trust one model that was biased in some way. I have 54 models. And as our roadmap continues to add more and more experts, you will have more and more access to different points of view that allow you to actually cross-check the answers that you are getting. If you don’t like what one expert is saying, you can get a second opinion, and you can actually have a separate expert come in and opine on their results and iterate further from there so that you create an answer. And so it’s actually quite exciting for us because it’s opening things that other people just can’t do, because these monolithic models by structure are biased in one place.”

The secret sauce in front of the Samba-1 model collective is known internally as “The Conductor,” the routing software between the prompt and the 54 models that knows, based on the prompt, which models to activate with input and aggregate for output – and, importantly for compute capacity and latency reasons, which ones to leave alone. This is not a big piece of software, says Liang, comprising maybe several thousand lines of code – certainly not the millions or tens of millions of lines that other pieces of systems software can swell to. But that routing software is a tricky bit all the same, and we are dubbing it Router-1 because every product has to have a formal name.
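SambaNova has not published how The Conductor works, but the basic shape of such a router is easy to sketch. The following Python toy, with entirely invented expert names, keywords, and replies, shows the two properties the article describes: only the matching expert is activated, and the caller can see which expert answered.

```python
# Hedged sketch of a prompt router in front of an expert collective.
# This is NOT SambaNova's Conductor; the experts, keywords, and replies
# are hypothetical. A real router would use a learned classifier rather
# than keyword overlap, but the control flow is the same: score experts,
# activate only the winner, and report which expert produced the answer.
experts = {
    "sql-expert":     {"keywords": {"sql", "query", "table"},
                       "answer": lambda p: "SELECT ... (drafted query)"},
    "support-expert": {"keywords": {"refund", "ticket"},
                       "answer": lambda p: "Escalating this ticket ..."},
    "docs-expert":    {"keywords": {"manual", "document"},
                       "answer": lambda p: "Drafting the manual section ..."},
}

def route(prompt):
    words = set(prompt.lower().split())
    # Score every expert against the prompt, but run none of them yet.
    scores = {name: len(cfg["keywords"] & words)
              for name, cfg in experts.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        # No expert matched; a real system would fall back to a generalist.
        return None, "No expert matched; falling back to a general model."
    # Only the selected expert is activated, which is where the compute
    # and latency savings over a monolithic model come from.
    return best, experts[best]["answer"](prompt)
```

Calling `route("write a sql query against the orders table")` would activate only the hypothetical `sql-expert` and return its name alongside the reply, mirroring the visibility Liang describes, where the model that answered the prompt gets published to the customer.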

It is not clear if SambaNova will open source this router so others can create their own composition of experts. Presumably not. But the idea is out there and if this approach works well, you can bet someone will start coding an open source LLM router.

Or, maybe people will just buy SambaNova iron and run it in their datacenters or rent capacity on the SambaNova cloud and just get to work.

That is what Liang thinks will happen, and he has set a goal for SambaNova to sell systems using the Samba-1 stack to 100 enterprises in the Fortune 500 this year. The target is really the Global 2000, of course, where all of the biggest money is spent in the IT sector outside of the hyperscalers and cloud builders.

Those hyperscalers and cloud builders are neck-deep in expensive GPUs and are also building their own AI accelerators at this point. At the moment, SambaNova has dozens of paying customers and its revenues nearly tripled last year. Given this composition of experts stack, and the fact that SambaNova believes it can get the same results as the monolithic models for one-tenth the cost (and about one-tenth the iron), this might be the year SambaNova starts riding up the hockey stick curve to hundreds of customers and hundreds of millions of dollars in revenue, and then thousands of customers and billions of dollars in revenue.

