In this era of hyperscaler and cloud builder titans – the seven of which account for about half of the IT infrastructure bought in the world – it is worth remembering the importance of niches and the vital role that other makers of systems, other sellers of systems, and other renters of systems all play in the IT ecosystem. Never has a niche player been so important, and never has it been so difficult to be one.
And yet, this is the path that Lambda – which no longer calls itself Lambda Labs because it is a fast-growing business now, not some experiment in building AI systems and running an AI cloud – has chosen for itself. As niche players ourselves here at The Next Platform, we of course honor that choice, which is not an easy path. If you can’t be the biggest and you can’t be the first, then the only choice is to try to be the best. And really, there is no try, as Master Yoda correctly points out. There is only do or do not.
If you had to pick a niche, this is a pretty good one. Machine learning training models are growing exponentially, and so are the datasets on which they depend, and the performance of the systems that do the training is not keeping up. That means customers have to buy more and more machinery even as that gear gets more powerful thanks to Moore’s Law advances in parallel compute, memory bandwidth, and I/O bandwidth. Any time the software infrastructure is growing faster than the hardware infrastructure – as with the proliferation of batch processing on mainframes in the 1970s, the relational database revolution of the 1980s, the commercial Internet boom of the late 1990s and early 2000s, the data analytics explosion of the 2000s, and the AI revolution of the late 2010s and early 2020s – it is a very good time to play to a niche with specialized hardware, software, and engineering expertise to make it all hum.
We did an in-depth profile of Lambda back in December 2020, when we spoke to Michael Balaban, co-founder and chief technology officer at the company, and in May this year we looked at some price/performance metrics Lambda put out pitting its Lambda GPU Cloud instances based on Nvidia A6000 GPU accelerators against Nvidia A100 GPU instances running in the Amazon Web Services cloud. Lambda’s point was that the Chevy truck GPU is good enough for a lot of AI training workloads and superior to the Cadillac model in some cases. At this point, Lambda does not care about inference, and there is no reason why it should. The company wants to build AI training infrastructure. Full stop. Inference is presumed to run on in-house infrastructure, and that can be anything from CPUs to GPUs to FPGAs to custom ASICs, and Stephen Balaban, co-founder (with his older brother) and chief executive officer of Lambda, is not interested in selling inference systems. At least not yet, but that can – and we think probably will – change. But it is important for startups to stay tightly focused. You should not trust those that don’t, in fact, because money and time are both in short supply.
Lambda wants to ride that AI training wave not only with specialized hardware, but also by creating its own AI cloud built on its own hardware and its own software stack – called the Lambda Stack, obviously – that is tuned up by its own software engineers. Lambda has recently secured $15 million in a Series A funding round plus a $9.5 million debt facility, giving it the funds to support its own explosive growth. The Series A was led by 1517, Gradient Ventures, Bloomberg Beta, Razer, and Georges Harik – most of whom were angel investors when Lambda got its start nine years ago – and the debt facility came from Silicon Valley Bank.
We took the opportunity to talk to Stephen Balaban, the CEO, about how the company is doing and what it sees going on out there in what is still a very much nascent AI training market.
“Unlike other clouds and other system providers, we focus on just this one particular use case, which is deep learning training,” Balaban tells The Next Platform. “Our product base scales from laptops to workstations to servers to clusters to the cloud, and we are vertically integrated across those devices with our own Lambda Stack. But there is another aspect to this, and customers need to ask themselves if they really need the gold-plated datacenter service experience of having Amazon Web Services be their operations team managing the infrastructure, because that is very expensive, as you can imagine.”
In effect, Lambda builds the kind of cloud that you would probably like to build yourself, if you had the skills. It is engineered not to use the most general purpose GPU compute engines, as the public clouds have to do given the diversity of their workloads, but rather those GPUs that have sufficient parallel compute, sufficient memory capacity, and sufficient memory bandwidth at the lowest price to drive down total cost of ownership. When workloads are running in a sustained fashion, you have to drive down TCO in a world where models and data are growing faster than capacity increases. The public clouds have to massively overprovision their general purpose machines, then sell you on the idea that you should run your spiky workloads there, and then charge you a high premium for the privilege of doing so. It is better and cheaper, says Balaban, to run your AI training on your own hardware (made by Lambda, of course) and then process bursts on the Lambda Cloud, which is cheaper than AWS or Microsoft Azure.
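Balaban’s sustained-versus-burst argument is, at bottom, utilization arithmetic, and it can be sketched in a few lines. Here is a minimal back-of-envelope model; every price, lifespan, and utilization figure below is a made-up illustrative assumption, not anything Lambda, AWS, or anyone else actually charges:

```python
# Hypothetical TCO sketch: amortized cost of an owned GPU server versus
# renting on-demand GPU capacity. All numbers are illustrative assumptions.

def owned_cost_per_gpu_hour(purchase_price, years, utilization,
                            overhead_per_year, gpus_per_server):
    """Amortized cost per busy GPU-hour for an owned, shared server."""
    busy_hours = years * 365 * 24 * utilization  # hours the box is working
    total_cost = purchase_price + overhead_per_year * years
    return total_cost / (busy_hours * gpus_per_server)

# Assumed figures for one 8-GPU training server.
own = owned_cost_per_gpu_hour(
    purchase_price=100_000,    # hypothetical server price
    years=3,                   # depreciation window
    utilization=0.70,          # sustained training keeps the box busy
    overhead_per_year=10_000,  # power, cooling, hosting
    gpus_per_server=8,
)

cloud_rate = 3.00  # hypothetical on-demand $/GPU-hour

print(f"owned: ${own:.2f}/GPU-hour vs rented: ${cloud_rate:.2f}/GPU-hour")
```

At high sustained utilization the amortized owned cost undercuts the on-demand rate by a wide margin; drop the utilization figure toward zero and the comparison flips, which is exactly why spiky bursts belong on somebody else’s rented iron while steady training belongs on your own.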
So far, this niche play has worked well, and it is one that Lambda had to come up with because as a pioneer in AI software, it could not afford to run its AI applications on AWS without going broke because of the instant and explosive popularity of the AI tools it put out on the Web. The Balaban brothers learned the hard way that success is sometimes harder than failure, and that is how a niche hardware business and a niche cloud was formed.
What Lambda is doing is clearly resonating with organizations that are trying to get a handle on AI training and put it into production. In 2017, after years in business as an AI application maker and a homegrown cloud builder supporting those applications, Lambda had its first full year of selling AI training systems and pulled down about $3 million in revenues from that hardware. Two years later, it grew to $30 million, and in 2021, another two years later, it is on track to do $60 million.
“We have found that there is a huge demand – and a growing demand – for deep learning training systems that just work,” says Balaban. That Series A funding is about building out the hardware and software engineering teams and the sales teams to really see how big the addressable market for Chevy systems is compared to the Cadillac systems that the big cloud builders have to engineer – and charge for – because they need to support a diversity of clients and workloads on their devices, where Lambda simply does not.
Software is going to be a key differentiator. The Lambda Stack is packaged up to run on Debian Linux and includes Nvidia drivers and libraries such as CUDA and cuDNN as well as the TensorFlow, Keras, PyTorch, Caffe, Caffe 2, and Theano machine learning training frameworks. With the fundraising, Lambda will be expanding the software that runs on its cloud and making it all a lot more user friendly than these frameworks (many of them developed by hyperscalers and cloud builders, who seem to like byzantine and bizarre structures as a matter of pride) are when they are released into the wild on GitHub. Eventually, this polished AI training stack will be available for Lambda customers to deploy on their laptops and workstations, on their in-house clusters, and on the Lambda GPU Cloud.
And that is the secret right there to the niche. The experience will be the same for Lambda’s customers no matter where they are creating their AI models. They won’t even know the difference. The market will tell Lambda just how valuable such an experience is, but we can infer it from the actual experience of Apple with its music business, can’t we?