Brijesh Tripathi’s early career took him on a path that wound from Nvidia to Apple to Tesla to Zoox, an autonomous driving company that is now a subsidiary of Amazon. All of these experiences gave Tripathi valuable lessons in everything from hardware engineering and manufacturing to understanding the evolving datacenter landscape and ensuring that customer needs are met. He learned that designing and building technology isn’t enough. That technology has to solve problems, do so for a lot of people, and have very little friction when it is used.
Tripathi also spent four years at Intel – first with the Client Computing Group and later with the chip maker’s Accelerated Computing Systems and Graphics (AXG) business, working on CPUs and GPUs for HPC and AI workloads. While at Intel, he realized that the industry was trying to run these new workloads on infrastructure based on designs that were three decades old.
“The workload experiences, the importance of what needs to be done there are completely different,” Tripathi tells The Next Platform. “Even though Nvidia is the leader in the current state-of-the-art, the challenge is that Nvidia is still building computers the same way Intel designed or architected them 30 years ago. No matter how great the performance is on one GPU – or now eight GPUs or 72 GPUs – it is still based on the fundamental architecture of a whole standard device, and the data movement is still struggling to actually be met. . . . That’s not an ideal architecture. If, let’s say, we didn’t know anything about computers and the workload showed up at our door, I would imagine we would come up with a brand-new architecture plan that won’t look anything like what we do today.”
Given the compute, scalability, and other needs, the cloud is a natural home for most AI workloads. That’s reflected in the ongoing growth of the cloud AI market, which is expected to jump from almost $60.4 billion last year to $397.8 billion by 2030.
Tripathi set out to create a new style of infrastructure to make it easier for organizations and their developers to build and train AI workloads. He joined up with Dali Kilani, whose background includes a stint at Nvidia and software roles with companies like Zynga, Boston Consulting Group, and, most recently, Lifen, a French company that developed an operating system for infrastructure in the healthcare field.
Last year, Tripathi and Kilani co-founded FlexAI, a startup that came out of stealth in April armed with $30 million in seed money and plans to build an on-demand cloud service for AI workloads that addresses challenges such as accessing the needed compute and the widening skills gap. With the cloud service, Tripathi wants to match the efficiency, speed, and scalability that current cloud providers deliver to enterprises for mainstream workloads.
For nearly two decades now, cloud computing has freed organizations from having to think too much about the systems running their workloads. They can spin up instances on Amazon Web Services, Microsoft Azure, Google Cloud Platform, Oracle Cloud Infrastructure, or any of the other big and small cloud environments and run their applications without worrying about the chip architecture, scaling, or much of anything else.
The onset of generative AI, brought on by OpenAI’s introduction of ChatGPT almost two years ago, changed the game for developers, who have rapidly shifted to building and running AI workloads and large language models (LLMs). These jobs demand massive amounts of compute power, and the number of them will only grow. Nvidia GPUs supply the bulk of that compute power, and demand for the company’s compute engines has outpaced supply for years.
Established datacenter players like Intel, AMD, AWS, and Google are building AI chips to fill the growing gap in compute power to run these workloads, and myriad startups – such as SambaNova, Cerebras, and Graphcore – also are trying to gain traction in the space.
“Intel, Amazon, Google, and others have been building solutions over the last many years trying to actually compete with Nvidia,” says Tripathi, who is FlexAI’s chief executive officer. “Unfortunately, they all have run into this problem of being hard to use. It comes with a package of complexity. While the rich guys have all the access to GPUs that Nvidia is building, they’re all sold out. They all get access to compute, but that’s a very small portion of the society or the industry. A whole bunch of enterprises are still scouring the market for access to GPUs.”
Spreading The Wealth
Tripathi and Kilani, the startup’s chief technology officer, are creating a cloud-based infrastructure service built around a layer of software intelligence in the compute stack that makes it easy for developers to use different GPU and hardware configurations without changing their code. The aim is an environment that delivers the same ease, efficiency, and lower costs for AI workloads that developers now get from traditional cloud infrastructure. They won’t have to worry about which GPUs are used, how many are needed, or the systems they run on.
“What we are doing is we’re actually creating an orchestration layer that seamlessly allows anyone to use any of these architectures without having to change a single line of code,” Tripathi says. “The moment you describe the workload that needs to run, our orchestration layer will figure out which is the right architecture for this workload. It will find the right capacity for it, run on it, and give you the results back without you having to create the infrastructure, build anything, maintain anything, update anything. It just seamlessly works.”
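FlexAI has not published the interface for that orchestration layer, so the sketch below is only a guess at its general shape: a declarative workload description that never names an accelerator, handed to a placement function that picks one. Every name here (WorkloadSpec, pick_backend) and the routing rules are illustrative assumptions, not the company’s actual design.

```python
# Hypothetical sketch of a "describe the workload, let the orchestrator
# choose the hardware" flow. Not FlexAI's API; names and rules are invented.
from dataclasses import dataclass

@dataclass
class WorkloadSpec:
    """Declarative job description -- note that no accelerator is named."""
    framework: str          # e.g. "pytorch" or "jax"
    model_params_b: float   # model size in billions of parameters
    batch_tokens: int       # tokens processed per training step
    budget_per_hour: float  # cost ceiling in dollars

def pick_backend(spec: WorkloadSpec) -> str:
    """Stand-in for the orchestration layer's placement decision.

    A real scheduler would weigh live capacity, interconnect, and price;
    this toy version just routes by model size and budget.
    """
    if spec.model_params_b > 70:
        return "nvidia-h100-cluster"   # largest models go to the biggest pool
    if spec.budget_per_hour < 5:
        return "amd-mi300x-spot"       # cheapest capacity that fits the budget
    return "aws-trainium2"             # mid-size default

spec = WorkloadSpec(framework="pytorch", model_params_b=7,
                    batch_tokens=4_000_000, budget_per_hour=4.0)
print(pick_backend(spec))  # -> "amd-mi300x-spot"
```

The point of the abstraction is that the developer’s code and the workload description stay the same even if the answer to pick_backend changes from one run to the next.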
It also will allow more companies to take advantage of generative AI. Right now, access to the emerging technology is largely limited to big companies that have the money to buy all the Nvidia GPUs they need and to do fundamental research on foundational models. Delivering the FlexAI infrastructure as a cloud service or as a datacenter-in-a-box will make it accessible to more companies, he says.
FlexAI is basically an aggregator, renting hardware from traditional cloud providers and allocating resources to meet developer needs. Through this, organizations can get compute power on demand rather than renting the hardware themselves and paying by the hour, saving them money. They also get access to the necessary resources based on such needs as speed, cost, and output, and what they receive can be dynamically adjusted if necessary.
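As a rough illustration of that aggregator role, the toy sketch below picks the cheapest rented capacity that still meets a customer’s throughput target. The providers, prices, and selection rule are invented for the example and are not FlexAI’s actual placement logic.

```python
# Hypothetical illustration of cross-cloud capacity selection; the offer
# data and selection rule are assumptions made for the example.
from dataclasses import dataclass

@dataclass
class Offer:
    provider: str            # cloud where the rented capacity lives
    accelerator: str         # accelerator type behind the offer
    tokens_per_sec: int      # rough throughput for the customer's workload
    dollars_per_hour: float  # what the capacity costs to run

OFFERS = [
    Offer("cloud-a", "nvidia-a100",  40_000, 14.0),
    Offer("cloud-b", "amd-mi300x",   55_000, 11.5),
    Offer("cloud-c", "aws-trainium", 30_000,  7.0),
]

def cheapest_meeting_throughput(offers, min_tokens_per_sec):
    """Pick the lowest-cost rented capacity that still hits the speed target.

    Re-running this selection as prices or demand shift is one way the
    allocation could be dynamically adjusted.
    """
    viable = [o for o in offers if o.tokens_per_sec >= min_tokens_per_sec]
    if not viable:
        raise RuntimeError("no capacity meets the throughput requirement")
    return min(viable, key=lambda o: o.dollars_per_hour)

print(cheapest_meeting_throughput(OFFERS, 35_000).provider)  # -> "cloud-b"
```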
“That’s what we are building,” he says. “It gets customers onboarded onto Nvidia, AMD, and Amazon hardware. We continue to scale that cloud service and then it will enable building our own architecture. We are building the cloud service not because we are in the rental business or datacenter building business. We want to solve real computer problems, so the goal of our service is to make sure that when we build our computer system, it doesn’t have to go find the next customer to buy it and put it in a datacenter. We will have our own service to launch it.”
The Paris-based company is running its infrastructure now with a select number of beta customers. The public beta is set to launch in the first quarter.