Getting peak utilization out of GPU farms in the age of AI is an unending quest. From partitioning a GPU into small fractions to scaling across many GPU-dense nodes, Nvidia has already provided tools to make this possible. However, one startup thinks it has the key to making orchestration and virtualization of GPUs more nuanced, efficient, and manageable.
Run:AI, which launched almost three years ago, just scored a $30 million Series B round of financing from a range of Israeli VCs, including TLV Partners and S-Capital, to keep building its vision of Kubernetes- and Docker-fed GPU virtualization.
As the company’s CEO and co-founder Omri Geller tells us, while it’s true Nvidia has vGPU and other features that provide more utilization control, those tools come from an entirely different starting point, one with less flexibility for users, especially those with mixed workloads that change daily or who want a virtualization approach that isn’t focused solely on handing out fractions of a GPU at a time.
“We can fractionalize GPUs without a need for virtual machines, which helps for performance, especially for the HPC and AI applications that don’t want the overhead. Further, the way we built was with flexible, dynamic allocation in mind,” Geller says. “You can actually change the fractions of the GPUs you use for multiple workloads running concurrently, whereas with vGPU you need to slice it in advance and decide how you want to ‘fractionalize’ the GPU; then the next day you could have a bunch of GPUs waiting for work to run.” In essence, Run:AI treats all GPUs as one pool and can dynamically allocate them.
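As a sketch of what pooled, fractional allocation can look like from a user’s side, a Kubernetes pod might request a slice of a GPU through an annotation rather than a whole device. The annotation name and scheduler name below are illustrative assumptions in the spirit of Run:AI’s approach, not details confirmed by the company:

```yaml
# Hypothetical pod spec asking for half a GPU from a shared pool.
# The "gpu-fraction" annotation and "runai-scheduler" name are
# illustrative assumptions, not taken from the article.
apiVersion: v1
kind: Pod
metadata:
  name: train-job
  annotations:
    gpu-fraction: "0.5"        # request roughly 50% of one GPU
spec:
  schedulerName: runai-scheduler   # a custom scheduler arbitrates the pool
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu
    command: ["python", "train.py"]
```

The point of such a design is that the fraction is a scheduling-time decision per workload, not a static carve-up of the hardware done in advance.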
On the vGPU point, it is worth noting that Run:AI does not use a hypervisor; it is built entirely on Kubernetes for orchestration and Docker for compute. Since Docker has been shy on outright GPU support over the years, Geller says this was the perfect opportunity, with a ready and waiting market. “Run.AI is part of the container ecosystem so we built an OS level virtualization for GPUs that doesn’t require a hypervisor. The overhead we have is just from the container itself; we’re not adding more than that, so when you compare containers to VMs the same difference remains true.”
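For contrast, the rigidity Run:AI is working around is visible in stock Kubernetes: with NVIDIA’s device plugin, GPUs are exposed as whole devices through the `nvidia.com/gpu` extended resource, which accepts only integer values, so the smallest allocatable unit is one full GPU:

```yaml
# Standard Kubernetes GPU request via NVIDIA's device plugin.
# nvidia.com/gpu accepts only whole integers, so the smallest
# allocatable unit is one full GPU.
apiVersion: v1
kind: Pod
metadata:
  name: whole-gpu-job
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1   # fractional values such as 0.5 are rejected
```

This whole-device model works, but a job that only needs a fraction of a GPU’s memory and compute still monopolizes the entire card, which is exactly the utilization gap pooling and fractional allocation aim to close.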
He adds that many of their early customers won’t accept a VM environment; they want a flexible way to consume resources and definitely don’t want to define in advance how they’ll slice and dice their infrastructure before they even know what will be running.
It’s not just Nvidia in the GPU game, of course. Geller says they are ready for all the accelerator architectures that will come along, including AMD’s rising GPU force, Intel, and the many chip startups. That sounds like quite a bit of work ahead, especially on the custom hardware startup side, but Geller insists all the complexity will be on their development end. He says that for users well accustomed to working with Kubernetes and Docker, the experience will be seamless, with no code changes necessary.
It’s probably only a matter of time before Nvidia provides this capability itself. The company has pushed virtualization and partitioning beyond vGPU over the years, including with Nvidia Virtual Compute Server and other features. What will keep Run:AI ahead of those moves is a user base familiar enough with Docker and Kubernetes not to feel the pain of management—and keeping its prices lower than vGPU’s, of course.
On pricing, Geller was cagey, giving one of those “it depends” answers to direct questions, but he says the company has an internal calculator it uses in conversations with customers curious about how the product might pay off for their workloads.
The company is off to a good start, with greater funding in hand to continue expanding. Its customer base is primarily in automotive, finance, defense, manufacturing, and healthcare, and those users have seen a “25 to 75 percent GPU utilization increase on average,” with one customer’s experiment showing a speed increase of 3,000 percent.
Dr. M. Jorge Cardoso, Associate Professor & Senior Lecturer in AI at King’s College London, uses Run:AI in the London Medical Imaging & Artificial Intelligence Centre for Value-Based Healthcare (AI Centre). “With Run:AI we’ve seen great improvements in speed of experimentation and GPU hardware utilization. Reducing time to results ensures we can ask and answer more critical questions about people’s health and lives,” said Dr. Cardoso. “The AI Centre is on a journey to change how healthcare is provided and Run:AI empowers us on this journey.”
“Tomorrow’s industry leaders will be those companies that master both hardware and software within the AI data centers of the future,” said Omri Geller, co-founder and CEO of Run:AI. “The more experiments a team runs, and the faster it runs them, the sooner a company can bring AI solutions to market. Every time GPU resources are sitting idle, that’s an experiment that another team member could have been running, or a prediction that could have been made, slowing down critical AI initiatives.”