As the last several years have shown, scaling up AI systems to train larger models with more parameters across more data is a very expensive proposition, and one that has made Nvidia fabulously rich.
But putting AI into production, whether at hyperscalers or regular enterprises, is quite possibly going to be even more expensive, particularly as workloads move from batch processing to human-machine interactions with GenAI systems and on up to machine-to-machine, or agentic, AI inference.
The biggest bottlenecks in AI systems – compute, memory, and interconnect – are holding back both performance and profitability, and they become more acute as models grow larger and inference workloads intensify.
Estimates from a simulator built by Ayar Labs suggest that the next generation of the GPT foundation model from OpenAI will include 32 different models with a total of 14 trillion parameters. No expected configuration of future Nvidia iron based on “Rubin” GPU accelerators and improved versions of its existing copper-based NVSwitch interconnects will be able to sufficiently lower the cost of AI inference for this platform while also pushing the interactivity of that inference to speeds suitable for agentic AI.
This is obviously a problem. If GenAI is to take hold, then something has got to give. And that something is very likely going to be the electrical interconnects between AI accelerators, and quite possibly even between those accelerators and their stacked HBM memory.
But how should AI accelerator architectures evolve to increase the performance of AI clusters while at the same time lowering their cost to levels that make agentic AI economically – and therefore technically – feasible?
This is a good question, and we will get some answers from experts who are wrestling with it right now in a live webinar on October 24 from 9 am to 10 am Pacific Daylight Time. Our panelists include:
- Nidhi Chappell, vice president and general manager of Azure AI Infrastructure at Microsoft
- Jean-Philippe Fricker, chief system architect at Cerebras Systems
- Robert Hormuth, corporate vice president of Architecture and Strategy for the Data Center Solutions Group at AMD
- Vladimir Stojanovic, chief technology officer and co-founder at Ayar Labs
Join moderator Timothy Prickett Morgan from The Next Platform along with these industry experts to discover the strategies that will shape the future of AI hardware, turning massive investments into lasting profitability. Register here.