Artificial intelligence has taken the datacenter by storm, and it is forcing companies to rethink the balance between compute, storage, and networking. Or more precisely, it has thrown the balance among these three, as the datacenter has evolved, completely out of whack. It is as if, all of a sudden, every demand curve has gone hyper-exponential.
We wanted to get a sense of how AI is driving network architectures, and had a chat about this with Noam Mizrahi, corporate chief technology officer at chip maker Marvell. Mizrahi got his start as a verification engineer at Marvell and, except for a one-year stint at Intel in 2013 working on product definition and strategy for future CPUs, has spent his entire career as a chip designer at Marvell. He started with CPU interfaces on various PowerPC and MIPS controllers, eventually becoming an architect for the controller line and then the chief architect for its ArmadaXP Arm-based system on chip designs. Mizrahi was named a Technology Fellow in 2017 and a Senior Fellow and CTO for the entire company in 2020, literally as the coronavirus pandemic was shutting the world down.
To give a sense of the scale of what we are talking about, the GPT 4 generative AI platform was trained by Microsoft and OpenAI on a cluster of 10,000 Nvidia “Ampere” A100 GPUs and 2,500 CPUs, and the word on the street is that GPT 5 will be trained on a cluster of 25,000 “Hopper” H100 GPUs – probably with around 3,125 CPUs as their host processors – with the GPUs offering on the order of 3X more compute at FP16 precision and 6X more if you cut the resolution of the data down to FP8 precision. That is a factor of 15X effective performance increase between GPT 4 and GPT 5.
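The back-of-the-envelope math behind that 15X figure can be sketched out directly from the numbers above. All of the values here are the article's estimates and rumored counts, not confirmed specs, and the per-GPU multipliers are the rough generational figures cited in the text:

```python
# Rumored/estimated cluster sizes from the article
gpt4_gpus = 10_000   # "Ampere" A100 GPUs reportedly used for GPT 4
gpt5_gpus = 25_000   # "Hopper" H100 GPUs rumored for GPT 5

gpu_count_ratio = gpt5_gpus / gpt4_gpus   # 2.5X more GPUs

# Rough per-GPU generational uplift cited in the article
per_gpu_fp16 = 3.0   # H100 vs A100 at FP16 precision
per_gpu_fp8 = 6.0    # H100 at FP8 vs A100 at FP16

print(gpu_count_ratio * per_gpu_fp16)   # 7.5X effective at FP16
print(gpu_count_ratio * per_gpu_fp8)    # 15.0X effective at FP8
```

The 15X headline number, in other words, assumes the GPT 5 run drops down to FP8 precision; sticking with FP16 would yield closer to a 7.5X effective increase.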
This setup is absolutely on par with the largest exascale supercomputers being built in the United States, Europe, and China.
While Nvidia uses high speed NVLink ports on the GPUs and NVSwitch memory switch chips to tightly couple eight Ampere or Hopper GPUs together on HGX system boards, and has even created a leaf/spine NVSwitch network that can cross connect up to 256 GPUs into a single system image, scaling up that GPU memory interconnect by two orders of magnitude is not yet practical. And, we assume, the scale needs are going to be even larger as the GPT parameters and token counts all keep growing to better train the large language model.
The physical size of current and future GPU clusters and their low latency demands means figuring out how to do optical interconnects. So, will Marvell try to create something like the “Apollo” optical switches that are at the heart of the TPUv4 clusters made by Google? Does it have other means to do something not quite so dramatic that still yields the kinds of results that will be needed for AI training? And how does the need for disaggregated and composable infrastructure fit into this as a possible side benefit of a shift to optical switching and interconnects? And where does the CXL protocol fit into all of this?
Find out by watching the interview above.