AI Means Re-Architecting The Datacenter Network

SPONSORED FEATURE: The datacenter network is under plenty of pressure these days just coping with expanding data volumes and the increasing use of microservices architectures in modern applications. But the advent of artificial intelligence – both for training neural models and for the inference that drives applications – has thrown a real monkey wrench into the works.

Every problem, of course, is also an opportunity. To take the pulse of datacenter networking and to see how AI is forcing system and network architects to rethink what they are doing for this very different workload, we sat down with Achyut Shah, senior vice president and general manager of the Connectivity Business Unit at semiconductor maker Marvell.

The compute engines that drive AI workloads – almost always GPUs but sometimes custom ASICs – are exceedingly fast, but keeping them fed with data so they can chew on it is a big problem.

“If you look at a CPU from days of old and compare it to a GPU or an accelerator of today, the pure processing power is at least an order of magnitude more – at least 10X, 20X, or 30X more than a basic CPU,” says Shah. “So by definition, it needs to take in data a lot faster to process that amount of data, and then push it out a lot faster, too. Not only are you seeing each GPU block consuming a lot of bandwidth, but you are seeing a lot more of these together on a board and in a cluster when you talk about these large language models that are running. The clusters that are running these workloads have hundreds, sometimes thousands, and maybe tens of thousands of GPUs. And that is only going to keep increasing in the future.”
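To put that scaling in perspective, here is a minimal back-of-envelope sketch of the aggregate network bandwidth a cluster demands as GPU counts grow. The 400 Gb/s per-GPU figure is an assumption for illustration only, not a number from the interview:

```python
# Back-of-envelope: aggregate network bandwidth versus GPU cluster size.
# The per-GPU figure is an illustrative assumption (e.g. one 400G NIC
# per GPU), not a vendor specification.

GBPS_PER_GPU = 400  # assumed network bandwidth per GPU, in Gb/s

for num_gpus in (1_000, 10_000, 100_000):
    aggregate_tbps = num_gpus * GBPS_PER_GPU / 1_000  # Gb/s -> Tb/s
    print(f"{num_gpus:>7,} GPUs x {GBPS_PER_GPU}G each -> ~{aggregate_tbps:,.0f} Tb/s aggregate")
```

Even at these assumed rates, every factor of ten in GPU count adds a factor of ten in aggregate bandwidth the fabric has to carry, which is exactly the pressure Shah describes.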

Clearly, that is going to mean that a whole slew of technologies – from the ASICs in the network switches to the host adapters in the servers to the transceivers in the network cables – will all have to be upgraded and kept operating in sync.
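As a rough illustration of how those pieces have to line up, the sketch below shows how per-port Ethernet speeds compose from lane count, symbol rate, and modulation – NRZ carries one bit per symbol, PAM4 carries two. The baud rates and lane counts are common round numbers assumed for illustration; real transceivers run slightly higher symbol rates to cover FEC overhead:

```python
# Sketch: how per-port Ethernet speed composes from lanes, symbol rate,
# and modulation. Values are round illustrative numbers, not exact specs
# (real 100G-per-lane PAM4 runs ~53.125 GBd to carry FEC overhead).

def port_speed_gbps(lanes: int, gbaud: float, bits_per_symbol: int) -> float:
    """Raw port rate = lanes x symbol rate (GBd) x bits per symbol."""
    return lanes * gbaud * bits_per_symbol

# 100G port from 4 x 25 GBd NRZ lanes (1 bit per symbol)
print(port_speed_gbps(lanes=4, gbaud=25, bits_per_symbol=1))  # 100.0
# 800G port from 8 x 50 GBd PAM4 lanes (2 bits per symbol)
print(port_speed_gbps(lanes=8, gbaud=50, bits_per_symbol=2))  # 800.0
```

The point is that the switch ASIC serdes, the host adapters, and the optical transceivers all have to agree on the same lane count and modulation for a link to come up at its rated speed.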

Marvell is one of the few vendors on Earth that can do it all when it comes to datacenter networking, and Shah laid out, from top to bottom, how Marvell can address the low latency and high bandwidth that AI workloads require – requirements that will probably grow at an exponential pace for many years to come.

This content was sponsored by Marvell.

Comments

  1. Fast-paced nonstop interview (the 30 minutes just flew by) wow! Two meters to 200 km, AI bandwidth, network bottlenecks, latency, Teralynx, NRZ, PAM4, leaf-spine, Linear Direct Drive, DSPs, Co-Packaged Optics, costs, and convenience … lotsa ground covered!

    Not a bad idea to review the TNP “1.6T Ethernet” piece in preparation for viewing this interview, I think: https://www.nextplatform.com/2023/03/07/setting-the-stage-for-1-6t-ethernet-and-driving-800g-now/
