For most of the history of high performance computing, a supercomputer was a freestanding, isolated machine designed to run some simulation or model, and the only link it needed to the outside world was a relatively small one to show some visualizations.
With the GenAI version of artificial intelligence, the whole point is to constantly take in data from the outside world and to constantly output recommendations or actions to that outside world in response. And that means that just as AI clusters will need fat, fast, and lossless networks to not waste time – and therefore money – on those very expensive AI servers crammed with GPUs and perhaps other kinds of AI accelerators, the front end networks that link AI systems to the outside world are probably going to need an upgrade from the typical 10 Gb/sec speeds still common in the enterprise.
And that is why companies like Arista Networks are doubly excited about GenAI. Literally.
In a call going over its financial results for the third quarter of 2024, the top brass at Arista said that they have added another “cloud titan” to the list of four prior hyperscaler and cloud builder customers who were trialing or piloting its AI-centric switching for their next generation AI clusters, which span 50,000 to 100,000 GPUs in a single cluster today and will go higher in the near future. And Arista said further that another fifteen enterprise customers, which are building AI clusters with much smaller numbers of GPUs, are also testing out its wares as they progress towards pilots and production systems. (Our guess is enterprises are deploying thousands of GPUs up to maybe as high as 10,000, if they can get the GPU allocations from Nvidia or AMD and the budget from their chief financial officer.)
So it looks like the AI networking pipeline is building, and Nvidia’s InfiniBand, which has largely dominated AI clusters until now, is getting some real competition from Arista as well as from Nvidia’s own Spectrum-X line. This has taken a long time, particularly when measured in IT years. But frankly, Ethernet was not good enough to drive up utilization on those GPUs, and it has had to improve in terms of RDMA and congestion control to take on InfiniBand where it is strong and then leverage the much higher scale that Ethernet offers over InfiniBand for any given number of tiers in a back end network.
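That scale-per-tier point comes down to switch radix arithmetic. In a non-blocking two-tier leaf/spine fabric, each leaf splits its ports half-down to hosts and half-up to spines, so the fabric tops out at R²/2 endpoints for radix-R switches. The radices below are illustrative, not tied to any specific Arista product:

```python
# Endpoint count for a non-blocking two-tier leaf/spine (Clos) fabric:
# each radix-R leaf uses R/2 ports for hosts and R/2 uplinks to spines,
# and with R leaves the fabric serves R^2 / 2 hosts in total.
def two_tier_hosts(radix: int) -> int:
    return radix * radix // 2

# Illustrative radices: a 400 Gb/sec InfiniBand switch ASIC typically
# exposes 64 ports, while 51.2 Tb/sec Ethernet silicon can be carved
# into 128 x 400 Gb/sec ports.
print(two_tier_hosts(64))    # 2,048 endpoints
print(two_tier_hosts(128))   # 8,192 endpoints, 4X at the same tier count
```

Double the radix and you get four times the endpoints without adding a third tier of switching, which is the scale argument Ethernet vendors keep making.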
The ratio of AI front end upgrades to AI back end spending – what Arista chief executive officer Jayshree Ullal calls the “AI center” – is unclear because this is a new phenomenon. And it will be harder to figure out over time, too. (Just like it is hard now to say what is “cloud” and what is not.) The forecast for 2025 is $750 million in AI back end networking, $750 million in front end AI-related networking, and an additional and unrelated $750 million in campus networking.
“What we are starting to see more and more is for every dollar spent in the back end, you could spend 30 percent more, 100 percent more – and we have even seen a 200 percent more scenario,” Ullal explained on the call. “Which is why $750 million will carry over to, we believe next year, another $750 million on front-end traffic that will include AI, but it will include other things as well, it won’t be unique to AI. So I wouldn’t be surprised if that number is anywhere between 30 percent and 200 percent, so the average is around 100 percent, which is 2X back end over front end. So we are feeling pretty good about that. We don’t know how to exactly count that as pure AI, which is why I qualify it by saying increasingly that if you start having inference, training, front-end storage, WAN, classic cloud all come together, the pure AI number becomes difficult to track.”
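Ullal’s ratio math is easy to sketch out. Using the $750 million back end forecast from the paragraph above and the 30 percent to 200 percent range she cites (the per-scenario dollar figures below are just that arithmetic, not Arista guidance):

```python
# Back-of-the-envelope take on Ullal's front end vs back end ratio.
# Only the $750M base and the percentage range come from the call;
# the per-scenario dollar outputs are simple arithmetic on those.
back_end_spend = 750.0  # $ millions, Arista's 2025 AI back end forecast

for extra_pct in (30, 100, 200):  # the range Ullal cites on the call
    front_end_spend = back_end_spend * (1 + extra_pct / 100)
    print(f"{extra_pct:>3}% more on the front end -> ${front_end_spend:,.0f}M")

# At the 100 percent midpoint, front end spend is 2X back end spend:
assert back_end_spend * (1 + 100 / 100) == 2 * back_end_spend
```

So the 30 percent case means $975 million of front end spend against $750 million of back end, and the 200 percent case means $2.25 billion, which is why the blended front end number is so hard to attribute purely to AI.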
But that is next year. In the September quarter, Arista is still working to capture some AI money. It now has deals underway at five of the five big AI customers it has been chasing – it was four out of five last quarter. We don’t know much about what these customers are doing. We do know that one of them is Meta Platforms, which is building two clusters, one interconnected with Nvidia InfiniBand and the other with Arista Ethernet. Three of these customers are “going well,” according to Ullal; one is just starting, and the new fifth customer “is moving slower than we expected” and is “awaiting new GPUs and they have got some challenges on power, cooling, et cetera,” as she put it.
In the quarter, Arista’s product revenues rose by 18.5 percent to $1.53 billion and services revenues rose by 28.2 percent to $287.1 million.
Software subscriptions within products drove $20.7 million in sales, down 30.2 percent year on year, and that pulled combined software and services revenue down to $307.9 million, up only 21.4 percent. We are not sure what is up here.
Add it up, and overall revenues rose by 20 percent to $1.81 billion in the quarter, which was an increase of 7.1 percent sequentially. (The guidance for the quarter was for revenues to be in the $1.72 billion to $1.75 billion range.) Operating income was $785 million, up 30.2 percent, which reflects prudent cost controls, better margins, and a shift toward higher-end products, and net income rose even faster, by 37.2 percent, to $748 million. Net income was a very healthy 41.3 percent of revenues. That is by far a record high level if you do not count some tax benefits that pushed Arista up to a 47.2 percent level back in Q4 2019. These are not accounting artifacts, but real profits. The fact that some research and development costs earmarked for Q3 have been pushed into Q4 helped a little.
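The quarter’s figures hang together if you run the arithmetic, all in millions of dollars from the reported numbers above (the implied prior-period bases are just the growth rates inverted, not reported figures):

```python
# Sanity check on the quarter's arithmetic, all figures in $ millions
# and taken from the reported numbers in the article.
product_rev = 1530.0
services_rev = 287.1
total_rev = product_rev + services_rev   # ~$1,817M, the "$1.81 billion"

# The reported growth rates imply these prior-period bases
# (derived, not reported):
yoy_base = total_rev / 1.20     # ~ a year ago, from 20 percent growth
seq_base = total_rev / 1.071    # ~ Q2 2024, from 7.1 percent sequential

print(f"Q3 total ${total_rev:,.1f}M, implies "
      f"~${yoy_base:,.0f}M a year ago and ~${seq_base:,.0f}M in Q2")

# Net income margin check: $748M net on ~$1,817M of revenue
assert abs(748 / total_rev - 0.413) < 0.005
```

The margin check lands within rounding of the 41.3 percent figure, so the pieces reconcile.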
Arista ended the third quarter with $7.43 billion in cash and equivalents, up 66.7 percent. Customer purchase commitments were up 15 percent to $2.3 billion and deferred revenue was $2.57 billion.
Looking ahead, Arista says that Q4 revenues should be in the range of $1.85 billion to $1.9 billion. That works out to about 18 percent growth for the full year, when the company had originally projected 10 percent to 12 percent growth.
For 2025, as the mix of networking shifts to cloud and AI customers, the revenues are expected to grow 15 percent to 17 percent, but gross margins could lose anywhere from three to five points as annual revenues cross over into more than $8 billion.
As for the transition to 400 Gb/sec and 800 Gb/sec interconnects, Ullal said that most of the AI trials are for 400 Gb/sec products because customers are waiting for network interface cards and Ultra Ethernet features like packet spraying for the move to 800 Gb/sec networks.
“While we are in some early trials on 800 Gb/sec, the majority of them are 400 Gb/sec, and the majority of 2024 is 400 Gb/sec,” Ullal said. “I expect as we go into 2025, we will see a better split between 400 Gb/sec and 800 Gb/sec.”
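The packet spraying feature Ullal alludes to is worth a concrete sketch. Under classic ECMP load balancing, a switch hashes a flow’s headers once and pins every packet of that flow to one uplink, so a handful of long-lived elephant flows can pile onto the same link while others sit idle; per-packet spraying spreads load nearly uniformly, at the cost of out-of-order delivery that the NIC has to repair. This is a toy model of the two behaviors, not any vendor’s implementation:

```python
# Toy comparison of per-flow ECMP hashing vs per-packet spraying
# across a group of parallel uplinks. Illustrative only.
import random
from collections import Counter

LINKS = 4          # parallel uplinks in the ECMP group
FLOWS = 4          # a few long-lived "elephant" flows
PACKETS = 10_000   # packets per flow

random.seed(42)
flow_ids = [random.getrandbits(32) for _ in range(FLOWS)]

# Classic ECMP: a flow hashes to one link for its whole lifetime,
# so flows can collide on a link and leave other links idle.
ecmp = Counter()
for fid in flow_ids:
    ecmp[hash(fid) % LINKS] += PACKETS

# Packet spraying: each packet independently picks a link, so the
# load is near-uniform -- but packets of a flow now arrive out of
# order, which the receiving NIC must reassemble.
spray = Counter()
for _ in range(FLOWS * PACKETS):
    spray[random.randrange(LINKS)] += 1

print("ECMP per-link load: ", [ecmp.get(l, 0) for l in range(LINKS)])
print("Spray per-link load:", [spray.get(l, 0) for l in range(LINKS)])
```

With only four flows, the ECMP column will often show one link carrying double load and another carrying nothing, while the sprayed column stays within statistical noise of 10,000 packets per link. That reordering burden is why Ullal points to waiting on new network interface cards as a gate for 800 Gb/sec deployments.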