The Mass Customization Wave Is Starting For Servers

Remember when only a couple of variations of processors were available in any given generation of server CPUs? There might have been dozens of vendors, but they didn’t give you a lot of choice. Today, we have a handful of server CPU designers and only a few foundries to do the etching, but the variety of compute engines is staggering.

And this is absolutely necessary, given the slowing of Moore’s Law improvements in the price/performance of transistors and the widening of the workloads that the modern server needs to support. At AMD, the person who needs to strike the balance between customer needs for customization and corporate needs to have a product line that makes sense and is profitable is Dan McNamara, who is senior vice president and general manager of the server business.

McNamara has been in the semiconductor industry his entire career, including founding one company and doing sales at another before joining FPGA maker Altera in 2004 as a director of business development. While at Altera, McNamara did a stint as a sales director, and then was put in charge of application engineering and then the FPGA maker’s embedded division. When Intel acquired Altera in 2015 for $16.7 billion, creating the Programmable Solutions Group, McNamara ran it for four years and then was tapped to be general manager of the Network and Custom Logic Group. In January 2020, McNamara joined AMD to help steer the company’s expansion into the datacenter – and while we didn’t talk about this, he probably had more than a few thoughts about X86 server chip vendors buying FPGA makers.

For this conversation, we wanted to stick with what is happening with server chips and architectures, now and over the next five to ten years.

Timothy Prickett Morgan: Let’s start with the “Milan-X” Epyc server chip announced last week and its 3D V-Cache. How do we think about the prevalence of this style of Epyc server chip, and when will all chips have 3D V-Cache? I realize there is a trade-off between the difficulty of manufacturing and the performance boost, but if you are trying to maximize core real estate on a chip complex, taking the L3 cache vertical as you have done might be a good strategy even for relatively small chips.

Dan McNamara: That is an interesting question, and 3D V-Cache is part of our bigger vision of where compute is going, right? Milan-X is one point in a long roadmap driving to different optimization points. With “Naples” Epyc 7001s, we and our customers had a singular view of general purpose compute, and with “Rome” Epyc 7002s, we did regular versions and high frequency versions. And with “Milan” Epyc 7003s, we had Milan, Milan high frequency, and now Milan-X with stacked cache.

We talked a lot about this last November, that in this broader megacycle of compute, we really believe that this is the future and we believe that there are many optimization points that customers are looking for.

So when you look at Milan-X from a TCO and performance perspective, think of a customer trying to optimize for electronic design automation, computational fluid dynamics, and things like that. But we also believe that this is the beginning of the future. We haven’t disclosed a lot of the Epyc roadmap, but we have 3D V-Cache on the client and now on the server, and customers will optimize with it where it provides real value.

TPM: I get that. But when I look at a processing complex and I know 3D stacking is going to be a problem because of the heat that compute cores generate, and I know that I could double stack the L3 cache and get triple capacity because the V-Cache piggybacks on the I/O of the on-die cache and is twice as dense as the on-die cache, and therefore I could pretty easily get back some more socket area to add cores or accelerators or whatever, I think I would do that as a matter of course once the manufacturing was perfected and, presumably, more affordable. You might be able to get 20 percent to 30 percent more cores on the die just by stacking up the L3 cache. I think there’s going to be an inflection point where this technology is ubiquitous for this reason. And you might triple stack L3 to goose it even further for those technical workloads that are very cache sensitive. . . .
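(As an aside, the arithmetic behind that capacity and core-count hunch can be sketched in a few lines. The L3 capacities are the public Milan and Milan-X figures; the share of the die that the on-die L3 occupies is our own assumption for illustration, not anything AMD has disclosed.)

```python
# Back-of-the-envelope sketch of the L3 stacking arithmetic discussed above.
# Cache capacities are public Milan/Milan-X numbers; the die-area split is an
# assumed, illustrative figure, not an AMD disclosure.

on_die_l3_mb = 32    # L3 per Zen 3 CCD in Milan
v_cache_mb   = 64    # one stacked V-Cache die per CCD in Milan-X (2X the density)

total_l3_mb = on_die_l3_mb + v_cache_mb
print(f"L3 per CCD with one stacked layer: {total_l3_mb} MB "
      f"({total_l3_mb // on_die_l3_mb}X the on-die capacity)")

# Hypothetical: if the on-die L3 takes up roughly 20 percent of the CCD (assumed)
# and that area could instead hold cores, the core budget grows by about a quarter.
assumed_l3_area_fraction = 0.20
extra_core_area = assumed_l3_area_fraction / (1.0 - assumed_l3_area_fraction)
print(f"Potential extra core area if the L3 went vertical: ~{extra_core_area:.0%}")
```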

Dan McNamara: In theory, you are dead right, and let me expand on that a little bit. If you think about the next five years or so, it is not just going to be stacking memory, you know, on a CPU core. The socket goes completely heterogeneous. So the companies that can provide more heterogeneous components, either in a package or on a board, win in this new age.

The other thing, and I think you hit it on the head, is what your base level is in the CPU complex, and then stacking up appropriately from there. We have this partnership with Taiwan Semiconductor Manufacturing Co on a hybrid bond technology that we worked on together, with no microbumps. And we know that we cannot leave this performance on the table, we can’t wait, we have to do something now. But we are also not trying to boil the ocean here, and we know, and our OEM and ODM partners know, that Milan-X is not for everything. If you want good TCO and high density VMs, regular Milan is great.

We will bifurcate and focus more going forward, as you know from our roadmap. “Genoa” has 96 cores and a step function in performance, bringing a ton of extra compute to core enterprise, public cloud, and high performance computing. With “Bergamo” we bring a completely different view, with 128 cores and optimized for cloud native workloads, with low power consumption and very good power efficiency and higher density.

TPM: The way I think about it is that the era of high volume general purpose computing, where you could get ten million of a thing into the market with slight SKU variations that were more about maximizing chip yield and extracting more profit from features than anything else, is over. But the era of low volume, precisely tuned hardware, where you might only make a few hundred thousand to a million of a particular design, is just starting. Companies might only have three, four, or maybe five or six different server SKUs in their fleets at any given time, but the variation in the designs of servers, from inside the socket to across sockets and peripherals in the node, is going to be quite high across the top several thousand organizations in the world.

Dan McNamara: You got it.

TPM: Let’s shift gears for a second. How does the server market contrast with the market ten years ago, and what will it be like five years or even ten years from now?

Dan McNamara: I think it is exactly what we are talking about here. The future is really about different optimization points, marrying the right optimization point with the software and enabling that heterogeneous computing. If you look at high-end supercomputers today, they are completely heterogeneous, with CPUs and GPUs and all sorts of different technology. As we go forward, there will be different optimization points outside of the CPU. We have SmartNICs and GPUs, and peer-to-peer connectivity. The optimizations are moving away from just being done with the CPU to being done with the entire system.

TPM: In the future I see, the definition of what a server is will get fuzzy, even if the definition of what a distributed computing system is will be no fuzzier. The components will just be organized and orchestrated differently from the way we do them today.

I don’t know when that day is, but I think the server as I know it – the box with metal skins in a 1U, 2U or 4U form factor mounted in a rack – is going away. I think the new unit of compute is as far as we can stretch PCI-Express and CXL and other overlays. Maybe this unit of compute is a couple of racks, or an entire row, or maybe a few rows podded up. I don’t know. But within this, there will be trays of CPUs, GPUs, FPGAs, and custom ASICs with a little of their own memory and trays of shared DDR memory, of shared persistent memory, and of shared flash storage in the racks that comprise the storage hierarchy and different levels and layers of interconnect that lash this all together and connect it to the outside world. This is the new motherboard. The funny thing is that within the compute engine socket, the same diversity and increasing complexity that is happening to explode the server as we know it is also happening to the server CPU socket.

Dan McNamara: I think that’s spot on. You know, disaggregated computing has been talked about for so long, but there will be islands of different memory or islands of different forms of compute going forward. Definitely. And you are seeing it in the cloud today, right?

TPM: Well, we still have servers and there is still a server motherboard inside of a machine, whether or not it has a skin on it. But we are going to need boardlets or something like that to break all of these static hardware configurations into smaller, composable systems and then have very sophisticated workload management tools to keep all of this stuff running at high utilization. There should never be a component in this whirligig complex that is not being used. Let it do cloud-based protein folding in the background. . . .

The worry I have is that the I/O between all these racked-up components is going to eat us alive, but I can’t think of a better way to do it.

Dan McNamara: You are right to worry about the I/O. The other thing you need to think about with I/O is offload. Look at a system today, especially in the cloud: You have storage on the machine and you are wasting cycles on storage. Why do you want to do that? Get a SmartNIC and accelerate that. We are going to see more and more of that, and people not wasting precious cycles on something that can be offloaded. The question is how we streamline the I/O so that latency and bandwidth are at an optimum.
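(To put a rough number on the offload argument: the sketch below assumes, purely for illustration, that a quarter of a host CPU’s cycles go to storage and network plumbing. None of these figures come from AMD or from any cloud provider.)

```python
# Toy model of the SmartNIC/DPU offload argument: infrastructure work that runs
# on the host CPU is capacity you cannot sell to tenants. The 25 percent figure
# is an assumption for illustration only.

host_cores = 64                    # cores per socket in a hypothetical cloud node
infrastructure_fraction = 0.25     # assumed share of cycles spent on storage/network I/O

cores_lost = host_cores * infrastructure_fraction
print(f"Core-equivalents consumed by I/O plumbing per socket: {cores_lost:.0f}")
print(f"Sellable capacity reclaimed by offloading to a SmartNIC: "
      f"{infrastructure_fraction:.0%} of the socket")
```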

TPM: Everywhere I turn in the datacenter, I see mass customization.

Dan McNamara: Everyone is looking at reams and reams of data and trying to figure out how to create more intelligence and better outcomes. All of this needs compute, and the trend is for more and more compute. And I think this optimization that we are both talking about is definitely happening, and will happen a lot more going forward. It is just starting to happen for AMD, and we see a good example with Milan-X. We believe it’s going to be a huge win, and we know it is not for everything.

TPM: I would think that with this first rev, 3D V-Cache might be 10 percent or 20 percent of the SKU volumes sold, but it is not going to be 50 percent.

Dan McNamara: We don’t talk about things at that level of detail, but no, it’s not going to be 50 percent. We have had to train our sales team and make sure they understand 3D V-Cache is not for every workload. And they now know what Milan and Milan-X are respectively aimed at.

TPM: What can you say about where AMD is on the road to market share gains in servers? I am always looking for the day when you cross over the 25 percent server share threshold.

Dan McNamara: I can’t say much. But you know, we have obviously very high aspirations for our share, and you have followed the financials for 2021 and early 2022. And again, we just entered a quiet period . . . .

TPM: [Laughter] Of course you did! Nice job scheduling this interview.

Last question: Do you ever consider in your mass optimization future that you might need to do four-socket and eight-socket servers?

Dan McNamara: We have no plan of record publicly on four-socket or larger machines. But we always look at it, and it is an interesting piece as we grow the ecosystem. We do come across customers who need larger memory footprints, and SAP HANA is definitely where larger memory footprints and above 2P scale matter. But we don’t have any public plans on it right now.

TPM: Well, there is not a lot of volume, but there is some profit in it. We suspect that in this lower volume per SKU world, that will be the case with mass customized SKUs for CPUs, GPUs, and FPGAs, too. You will be able to make it up in lower volume at higher ASPs if the TCO works out better for the customer, even with a higher cost chip. Isn’t that funny?
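(A quick, invented example of how that trade can pencil out: a tuned, higher priced part can still lower a customer’s total cost if it shrinks the fleet needed to do the same work. Every number below is made up for illustration.)

```python
# Toy TCO comparison for a generic SKU versus a customized, higher-ASP SKU.
# All prices, node counts, and overheads are invented for illustration.

def fleet_cost(nodes, chip_price, node_overhead=6000, power_cooling=2500):
    """Total fleet cost for 2P nodes: two chips plus chassis/memory/NIC overhead
    plus lifetime power and cooling, all per node."""
    return nodes * (2 * chip_price + node_overhead + power_cooling)

# Assume the tuned SKU finishes the same work with 25 percent fewer nodes.
generic = fleet_cost(nodes=1000, chip_price=5000)
tuned   = fleet_cost(nodes=750,  chip_price=7000)

print(f"Generic SKU fleet: ${generic:,}")   # $18,500,000
print(f"Tuned SKU fleet:   ${tuned:,}")     # $16,875,000
```

Which is the funny part: in this low volume, mass customized world, the pricier chip can end up being the cheaper way to run the job.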
