When the “Aldebaran” datacenter GPUs were launched by AMD last November for the HPC and AI crowd pushing up into the exascale stratosphere, only the two top-end models of the Instinct GPU accelerators – ones that use the Open Accelerator Module (OAM) form factor put forth by Facebook and Microsoft under the Open Compute Project – were actually available. A plain vanilla PCI-Express variant of the Instinct MI200 series was promised, and AMD has decided that just before Nvidia’s GPU Technology Conference kicks off today is a great time to introduce the Instinct MI210.
AMD didn’t say much about what this MI210 might look like, and we conjectured that it would be a slightly geared down dual-GPU compute complex as was used in the higher end MI250 and MI250X accelerators, but with a PCI-Express 4.0 interface instead of the OAM form factor. And at the same time we suggested that it would be a good thing for AMD to cut the “Aldebaran” in half and create a more modestly powered GPU for systems that wanted cheaper GPU compute on the PCI-Express bus rather than more dense compute as we expected the MI210, MI250, and MI250X to all offer.
As it turns out, AMD is doing the latter and ignoring the former, so we guessed wrong. But our logic still holds. Now that AMD has delivered an Instinct MI210 GPU accelerator that is essentially half of an MI250, but with a PCI-Express interface instead of the OAM interface, perhaps it should think about taking some versions of the dual-GPU complex that do not have all of their compute or memory working and creating a higher-end PCI-Express device for customers who want denser GPU compute but still want to use a PCI-Express interconnect to link the GPUs to the CPUs in their hosts.
AMD was not in a mood to talk about the Instinct MI210 and just lobbed the briefing deck and press releases out there to the IT press, like raw meat thrown in the lion’s den, which IT vendors do from time to time when they want us to comment on something but paradoxically do not want to make a lot of noise. Here is what the Instinct MI210 looks like:
Here is how the actual Instinct MI210 GPU accelerator stacks up against the prior “Vega” and “Arcturus” GPU accelerators from AMD, as well as the proposed feeds and speeds for the hypothetical addition to the MI200 lineup. The fake accelerator is in red bold italics:
The MI210 accelerator is precisely half of an MI250 accelerator, with half of the two-chiplet Aldebaran compute complex and therefore half the performance, half the HBM2e memory, half the memory bandwidth, and presumably not more than half the cost. (We are guessing a little less than half, in fact.)
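To make the "precisely half" arithmetic concrete, here is a minimal sketch that derives MI210-class figures by halving an MI250. The input numbers (208 compute units, a 1.7 GHz peak clock, 128 GB of HBM2e, roughly 3.2 TB/sec of memory bandwidth) are our assumptions taken from AMD's public spec sheets, not from the text above, and the 64 FP64 lanes per compute unit with one fused multiply-add per clock is likewise an assumed model of the peak vector rate. The names `GpuSpec` and `halve` are hypothetical helpers for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    name: str
    compute_units: int
    clock_ghz: float      # peak engine clock (assumed 1.7 GHz)
    hbm_gb: int           # HBM2e capacity
    mem_bw_tbs: float     # approximate memory bandwidth in TB/sec

    def peak_fp64_vector_tflops(self) -> float:
        # Assumed model: 64 FP64 lanes per CU, 2 flops per lane per clock (FMA).
        return self.compute_units * 64 * 2 * self.clock_ghz / 1000.0


def halve(spec: GpuSpec, name: str) -> GpuSpec:
    # One of the two chiplets: half the CUs, memory, and bandwidth, same clock.
    return GpuSpec(name, spec.compute_units // 2, spec.clock_ghz,
                   spec.hbm_gb // 2, spec.mem_bw_tbs / 2)


# Assumed MI250 figures from AMD's public specifications.
MI250 = GpuSpec("MI250", 208, 1.7, 128, 3.2)
MI210 = halve(MI250, "MI210")

if __name__ == "__main__":
    for gpu in (MI250, MI210):
        print(f"{gpu.name}: {gpu.compute_units} CUs, {gpu.hbm_gb} GB, "
              f"{gpu.mem_bw_tbs:.1f} TB/sec, "
              f"{gpu.peak_fp64_vector_tflops():.1f} FP64 vector teraflops")
```

Because the clock speed stays put in this model, peak throughput scales linearly with the compute unit count, which is why halving the silicon halves the teraflops.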
The pricing shown above is our best guess of what these GPU accelerators cost, but it is just a guess, and in supply chain constrained environments when there is also high demand for GPU compute, pricing can be truly crazy. Certain gaming GPUs are selling for 1.5X to 2X their suggested retail price, and for all we know, the same thing is happening with datacenter GPU compute engines. (No one talks about it, but there is a lot of whispering.)
Just for fun, and knowing that Nvidia is widely expected to launch the “Hopper” GH100 GPUs at the GTC event this week, we are updating this comparison between the AMD MI200 series accelerators and the Nvidia A100 series accelerators. Here are the feeds and speeds, including the hypothetical MI220 accelerator that we think would slide nicely into the AMD product line:
In many ways, the AMD “Aldebaran” MI200 accelerators leapfrogged the performance of the Nvidia “Ampere” A100 accelerators but probably were sold at around the same price/performance. We are reckoning the pricing that should prevail for these accelerators based on actual costs of Nvidia A100s in the SXM4 form factor that prevailed at the beginning of the coronavirus pandemic. Heaven only knows what supply and demand are doing to prices, and the pricing we show in the table below for these accelerators is more of a relative gauge than an absolute one. Take a look:
We strongly suspect that all of these GPUs are in short supply, and that pricing even on prior generation gear is going to hold up pretty well until supplies loosen on the newer equipment. And in a lot of cases, the choice of GPU will come down to what is available when.
Here is the benchmark data that AMD is showing off comparing its PCI-Express 4.0 MI210 GPU accelerator with 64 GB of HBM2e memory to the Nvidia PCI-Express 4.0 A100 with 40 GB of HBM2 memory. Even the half Aldebaran machine can clearly hold its own against this A100 on various HPC codes:
The average performance advantage running the HPC benchmark tests and actual workloads shown above works out to 1.7X.
All of these comparisons will need to be updated before the day is out, or certainly before whenever the Nvidia Hopper GH100 GPUs actually enter the field. We will be doing the math as soon as we have any data to do so.
Interestingly, the “Lumi” supercomputer system being installed at the CSC datacenter in Kajaani, Finland under the auspices of the EuroHPC effort, which we wrote about in detail back in October 2020, has a primary compute partition that will weigh in at 550 petaflops and be powered by the Instinct MI210 accelerators – not Nvidia A100s or H100s (the rumored name of the impending Hopper-based GPU from Nvidia), and not the MI250 or MI250X, either. Lumi is not some lightweight acquisition by CSC, with a price tag of $237 million. So not every HPC or AI machine is always going to be built with the top-end GPU accelerator. Sometimes it is not even possible, and sometimes, given the nature of the work and the budget, it is not desirable, either.
The Instinct MI210 accelerators are available today. That could turn out to be their main virtue in the short run.
Given the semiconductor shortage and what are likely high yields of the 1700 MHz bin, it seems unlikely to me that many 1700 MHz capable parts will be downclocked to populate a PCIe card. There just aren’t enough semiconductors that the performance versus form-factor trade-off is good for anyone.
This also explains why using half the silicon at the full clockspeed was preferable for the MI210 just announced.
I wasn’t so much suggesting they should be downclocked as suggesting that, for yield reasons, AMD take the ones that can’t run at 1.7 GHz and find a bunch that work. Looking at this again, I would say also find the ones that have fewer active CUs and less working memory. I agree with you precisely: let no piece of silicon go wasted!