AMD is definitely on a roll in the United States for pre-exascale and future exascale systems, having won the deals at Lawrence Berkeley National Lab with the “Perlmutter” system, at Oak Ridge National Laboratory with the “Frontier” system, and as the presumed front runner at Lawrence Livermore National Laboratory with the “El Capitan” system.
But as we have pointed out before, AMD has broader aspirations in the HPC market, and just like it put out a customized part tuned for HPC in the “Naples” Epyc 7001 generation of processors, the company has just released a custom part aimed at the HPC crowd in the new “Rome” Epyc 7002 generation.
This new processor, dubbed the Epyc 7H12, is debuting as AMD is hosting its “Rome in Rome” event in Italy, where the company and its partners are talking up the latest Epyc family, which is definitely giving Intel Xeon SPs a run for the money. Significantly, Dell EMC is also trotting out the Rome Epycs in five brand new systems, all of which have been tuned up specifically for Rome. Dell is not, as it could have, just dropping Rome chips into its three existing Naples systems, but tweaking them to support the full Rome stack at top speeds and thermals and adding in support for PCI-Express 4.0 peripherals.
Adding support for PCI-Express 4.0 is key for a number of reasons. First, when you think about it, PCI-Express 4.0 is necessary to balance out the doubling of the cores in the Rome chips compared to equivalently positioned chips in the Naples lineup – at least in workloads that are going to lean heavily on networks or NVM-Express storage. PCI-Express 4.0 runs at 16 GT/sec per lane, each way on a dual simplex channel, which works out to 15.75 GB/sec for an x8 slot and 31.5 GB/sec for an x16 slot. PCI-Express 3.0 did half that. Second, Intel doesn’t have support for PCI-Express 4.0 peripherals in the Xeon SP line, and won’t have it until next year’s “Ice Lake” Xeon SP etched with 10 nanometer processes. So that’s an advantage that AMD has over Intel, and one that Rome Epyc system makers want to exploit.
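For readers who want to check the slot bandwidth figures, here is a minimal sketch of the arithmetic. It assumes the standard signaling rates (8 GT/sec for PCI-Express 3.0, 16 GT/sec for PCI-Express 4.0) and the 128b/130b line encoding both generations use; the function and table names are our own, and the result is raw link bandwidth in one direction, before any protocol overhead.

```python
# Per-direction PCI-Express bandwidth from the signaling rate and line encoding.
# Names here are illustrative, not part of any library.

TRANSFER_RATE_GT_PER_SEC = {"3.0": 8.0, "4.0": 16.0}
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b: 128 payload bits per 130 bits sent


def pcie_bandwidth_gb_per_sec(generation: str, lanes: int) -> float:
    """Raw bandwidth in GB/sec, one direction, before protocol overhead."""
    gigabits_per_lane = TRANSFER_RATE_GT_PER_SEC[generation] * ENCODING_EFFICIENCY
    return gigabits_per_lane / 8 * lanes  # 8 bits per byte


for generation in ("3.0", "4.0"):
    for lanes in (8, 16):
        bandwidth = pcie_bandwidth_gb_per_sec(generation, lanes)
        print(f"PCI-Express {generation} x{lanes}: {bandwidth:.2f} GB/sec each way")
```

By this math, an x8 slot at PCI-Express 4.0 carries the same bandwidth as an x16 slot at PCI-Express 3.0, which is why the newer slots matter for fast network adapters and NVM-Express storage.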
In the prior Naples generation, the special HPC part, quietly launched in November 2018, was the Epyc 7371, which had half of the cores on the Naples die activated. The fastest 16-core Naples part before this was the Epyc 7351, which had 16 cores running at 2.4 GHz with a turbo boost to 2.9 GHz. The chip had 64 MB of L3 cache and burned at 170 watts, all for $1,100 a pop when bought from AMD in 1,000-unit trays. To give HPC customers running traditional simulation and modeling workloads as well as AI workloads some extra oomph on the CPU, AMD jacked the clock speeds on this 16-core chip by 33.3 percent to 3.2 GHz and the boost clock speed by 24.1 percent to 3.6 GHz to create the Epyc 7371. The thermals went up by 17.6 percent to 200 watts (if you compare peak to peak) and the cost rose by 40.9 percent to $1,550. Getting more oomph in a socket always increases the price a little faster than the performance, so this is not a surprise.
With the special Rome HPC chip, AMD is going all the way by goosing the full 64-core variant of the Rome compute complex. The plain vanilla Epyc 7742 has 64 cores spinning at 2.25 GHz with a boost speed of 3.4 GHz; the chip has 256 MB of L3 cache, burns 225 watts, and costs $6,950 each when bought in tray lots. The new Epyc 7H12 HPC variant has 64 cores that are jacked up by 15.6 percent to 2.6 GHz on the base frequency, with the boost frequency actually declining a smidgen to 3.3 GHz. The thermal design point rises by 24.4 percent to 280 watts and at this point requires liquid cooling like Atos is doing with its Bull/Sequana XH2000 supercomputers. The XH2410 blade for the Sequana machines, which we previewed last November, has three two-socket nodes in a 1U tray and up to 32 of them can be crammed into a single cabinet along with interconnects for various supercomputer topologies and protocols. Here is what it looks like:
That means Atos/Bull can put 12,288 cores in a rack. Atos tested the Rome Epyc 7H12 chip in a dual-socket node and was able to get 4.2 teraflops running the High Performance Linpack test that is used to rank supercomputers, and said further that this chip delivered 11 percent better Linpack performance than the Epyc 7742. If the cost of the Epyc 7H12 chip scaled linearly based on its peak theoretical performance compared to the Epyc 7742, then the Epyc 7H12 would cost $8,031 a pop. If it scaled with delivered Linpack performance, it would cost $7,715. But as we said, every incremental gain in performance costs more than that linear scale. With the same delta as the difference between the plain vanilla and HPC part in the Naples line, the Rome HPC part would cost $9,793 a pop. Our guess is that list price is around $9,000 each for the Epyc 7H12.
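The three price scenarios are simple ratios applied to the Epyc 7742 list price, and a short sketch makes them easy to reproduce. All of the inputs are figures quoted in this article; the function name and the dictionary are our own, and we assume that peak theoretical performance scales with base clock speed because both chips have 64 cores.

```python
# Scale the Epyc 7742 list price three ways to bracket a guess for the Epyc 7H12.
# Inputs are the figures quoted in the article; names are illustrative.

EPYC_7742_PRICE = 6950.0
EPYC_7742_BASE_GHZ = 2.25
EPYC_7H12_BASE_GHZ = 2.6
LINPACK_GAIN = 0.11          # Atos figure for the 7H12
EPYC_7351_PRICE = 1100.0     # plain vanilla 16-core Naples part
EPYC_7371_PRICE = 1550.0     # Naples HPC part

SCALING_RATIOS = {
    # Assumption: same core count, so peak flops track the base clock.
    "peak theoretical performance": EPYC_7H12_BASE_GHZ / EPYC_7742_BASE_GHZ,
    "delivered Linpack performance": 1.0 + LINPACK_GAIN,
    "Naples HPC price premium": EPYC_7371_PRICE / EPYC_7351_PRICE,
}


def scaled_price(base_price: float, ratio: float) -> float:
    """Price the new part would carry if it scaled linearly by the given ratio."""
    return base_price * ratio


for scenario, ratio in SCALING_RATIOS.items():
    price = scaled_price(EPYC_7742_PRICE, ratio)
    print(f"{scenario}: {ratio:.3f}X -> ${price:,.2f}")
```

The first two scenarios set a floor and the Naples premium sets a rough ceiling, which is how we land on a guess in between.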
Not all HPC customers are looking to maximize throughput; some are more focused on getting the best single-threaded performance out of a machine that they can. This is certainly the case with a special breed of HPC in the financial services sector, according to Scott Aylor, general manager of the Datacenter Solutions Group at AMD.
“Financial services does have an HPC affinity,” explained Aylor in a pre-briefing ahead of the Rome in Rome event. “With the financial services firms, this is where we see good momentum on the low core count parts. When you look at the clock speeds we are able to achieve as well as the cache sizes we are putting into our eight and sixteen core parts, we are seeing tremendous interest with high frequency, latency sensitive applications.”
To that end, Aylor flashed up a comparative chart showing the integer performance, as gauged by the SPECrate2017_int_base benchmark, of eight-core and sixteen-core Rome chips compared to their Xeon alternatives. Here is how the eight-core chips stacked up:
And this is how the sixteen core chips fell where they may:
Based on raw integer throughput and street prices, AMD has anything from a decent to a crazy big price/performance advantage. It was not obvious in the charts above what the spread was against all of those Xeon SP processors, so we took the data and built this handy dandy little table:
The relative performance per dollar figures in the right hand column are not for each processor shown, but rather show how much better the one AMD Rome chip was than each of the Xeons in the comparisons that AMD cooked up. The Rome advantage ranged from 1.8X to 4.7X with the eight-core chips and from 1.3X to 4.4X on the sixteen-core variants. The average works out to 3X better bang for the buck.
Getting The Big OEMs On The Road To Rome
Back at the Rome launch event in August, some of the big server makers were not yet talking about their new machines based on the second generation Epyc chips, and the biggest elephant in the room was Dell and what its plans were. This week, Dell rolled out five new machines, and as the world’s biggest OEM, accounting for 19 percent of the $20 billion in sales and 17.8 percent of the 2.69 million units shipped in the second quarter, what the company does – and does not – do speaks volumes about what enterprise customers think is important in servers. (The hyperscalers and cloud builders in the United States, with the exception of Microsoft, largely go straight to ODMs for their gear, which is largely made in Taiwan, and interestingly, Alibaba and Tencent go to OEMs like Inspur and Lenovo for a lot of their gear as well as to the ODMs.) Dell is calling the five new PowerEdge machines “the bedrock of the modern datacenter,” which is some of the most enthusiastic language we have seen coming out of Dell’s server unit in recent years.
“We absolutely see demand for AMD’s Rome Epyc at a higher clip than what we saw with Naples,” Ravi Pendekanti, senior vice president of product management for servers and infrastructure systems at Dell, tells The Next Platform, adding that customers were concerned during the Naples generation about AMD’s long-term commitment to the Epyc chips as well as Dell’s commitment to building servers based on them. “We have a lot more customers asking us for sample units and we actually have customers that are starting to procure systems.” In fact, with the Rome launch Dell prebriefed customers six months earlier than it normally does, starting with over 130 customers in non-disclosure briefings back at SC18 in November last year, because the pull was so strong.
Here is an overview of the five new machines:
The thing that is remarkable about these Dell machines, really, is that they look like PowerEdge machines. While they have slightly different designs from the Xeon-based PowerEdge boxes, there is enough breadth in this line of five (which you can see here) that a lot of use cases are covered: software-defined storage, HPC and AI, data analytics, and virtualization and virtual desktop infrastructure – places where the high core count, high memory bandwidth, and high I/O of the Rome Epycs really have an advantage. There is a mix of 1U and 2U form factors offering one or two sockets in a NUMA setup, and the PowerEdge C6525 has four two-socket compute sleds in a 2U form factor with two dozen media bays (up to two NVM-Express drives and four SAS/SATA drives per node), supporting the hottest 225 watt Rome parts and the fastest 3.2 GHz DDR4 memory sticks. Having 512 cores in a 2U chassis is a real thing, and it is something that Intel absolutely cannot do.
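The density figures quoted in this article are straightforward multiplication, and a minimal sketch shows where they come from. The node, socket, and blade counts are the ones cited here for the PowerEdge C6525 and the Atos XH2410; the `Enclosure` class and its field names are our own, and we assume 64-core Rome parts in every socket.

```python
# Core density for the two dense Rome enclosures discussed in the article.
# The Enclosure class is illustrative; counts come from the article text.
from dataclasses import dataclass


@dataclass(frozen=True)
class Enclosure:
    name: str
    nodes: int
    sockets_per_node: int
    cores_per_socket: int = 64  # assumption: top-bin 64-core Rome in every socket

    @property
    def cores(self) -> int:
        return self.nodes * self.sockets_per_node * self.cores_per_socket


c6525 = Enclosure("Dell PowerEdge C6525 (2U chassis)", nodes=4, sockets_per_node=2)
xh2410 = Enclosure("Atos XH2410 blade (1U tray)", nodes=3, sockets_per_node=2)
BLADES_PER_CABINET = 32  # maximum XH2410 blades per Sequana cabinet

print(f"{c6525.name}: {c6525.cores} cores")
print(f"{xh2410.name}: {xh2410.cores} cores")
print(f"Full Sequana cabinet: {xh2410.cores * BLADES_PER_CABINET} cores")
```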
All of the Dell Rome machines support Windows Server, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, and Canonical Ubuntu Server LTS as well as Citrix Systems XenServer (now called simply Hypervisor), VMware ESXi, Microsoft Hyper-V, and KVM from the Linux distros as hypervisors. This is what is normal and expected.
And one more thing: Hyperconverged storage, says Pendekanti, absolutely requires PCI-Express 4.0 slots at this point to balance out the system. There may not be a lot of PCI-Express 4.0 cards out there, but they will ramp quickly now. Frankly, card makers have been waiting for servers, and while we respect IBM’s Power Systems lineup, which has supported PCI-Express 4.0 for almost a year now, these machines are not volume products like AMD Rome boxes will certainly be.
The two PowerEdge single-socket servers – the R6515 and the R7515 – are available now. The dense PowerEdge C6525 system will be available in the middle of October along with the PowerEdge R6525 rack server, and the PowerEdge R7525 will be in limited availability by the end of the year with general availability in the first quarter of 2020.
Supermicro, which spans a gray area between an OEM and an ODM, is also pushing its Rome machines, including five H12 systems that are available now and three more Ultra platforms that are coming out in the fourth quarter:
At some point, we will drill down into server platforms based on “Cascade Lake” Xeon SP and Rome Epyc chips and look at the feeds and speeds and slots and watts, and oomph and bucks and have a bit of a throwdown. But not until some of these machines are shipping and pricing information is available.
At the Rome in Rome event, AMD was also bragging that fab partner Taiwan Semiconductor Manufacturing Co would be adding Rome systems to its EDA clusters to design and test chip manufacturing and packaging processes, and French hoster OVH and IBM Cloud have also committed to adding Epycs to their cloudy infrastructure. Officially, amongst the Super 7 hyperscalers and cloud builders, Amazon Web Services, Microsoft Azure, and Google Cloud Platform have all said publicly that they are deploying AMD Rome chips, and we suspect that Alibaba, Baidu, and Tencent in China have Epyc processors – although they may be the homegrown Naples variant, not the Rome variety as yet. Oracle is also deploying AMD Epyc systems on its eponymous cloud.
Facebook, the only hyperscaler without a cloud business, has been mum on the Epyc subject. Facebook was an early and enthusiastic user of AMD’s Opteron processors more than a decade ago, and it will not pass up a good deal if AMD can beat whatever price Intel is probably cutting to in order to keep the Facebook account on Xeons.