Amping Up The Arm Server Roadmap

Competition in and of itself does not directly drive innovation – customer needs that might be met by some other product are really what make suppliers hop to and get the lead out. No matter what you do in this world, there is always a chance that someone else will do it better or quicker – or both.

The nascent Arm server chip market is littered with companies that attempted to break into the server space and compete against the hegemony of the X86 architecture, or that thought they might take a run at it and then, either early on or just before announcing products, decided against it.

Breathe in.

Calxeda launched in 2011 and then famously flamed out for complicated reasons – it didn’t have 64-bit processors and it could not force the hardware stack into datacenters before the software stack was ready. It also ran out of money trying to do tech support for partners. Nvidia launched its Project Denver Arm server effort and then quietly killed it off. Samsung never did make its plans known, and killed them off before admitting anything. AMD jumped in with the K12 Arm server project and its low-end “Seattle” APUs to try to save face in systems, but then pulled back to concentrate on the Epyc X86 server chips – without a doubt the right thing to do. If the world wants or needs high volume Arm servers at some future date, AMD will be able to create one fairly quickly by doing a global replace of Epyc cores with Arm cores – which is basically what the K12 project was about anyway. Qualcomm was gung-ho with its Centriq line, and then decided after putting a prototype and a production chip into the field that this was not going to work out well financially and spiked it. Phytium made a fuss three years ago with its “Mars” Arm server chip, and was never heard from again, probably because Huawei Technologies’ HiSilicon Kunpeng 920 looks to be the Arm choice for China. Broadcom put together the very good “Vulcan” Arm server chip and mapped out a plan to take on Intel in the datacenter, and then in the middle of trying to buy Qualcomm, decided to jettison the Vulcan effort, which Cavium picked up and used to create its variant of the ThunderX2 chip. In the interim, Marvell bought Cavium and also picked up the Arm server design team from Qualcomm, so Marvell has benefited a few times over from the failures of others, particularly given that it is really the only vendor of Arm server chips that has anything close to production installations. (We just did an in-depth review of the ThunderX roadmap here.)
Fujitsu has done a fine job with the HPC-centric A64FX processor, aimed at traditional supercomputing as well as AI workloads, which we have covered at length. And Amazon Web Services has cooked up its own Graviton family of Arm server chips, which it is putting up for sale by the hour on its EC2 compute service. The Graviton chips have the potential to be a higher volume product than the ThunderX line if they take off on the AWS cloud.

Breathe out.

That leaves one more credible maker of Arm server processors, and that is Ampere Computing, the company that was created out of the ashes of the Applied Micro X-Gene Arm server chip business nearly two years ago. Ampere is notable because it has Renee James, former president of Intel, as its chief executive officer as well as a slew of former Intel chip people on staff – and equity backing from The Carlyle Group to boot. Jeff Wittich, senior vice president of products at Ampere, had a chat with The Next Platform about what is coming next for the company as it builds out its roadmap and tries to bring Arm servers into the datacenter among the hyperscaler and public cloud elite.

Wittich is no stranger to these customers, which is why Ampere tapped him for his role in June. Wittich got his bachelor’s in electrical engineering at the University of Notre Dame and then went on to get a master’s in electrical engineering at the University of California Santa Barbara, where he also worked for two years as a graduate student researcher before joining Intel. He was a process engineer working on etching equipment for a year, and then a senior device engineer in the foundries working on the 45 nanometer Hi-K metal gate processes that debuted in Intel Xeon server chips in the late 2000s. After that, for a five year stint, Wittich was a product reliability engineer for Intel’s 22 nanometer products, and in 2014 he became senior director of cloud business and platform strategy at the chip giant. In the five years before he joined Ampere, that Xeon chip business grew by 6X – significantly more than the company had expected.

Suffice it to say, James and Wittich know these hyperscaler and cloud builder customers intimately. And that is perhaps more dangerous to Intel than an instruction set and a clever arrangement of transistors on a wafer of silicon.

Here was our starting thesis in the conversation. If you looked at the past eight to ten years from outside the IT sector and you didn’t know much about it, you might think that somebody was intentionally benefiting from the end of Dennard scaling and the slowdown in Moore’s Law advances in transistors. All of the CPU vendors started stumbling around, in a bit of a daze and not getting important work done on time, and this is coincident with the rise of the hyperscalers and cloud builders and Intel being able to maintain 50 percent gross margins with its datacenter products because, even with increasing theoretical competition, those alternative Arm chip suppliers from days gone by could not deliver the right chip at high volume at a predictable cadence. It is one thing to make a few hundred or thousand samples; it is quite another to build a few tens of thousands or hundreds of thousands per quarter.

“I completely agree,” Wittich tells The Next Platform. “That’s one thing that I think it’s really important, the fact that our whole executive team at Ampere, we’ve all done this before. So, 500,000 units doesn’t sound like much at all to me. I did that for 15 years, and our head architect and our head of engineering, they’ve all done this for ten or more generations of high volume, server-class CPUs. I think we know what it takes to deliver at scale and at high volume. That’s why I think we are particularly well suited to succeed in this space.”

Take a look at the executive team at Ampere and you will see that Wittich is not kidding. These are very seasoned people from Intel. And they all believe that the time is right to create that alternative, and that an Arm architecture is the way to go.

“If somebody can come in and establish that they have a reliable cadence of product delivery, can meet the volume requirements, can get through a qual cycle in an efficient and reliable manner, provide the customer support that the hyperscalers expect, then there’s a big opportunity there,” says Wittich. “We are not just trying to go and compete on matching and exceeding the exact same performance metrics or TCO metrics that the broad datacenter market has looked at for the last decade. We are specifically delivering the type of performance that you need in a multitenant cloud, with the type of performance consistency, with the type of security that you need. So it goes beyond just basic performance and basic TCO. It’s also about the type of power efficiency that hyperscalers need. It’s the type of scalability they’re looking for, and it’s that foundation of a cloud architected features that provide quality of service, manageability, and security. There’s an opportunity to come in and reshape the landscape by doing something that’s truly different and truly innovative right now.”

One could argue that the eMAG 1 chip, which was based on the “Skylark” X-Gene 3 chip created by Applied Micro with some tweaks by Ampere, was not that product, although it was a perfectly respectable server chip. The 32-core Skylark chip, which started shipping in volume in September 2018, stacked up pretty well against the 16-core “Skylake” Xeon SP processors from Intel, providing the same number of threads (when Intel turned on HyperThreading). All Ampere chips use real cores without threads to scale up compute, and this is a conscious choice as it simplifies the pipelines some and, moreover, some workloads do worse rather than better with simultaneous multithreading turned on. And finally, adding threads provides another way that security can be compromised because threads mean virtualizing and sharing resources like registers and L1 caches, and virtual CPUs (vCPUs) on the public clouds usually have a thread (not a core) as their finest level of granularity. This is, Wittich says, inherently less secure.

With the next generation Arm server processor – which is code-named “Quicksilver” but which will not be called the eMAG 2, by the way, and the official brand has not been unveiled as yet – Ampere will be scaling up and out on a bunch of different vectors. And it is still focusing on multitenant cloud and edge use cases and is not really pursuing legacy enterprise platforms or the HPC sector as other Arm suppliers are trying to do. Ampere will be using the “Ares” core created by Arm Holdings for its Neoverse N1 platform, with modifications made by Ampere to optimize performance and to make use of Ampere’s own mesh interconnect for on-die communication. The next-generation Ampere chip will scale up to 80 cores on a monolithic die, and will be etched in the 7 nanometer processes created by fab partner Taiwan Semiconductor Manufacturing Co. The 32-core eMAG 1 did alright for a first stab at the market by Ampere, but Wittich says that it did not have enough single core performance, and this will be fixed in the next generation chip.

The Ampere Quicksilver chip will have eight memory channels, just like the eMAG 1 did, and Wittich says it will have more memory bandwidth than the eMAG 1 provided, and further that Ampere is getting its DRAM memory controllers from a third party, as many Arm server chip makers do. The Quicksilver chip will support the CCIX interface for linking to accelerators like GPUs, and will support two-socket NUMA configurations as well as single-socket implementations. CCIX will be the transport for these NUMA links. This stands to reason since Applied Micro did not really have a hardware-based NUMA technology of its own and was resorting to software-based NUMA over PCI-Express for the X-Gene 2. (Other Arm chip makers are using CCIX for NUMA links.) The future CPU will also have PCI-Express 4.0 peripheral controllers, but the number of lanes is not yet clear.

The clock speeds on that 7 nanometer Quicksilver chip have not been revealed, but it is hard to imagine that, even with the process shrink from 16 nanometers for the eMAG 1, Ampere can maintain a sustained turbo speed of 3.3 GHz on the cores while boosting the core count to 80. What we do know is that, owing to its edge, hyperscaler, and cloud target markets, the Quicksilver chip will have a much wider compute and thermal range than the Skylark chip had. The Skylark chip had SKUs that ranged from 75 watts to 125 watts, but Quicksilver will range from a low of 45 watts all the way up to 200 watts or more. That implies SKUs ranging from maybe 10 cores all the way up to 80 cores, depending on how much juice the uncore region of the Quicksilver chip burns.
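
To show how much that core count estimate swings with the uncore, here is a back-of-the-envelope sketch in Python. Everything in it beyond the 45 watt and 200 watt envelopes and the 80-core ceiling is our assumption, not Ampere data: the uncore draws are guesses, the function names are made up for illustration, and it assumes power scales linearly with core count at a fixed clock speed, which real SKUs will not strictly do.

```python
def watts_per_core(top_tdp_watts: float, top_cores: int, uncore_watts: float) -> float:
    """Per-core power implied by the top SKU once the assumed uncore draw is set aside."""
    return (top_tdp_watts - uncore_watts) / top_cores


def cores_for_budget(tdp_watts: float, uncore_watts: float, per_core_watts: float) -> int:
    """Whole cores that fit in a power envelope after the uncore takes its share."""
    available = tdp_watts - uncore_watts
    if available <= 0:
        return 0
    return int(available // per_core_watts)


if __name__ == "__main__":
    # Hypothetical uncore draws in watts; the real figure has not been disclosed.
    for uncore in (10.0, 20.0, 30.0):
        per_core = watts_per_core(200.0, 80, uncore)
        low_end = cores_for_budget(45.0, uncore, per_core)
        print(f"uncore {uncore:.0f} W: {per_core:.3f} W per core, "
              f"about {low_end} cores at 45 W")
```

Under these assumptions, the 45 watt part lands somewhere between 7 and 14 cores as the assumed uncore draw moves from 30 watts down to 10 watts, which brackets the rough 10-core figure above. Lower clocks on the small SKUs would push the count up.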

The Quicksilver chip came back from the foundry this week and samples will be shipped out to key partners before the end of the year, according to Wittich. The plan is to ramp up volumes towards the middle of 2020.

Ampere has two more chips on the roadmap that it is showing publicly, with a 7 nanometer follow-on in development now and a 5 nanometer kicker to that in definition stages now. This hews more or less to the Arm Holdings Neoverse roadmap, and we expect Ampere to stay more or less in sync with that, picking and choosing technologies as it sees fit from Arm – as it has done with Quicksilver. That implies a more or less annual cadence of chip rollouts.

ProDigit says:

December 14, 2019 at 8:59 am

Arm server chips are a dead end.
Especially if we’ll see Nvidia AI cards in the near future, replacing GPUs for ai workloads and small computations.
Yes sure, Intel server chips are overpriced, but I think where most businesses will head to, is GPU/AI card servers.
They need a high frequency CPU, with about as many CPU cores as it has GPUs (or other data processing boards).
The time of CPU crunching through tables and searching for parameters is fading away. GPUs are the future.

Dileep Bhandarkar says:

December 14, 2019 at 2:17 pm

There are several things in this article that are difficult to believe. Do not drink the KoolAid!

First, if the chip just came in last week, it is unlikely that it is being sampled at customers already.

Second, reaching volume production in mid 2020 is not even a dream. The chip will probably not be qualified for at least a year. Volume production in mid 2021 may be possible. How many spins will it go through before final production? FAB turnaround time is likely to be about 4 months.

An 80 core design in 7nm is probably around 500 sq mm and 200W (just my guess). It will be hard to scale it down to 45W. All of the yield can be harvested down to about 50 cores, so there is no benefit to going down to 10 watts.

Mike Bruzzone says:

January 1, 2020 at 1:47 pm

2010 through 2014 having audited all ARM server design developers except AMCC in its Ampere reincarnation history shows the typical patterns of innovation looking for a problem to solve.

Calxeda with its Energycore attempted and did achieve a 32 bit low power toy work group server lacking Xeon ump and Xeon ump at Xeon low frequency low power. Low power v Xeon and high throughput whatever the frequency remains a competitive point of entry today. What will fill Xeon LP underserved market?

Marvell started with a 24 bit slate CPU, Armada, ending with sleds no one wanted to code albeit working prototypes this was a serious study in platform development.

AMCC XGene demonstrated a serious effort on funding devoid of opportunities associated with finding product voids in Intel product SKU and price structure the exit strategy always and well before Ampere positioned the technology for sale to Intel.

Cavium Thunder ahead of AMD was the first to realize product voids existed above Intel core counts albeit single threaded where low power became a secondary consideration in relation to core frequency, platform and network power demands.

Samsung never left audit phase attempting to determine if there was truly an AMD server business.

Qualcomm Centriq like AMCC failed to find product and price voids within and around Intel product and price structure and Broadcom on the success of its switch business including the massive volumes of Xeon v3/v4 dumping made it an easy decision to sell Vulcan, to Marvell, unknown whether a serious server player however thought a serious switch and network play. This era of ARM 64 development was unable to compete on any platform metric, CPU cost : price especially. Intel freebie CPU in bundle, economies of Xeon board production, price of system memory made any ARM price and / or power cost savings to the commercial user irrelevant.

Marvell having picked up the pieces of Thunder/XPliant and Vulcan on core network tradition appears primarily switch or software defined network acceleration play.

Graviton and/or Neo are depicted here at Slide 2;

https://seekingalpha.com/instablog/5030701-mike-bruzzone/5224004-amd-in-relation-to-intel-q3-2018-channel-inventory-holding-report

Now Ampere workstation. ARM remains a custom solution for those developing their own software stack. Internal development keeps the software advantage off the cloud but Linux developers that are not concerned with prying eyes might give it a try.

Xeon D at gen 3 has no traction and gen 2 barely sold. ARM opportunities are low power edge in and custom implementations that are more switch like and network related than server centric all around the coincident edges of transport structure.

Mike Bruzzone, Camp Marketing
