Arm Upstart Ampere Starts Chipping Away At Intel Xeons

When this is all done, Intel might have wished it had kept Renee James as president and chief executive officer, because Ampere, an Arm server chip startup that James has been running since this spring, wants a big piece of the Xeon datacenter business and it has the financial backing to start a price war that others can win and only Intel can lose.

For the past several years, we have watched as Intel’s hegemony in the datacenter grew. There is nothing inherently wrong with Intel selling the vast majority of processors that run most of the world’s computing workloads – monopolies are not illegal, bad behavior because of a monopoly is. The vast monoculture of the X86 instruction set, the ever-advancing sophistication of the Xeon processor line, and two decades of price/performance improvements that no proprietary or Unix system vendor could keep pace with have done great things for general purpose computing in the late 20th century and the early 21st century. No question about it. But at some point, when the gross margins consistently stay close to 50 percent, as is the case with Intel’s Data Center Group, and the server market share is so utterly dominated by Intel that most of the profit pools in server hardware flow to it, a capitalistic economy commands and demands that substitutes be created and sold for less money.

Ampere is backed by private equity giant The Carlyle Group, which manages $170 billion in assets, and that money is how Ampere bought the X-Gene Arm server processor business from MACOM in the fall of last year. MACOM had acquired that business from Applied Micro earlier that same year, after the chip maker had run out of financial oomph to pursue the ambitious Arm server goals it set out in 2011.

Applied Micro did the hard work of getting the first server-class Armv8 architecture done and promoting this as a possible option to a skeptical market. The initial “Storm” X-Gene 1 chip was implemented in the 40 nanometer processes from Taiwan Semiconductor Manufacturing Corp, and offered eight custom Armv8 cores running at 2.4 GHz, with out of order execution units and other server features. The X-Gene 2 was supposed to scale up to 16 cores with Ethernet RoCE networking interfaces, but it only made it into the field with eight cores running at 2.8 GHz despite the shift to 28 nanometer processes. Despite the difficulties, Applied Micro got 25,000 machines in the field using X-Gene 1 and X-Gene 2, and started work on the “Skylark” X-Gene 3 chip, which was a much more ambitious design and which, incidentally, was not completed when MACOM bought that chip business from Applied Micro or when the company that would become Ampere was funded by Carlyle Group.

We went into the backgrounds of the techies who work at Ampere when we discussed its launch back in February and we are not going to cover that ground again. But it suffices to say that Ampere has quickly put together a team of techies, including veterans with architecture chops from both Intel and AMD as well as a lot of the team from Applied Micro, to steer the architecture. Importantly, Ampere has hired Matthew Taylor, who for many years managed the relationships that Intel had with Cisco Systems – notably as the Unified Computing System blade servers that made Cisco into a server vendor were coming to market and experienced their explosive growth – and with Amazon during its explosive growth years between 2011 and 2015, when the Intel business at the retailing and cloud giant grew by a factor of 7.5X to $1.5 billion. More recently, Taylor was the vice president of sales and business development at Qualcomm, helping to spearhead its move into Arm server chips and negotiate the deals that funded the development of the “Amberwing” Centriq 2400 processors. (Qualcomm has effectively left the Arm server field, but it could return.)

Taylor tells The Next Platform that Ampere now has 400 employees, and that a significant number of them are engineers who worked on finishing up the Skylark X-Gene 3 design and who are now working on three follow-on products because, as we all know, no one is going to invest in any processor manufacturer that does not have a solid roadmap that goes out at least a few years. Taylor is not at liberty to say how much money Carlyle Group has pumped into Ampere, but says “it was a big number to sustain that roadmap.”

By delaying the delivery of the Skylark X-Gene 3 chip to now, Ampere was able to shift production to a more refined 16 nanometer process from TSMC. Ampere has not made available any block diagrams of the Skylark chip, but this is an old one that was put together by Applied Micro:

The Skylark chip was put on the public X-Gene processor roadmap in November 2015 and was revealed in more detail in October 2016. Applied Micro was shifting from 28 nanometer to 16 nanometer FinFET processes from TSMC with the jump from X-Gene 2 to X-Gene 3, which allowed for a substantial increase in core counts and a slight increase in clock speeds. The chip also included twice as many memory channels, at eight per socket, which doubled the memory capacity up to 1 TB of DDR4 per socket as well as doubling up the bandwidth. The X-Gene 3 chip had 32 cores plus a traditional L2 and L3 cache hierarchy; it also had integrated SATA I/O ports and 42 lanes of PCI-Express 3.0 peripheral bandwidth across eight controllers.

The X-Gene chips are going to be sold under the eMag brand, according to Taylor, and the Skylark generation will be able to turbo boost up to 3.3 GHz across those 32 cores. They offer a 33 percent memory bandwidth and capacity benefit over the “Skylake” Xeon SP processors from Intel thanks to having eight memory controllers compared to six for the Xeon SPs – an advantage that IBM’s Power9, AMD’s Epyc 7000s, and Cavium’s ThunderX2s all have, too. The Skylark chip is only available for single-socket servers, which is a big difference compared to the chips mentioned above.
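The 33 percent figure follows directly from the controller counts. Here is a minimal sketch of that arithmetic; it assumes, for illustration only, that each memory channel runs at the same speed and takes the same DIMM capacity on both chips, and the function name is our own.

```python
# Channel counts come from the article. Equal per-channel speed and DIMM
# capacity on both platforms is an assumption made for this sketch.
EMAG_CHANNELS = 8
XEON_SP_CHANNELS = 6


def relative_advantage(ours: int, theirs: int) -> float:
    """Fractional advantage of `ours` over `theirs` (0.33 means 33 percent)."""
    return ours / theirs - 1.0


advantage = relative_advantage(EMAG_CHANNELS, XEON_SP_CHANNELS)
print(f"Memory bandwidth and capacity advantage: {advantage:.1%}")
```

Under that equal-channel assumption the bandwidth and capacity advantages are the same number, which is why the article can quote a single figure for both.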

It is significant to us that between February when the Skylark X-Gene 3 chip was being polished for tape out and now when the chip is shipping to initial customers, the first thing that Ampere did was cut the price on the top-end device with 32 cores running at 3.3 GHz turbo, all in a 125 watt thermal envelope. (We don’t know the base clock speed yet, but it is probably a lot lower than 3.3 GHz.) Back in February, when we talked to Ampere, the top bin part with all 32 cores fired up to 3.3 GHz was expected to cost $950, and that has now dropped to $850 – a 10.5 percent cut in list price right off the bat. Ampere has also revealed that the 16-core Skylark chip with the same clock speed (whatever base and 3.3 GHz turbo) would cost $550. (We presume these are prices for 1,000 unit trays from the vendor, not the single unit prices you might see from OEMs, ODMs, and electronics distributors.)
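For anyone who wants to check the size of that cut, here is a quick sketch using the two list prices quoted above. The helper name is ours, not anything from Ampere.

```python
def percent_cut(old_price: float, new_price: float) -> float:
    """Price reduction expressed as a percentage of the old price."""
    return (old_price - new_price) / old_price * 100.0


# February expectation versus current list price for the 32-core top bin part.
cut = percent_cut(950.0, 850.0)
print(f"List price cut: {cut:.1f} percent")
```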

Taylor says that using his back of the envelope math, the top bin 32-core eMag chip will stack up well against the Xeon SP and Xeon D lines. To be precise, Taylor says that this chip should be compared to the Xeon SP-6130 Gold, which has 16 cores running at 2.1 GHz with a turbo frequency of 3.7 GHz when only a few cores are lit up with work; this Intel chip comes in at 125 watts and costs $1,894. Ampere thinks it has twice the bang for the buck, since the Skylark has about the same raw performance and the same thermals. If you take the Skylake Xeon D chip aimed at hyperscalers and storage cluster builders, then the obvious comparison, says Taylor, is to the Xeon D-2191, which has 16 cores running at 1.6 GHz (without HyperThreading) and fits in an 86 watt thermal envelope for $2,407. With a much lower price and more cores, the bang for the buck is going to be three times better with the eMag chip compared to the top end Xeon D.
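Taylor's back of the envelope math can be roughed out from list prices alone. The sketch below assumes, as Ampere claims but no benchmark here confirms, that the $850 top bin eMag delivers about the same raw performance as each Intel part, so the bang for the buck ratio collapses to a simple price ratio. The dictionary and function names are our own.

```python
# List prices quoted in the article, in US dollars.
EMAG_PRICE = 850.0
INTEL_PRICES = {
    "Xeon SP-6130 Gold": 1894.0,
    "Xeon D-2191": 2407.0,
}


def bang_for_buck_ratio(rival_price: float, our_price: float,
                        relative_perf: float = 1.0) -> float:
    """Performance per dollar of our chip divided by that of the rival.

    relative_perf is our performance divided by the rival's; the default
    of 1.0 encodes the equal-performance assumption, which is not measured.
    """
    return relative_perf * rival_price / our_price


ratios = {name: bang_for_buck_ratio(price, EMAG_PRICE)
          for name, price in INTEL_PRICES.items()}
for name, ratio in ratios.items():
    print(f"eMag vs {name}: {ratio:.2f}x performance per dollar")
```

On price alone the ratios land a little above 2X and a little below 3X. Any real performance gap in either direction moves them through the `relative_perf` parameter.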

“We think we have a pretty compelling story with performance per dollar and performance per watt,” Taylor says. “This is why customers are evaluating or using the product.”

Server maker Lenovo, which was initially skeptical about the use of Arm servers in the datacenter but which has embraced Cavium’s ThunderX2 in prototype systems, is getting ready to ship 1U and 2U rack servers based on the Skylark eMag chip, and later this year, in November or December, Ampere will be shipping its own whitebox servers based on the processor for customers who don’t have Lenovo on their preferred vendor list. Oracle is also talking up Ampere in the announcement, but has made no commitment to making servers based on the chip, even if it is hinting, by saying nice things about Ampere, that it may port its Oracle Linux and maybe applications and databases to the Arm architecture. Oracle did disclose to Wall Street that it has a 20 percent stake in Ampere, which cost it $46 million, and that means Ampere raised $230 million.
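The $230 million figure is an extrapolation from Oracle's stake, and it can be reproduced in a couple of lines. This sketch assumes every investor paid the same price per percentage point as Oracle did, which the disclosure does not actually confirm.

```python
def implied_total(stake_cost_musd: float, stake_percent: float) -> float:
    """Scale the cost of a partial stake up to 100 percent of the company.

    Assumes all investors paid the same price per percentage point,
    which is an assumption made for illustration.
    """
    return stake_cost_musd * 100.0 / stake_percent


# Oracle's disclosed figures: $46 million for a 20 percent stake.
total = implied_total(46.0, 20.0)
print(f"Implied total: ${total:.0f} million")
```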

At the moment, Ampere is being pretty vague about its product roadmap, and even about the feeds and speeds of the first eMag chip and its SKUs, because competing against Intel is a harrowing business, as all of the people at Ampere know full well. But Taylor dropped some hints.

“We now have four products on our roadmap, which is key,” Taylor explains. “The next generation eMag will come pretty quickly, but I am not going to give you the date. This chip will be on TSMC’s 7 nanometer processes, which is pretty big jump from a process node perspective. There will be a big jump in all aspects of the product: more cores, more memory, more I/O. We will also bring in two socket capabilities, and to be candid, we have a view that one socket systems have a lot of advantages, especially in the way that we have done it with a monolithic die. I think AMD has done a great job in educating the market on the value of one socket servers, but there are some penalties associated with the Epyc architecture and having lots of cores with access to local memories that you don’t have to hop out to for access is a big advantage. That being said, there are customers who either have religion that two-socket servers are the only way to go, so we will continue to offer that. We have our opinions, but we also listen to what our customers want, and some of them want two socket machines.”

Beyond that, Ampere is planning to offer a wider SKU set as the eMag 1 and eMag 2 chips come out, with different performance, price, and thermal points. And Taylor promises that there will be a rapid cadence beyond that. When pressed, Taylor deferred on specifics, but said the updates would come a lot quicker than the three year cadence that Intel is on with new architectures for its Xeon chips for sure. New manufacturing processes as well as new technologies (think DDR5 memory or PCI-Express 4.0 and 5.0 peripherals or protocols like Gen-Z and CCIX, just to name a few) will drive that Ampere roadmap, as will features developed by Ampere itself to differentiate its processors. We later found this roadmap that clears it up a little:

The one thing that would be fun to see happen is a big jump in cores and memory controllers, just to keep the heat on Intel with compute capacity and memory capacity and bandwidth. The fact is, many applications are bound by memory capacity and bandwidth, not raw compute. But adding more memory controllers to the die is a problem, which is why IBM was big on creating buffered memory for its Power8 and Power9 chips and only reluctantly added proper DDR4 controllers to the die. It would be fun to see 48 cores and a dozen memory controllers with eMag 2, boosting both memory and compute in lockstep, and maybe 64 cores and 16 controllers with eMag 3, raising them both by 33 percent. It would be even more fun if one of these future eMag chips employed buffered DDR5 memory using SERDES links to the compute complex and remote DDR controllers in the buffer, as IBM is talking about for future Power chips.
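To make the lockstep idea concrete, here is a small sketch that walks through those wished-for configurations. The eMag 2 and eMag 3 numbers are purely our speculation from the paragraph above, not anything Ampere has announced, and the names in the code are illustrative.

```python
# (name, cores, memory controllers). Only the first row is a shipping part;
# the other two are the speculative configurations floated in the article.
GENERATIONS = [
    ("Skylark eMag", 32, 8),
    ("hypothetical eMag 2", 48, 12),
    ("hypothetical eMag 3", 64, 16),
]


def growth(previous: int, current: int) -> float:
    """Fractional increase from one generation to the next."""
    return current / previous - 1.0


steps = []
for (_, prev_cores, prev_ctrl), (name, cores, ctrl) in zip(GENERATIONS, GENERATIONS[1:]):
    steps.append((name, growth(prev_cores, cores), growth(prev_ctrl, ctrl), cores / ctrl))

for name, core_growth, ctrl_growth, cores_per_ctrl in steps:
    print(f"{name}: cores +{core_growth:.0%}, controllers +{ctrl_growth:.0%}, "
          f"{cores_per_ctrl:.0f} cores per controller")
```

The point of scaling both in lockstep is that the ratio of cores to memory controllers stays where it is on Skylark, so each core keeps the same share of memory channels as core counts climb.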

2 Comments

  1. You might want to interview Intel employees about whether any of them wish Renee James was still employed at the company before leading with that hypothesis.

  2. Timothy, could you expand on the comment:
    “The fact is, many applications are bound by memory capacity and bandwidth, not raw compute.”
    It appears to be a central point to your thesis of the value-added differentiation of non-Intel CPUs.
