ARM Servers: Qualcomm Maps Out Datacenter Battle Plan

Breaking into the datacenter with a new chip architecture is probably more difficult than getting past the security in a modern glass house and literally breaking into it, either physically or digitally over the wire. But that is what Intel did starting more than two decades ago, and it is precisely what upstart ARM server chip vendor Qualcomm and several of its peers from the ARM arena are seeking to do.

The hyperscalers are a funny lot. They are fiercely proud of their infrastructure and their prowess in disrupting the traditional IT supply chain to suit their purposes, and they want to brag. Frankly, it is fun to get some insight into what they are doing when a hyperscaler like Google, Microsoft, Facebook, Amazon, Baidu, Alibaba, Tencent, or China Mobile does lift the veil a bit. But at the same time, the hyperscalers are super-secretive about their infrastructure because it affords them such a competitive advantage.

The best that any chip maker hoping to break into the datacenter can hope for from on high is a generic nod in the direction of its aspirations, something that both Microsoft and Facebook gave back in November 2014 when mobile chip giant Qualcomm announced that it was moving into ARM servers, taking on AMD, Applied Micro, Broadcom, Cavium, HiSilicon, Marvell, Phytium, and a few others who want to eat a chunk of the processing on servers, storage, and networking gear in the datacenter.

Qualcomm hosted its annual analyst day with Wall Street this week, and the rumor going around was that Google was going to endorse Qualcomm’s efforts as part of the presentation with a video snippet much as Facebook and Microsoft did when the server chip initiative was announced a little more than a year ago. Such an endorsement of ARM server chips as a concept has to be thought of separately from ARM server chips as an actual platform a company is promising to deploy, displacing the Xeon chips that dominate at hyperscalers these days.

But such displacements can and do occur. For instance, a large number of machines that Dell built for hyperscalers in the mid-2000s were based on AMD’s Opterons, although neither Dell nor AMD could talk about it much at the time. Back in April last year, Urs Hölzle, senior vice president of the Technical Infrastructure team at Google, told The Next Platform flat out that the search engine giant could move from Xeon to Power processors. “People ask me if we would switch to Power, and the answer is absolutely,” Hölzle said crisply and clearly. “Even for a single generation.” He said that a 20 percent advantage, which he called “a very large number,” could compel such a switch, and moreover, that if conditions changed, Google would switch back.


What holds for Power holds for ARM, albeit possibly in different parts of Google’s infrastructure, and indeed for all of the hyperscalers. Of course, any switch assumes that a supply chain as rich, sophisticated, and voluminous as the one that exists for Xeons can be created for ARM or Power chips. The fact that all of the hyperscalers control their code bases and all use Linux (with the exception of Microsoft, of course) gives them lots of leeway. If anyone should be egging on the ARM suppliers, it should be Microsoft, which does not have as much leverage because its code for Azure cannot easily be ported to ARM machines, unless Windows Server 2016 has already been ported to ARM, which we sincerely hope is the case. If we were creating the Nano Server variant of Windows Server, we would do it explicitly so it could run on Xeon, future Opteron, and certain ARM servers, and we would certainly make that fact known to Intel even if nothing were said to the outside world.

For whatever reason, probably because someone talked to the press ahead of time, Google did not give Qualcomm its video snippet about the importance of having an ARM alternative in the datacenter. As we pointed out even before Hölzle told us he would absolutely consider a switch to Power, Gordon MacKean, senior director in charge of server and storage systems design at Google for eight years and at the time chairman of the OpenPower Foundation, told us that Google tests different server architectures with its system and application code stack to prevent “bit rot,” although he did not name names for those architectures. Clearly, ARM and Power, as well as hybrid architectures mixing processors with GPUs and FPGAs, are the key options aside from a straight Xeon system, and by 2017 we will have a slew of credible 64-bit ARM chips as well as the “Zen” Opteron hybrids, which will, if the rumors are right, marry a 16-core processor with a fast GPU.

Smartphones Drive Datacenters, And Vice Versa

Qualcomm is one of the dominant suppliers of chips for smartphones, the ubiquity of which is driving datacenter growth at the hyperscalers. And just as PCs gave Intel a footing from which to build its datacenter empire, Qualcomm thinks its smartphone chippery will also give it an edge over rivals in the ARM server chip business and a means to take on Intel.

Qualcomm is a full licensee of the 32-bit ARMv7-A and 64-bit ARMv8-A architectures, and that means it can not only use cores, interconnects, and other circuit blocks designed by ARM Holdings, which owns the ARM intellectual property, but can also make custom cores of its own design, perhaps adding features such as deeper pipelines, out-of-order execution, or simultaneous multithreading, just to name a few possibilities. (We are not saying that Qualcomm is doing this, but that it can.)

The company is no stranger to making custom ARM chips, and perhaps more importantly, it has deep experience in creating system-on-chip designs that integrate all of the components of a feature phone and then a smartphone on a single package. This is experience that Intel does not have, although the world’s largest chip maker is learning a bit with its Xeon D chip, which is aimed at hyperscalers and was created essentially as a stopgap to keep ARM server chips from getting a foot in the glass house door. Qualcomm’s “Scorpion” cores, launched in 2008 and sold under its Snapdragon brand, were akin to ARM Holdings’ Cortex-A8 and Cortex-A9 cores, and the “Krait” cores from 2012 were inspired by, but substantially different from, the Cortex-A15 from ARM Holdings. Last November, Qualcomm rolled out the Snapdragon 820, based on its “Kryo” cores, which are significant in that they are the first Qualcomm-designed cores to offer 64-bit processing and memory addressing. (Earlier 64-bit Snapdragons used off-the-shelf Cortex cores from ARM Holdings.) Such 64-bit capability is table stakes for servers, whose operating systems, systems software, and applications have long since moved beyond 32 bits.

It is not a coincidence that, once the Kryo cores were implemented in the four-core Snapdragon 820 and were being readied for launch last fall, Qualcomm revealed its ARM-based server processor, which has not been given a code name, running in prototype servers. The Snapdragon 820 is aimed at smartphones, tablets, and other client devices and has four Kryo cores running at up to 2.2 GHz, plus an Adreno 530 GPU, a Hexagon 680 DSP, support for cameras of up to 25 megapixels, an LTE modem, a WiFi controller, and other controllers, all in a tight little package etched using 14 nanometer FinFET processes from Samsung Electronics.

Rip all of those extra components out of the Snapdragon die, cookie cutter in a whole bunch of Kryo cores, add in a bunch of cache and interconnect bandwidth and you might end up with something that looks like the unnamed prototype Qualcomm ARM server chip, which has 24 cores and which is based on the ARMv8-A architecture. But Vinay Ravuri, vice president of product management in the Datacenter Group at Qualcomm, and formerly general manager of server products at Applied Micro, told The Next Platform last October that the server chip was not just a bunch of Kryo cores but custom cores with server-specific features.

Qualcomm did not give out any other details on the chip, such as its clock speed, caches, memory, and other SoC components. Our guess is that this 24-core server chip is implemented in a 14 nanometer FinFET process, just like the Snapdragon 820, although it could be fabbed by Taiwan Semiconductor Manufacturing Corp instead of Samsung. It is a low-volume product aimed at prototype systems, and while it would be nice to dual-source manufacturing, the tight coupling between chip development tools and factory processes does not really allow this.

As for the “real” Qualcomm ARM server chip, Derek Aberle, Qualcomm’s president, offered a few more tidbits of information during the analyst day presentations.

“We have delivered our development platform to a number of tier ones and are in the process of getting feedback, which is flowing into the next product,” Aberle said. “We do expect to be sampling by the end of this year to our customers.”

Qualcomm has said that it will etch this unnamed ARM server chip – give it a name, people, and it doesn’t appear to be Hydra like some rumors are suggesting – in the latest FinFET process node. Both TSMC and Samsung are hoping to get 10 nanometer manufacturing processes up and running in production by the end of this year, and if Qualcomm is hoping to get a part with 48, 64, or 96 cores out the door, it no doubt wants to be on these 10 nanometer processes. (The number of cores would depend on how brawny they are, and considering the customers are hyperscalers, these cores would have to provide about the same performance as a Xeon thread to be interesting.) Our wild guess? If Qualcomm can do 24 cores in a prototype in 14 nanometer FinFET processes, then it can possibly get 32 cores or 36 cores in a 10 nanometer process with fairly brawny cores and maybe 48 cores or 64 cores if they are a bit wimpier.
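The core-count guess above can be reproduced, very roughly, with a back-of-envelope model. This is a minimal sketch under assumptions that are ours, not Qualcomm’s: node names are treated as literal feature sizes (in reality they are marketing labels, so actual density gains differ), transistor density is assumed to scale with the square of the node ratio, and the die area devoted to cores is held fixed. The function `scaled_core_count` and its `core_area_growth` parameter are hypothetical names chosen for illustration.

```python
def scaled_core_count(cores_old: int, node_old_nm: float, node_new_nm: float,
                      core_area_growth: float = 1.0) -> int:
    """Estimate how many cores fit in the same core area after a process shrink.

    Hypothetical back-of-envelope model, not Qualcomm's method:
    - node names are treated as literal feature sizes (they are marketing labels),
    - transistor density scales with the square of the node ratio,
    - the die area devoted to cores stays the same.
    core_area_growth > 1.0 models a brawnier core, < 1.0 a wimpier one.
    """
    if min(cores_old, node_old_nm, node_new_nm, core_area_growth) <= 0:
        raise ValueError("all inputs must be positive")
    density_gain = (node_old_nm / node_new_nm) ** 2  # idealized area scaling
    # Truncate toward zero: a fractional core does not fit on the die.
    return int(cores_old * density_gain / core_area_growth)


if __name__ == "__main__":
    # 24 cores at 14 nanometers, shrunk to 10 nanometers, for several core sizes.
    for label, growth in [("same-size", 1.0), ("brawnier (1.3x)", 1.3),
                          ("brawnier (1.5x)", 1.5), ("wimpier (0.75x)", 0.75)]:
        print(f"{label}: {scaled_core_count(24, 14, 10, growth)} cores")
```

Under those assumptions, 24 cores at 14 nanometers become about 47 same-sized cores at 10 nanometers; making each core 1.3 to 1.5 times larger yields roughly 31 to 36, while shrinking each core to 0.75 times its area yields about 62, which brackets the 32-to-36 and 48-to-64 ranges guessed above.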

Having samples of the future Qualcomm server chip at the end of this year, as Aberle said, would coincide with the 10 nanometer ramp. And, more importantly, it would put this chip at the same node and on roughly the same launch schedule as Intel’s future “Skylake” Xeon E5 v5 processors, also due in 2017.

This is the battleground where the ARM versus Xeon war will be fought.

If Qualcomm decided that taking on “Broadwell” Xeon E5 v4 chips didn’t make a lot of sense with its 24-core chip, it was probably right. Those server buying decisions were by and large already made by hyperscale customers by the time Qualcomm got its prototype out the door last fall, and frankly, many had expected the Broadwell chips, which are socket-compatible with the “Haswell” Xeon E5 v3 processors, to come out last fall as well.

If Qualcomm wants to fight it out, process node by process node and server node by server node, it has to actually go toe-to-toe with Intel. That is indeed the plan, and it is why Qualcomm is so secretive about what it is up to.

The stakes are high. Server processors account for about $10 billion a year in sales, and drive a $50 billion systems market. Xeon-class chips also increasingly drive a $35 billion storage industry and a $25 billion switching industry.


Qualcomm CEO Steve Mollenkopf made a few comments about the company’s datacenter aspirations. The first thing we noticed is that the total addressable market Qualcomm is chasing with its server chips is now $18 billion by 2020, up from the $15 billion estimate it gave when it revealed its server plans back in November 2014. This mirrors the more aggressive market share stake that ARM Holdings has put in the ground for ARM-based server chips, with the company now projecting that the ARM collective can get 25 percent of shipments by 2020, up from estimates of 15 percent to 20 percent a few years back.
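For scale, the revised total addressable market works out to a 20 percent increase over the November 2014 estimate, and ARM Holdings’ 25 percent shipment target sits 5 to 10 percentage points above its earlier range:

```latex
\frac{\$18\,\text{billion} - \$15\,\text{billion}}{\$15\,\text{billion}} = \frac{3}{15} = 0.20 = 20\%,
\qquad
25\% - \{15\%,\ 20\%\} = \{10,\ 5\}\ \text{percentage points}
```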

“The datacenter, we think, is an interesting opportunity,” Mollenkopf said. “Why? Because of the growth of the hyperscale datacenter. The cloud providers are building their own datacenters, they have the ability to change the architecture, and they have asked us to put together products for them. It is in our core competency, and we know we can do it and there is a customer asking us to do it at the time the datacenter is being disrupted by growth of the cloud providers building their own.”

“We think it has both strategic and commercial value to us long term,” Aberle said in his comments on the datacenter business. “There is a disruption happening in the datacenter, and the cloud players are changing the rules. They are architecting the datacenter themselves, they are doing a lot of the software themselves, so some of the reasons why companies had a hard time breaking into this market in the past are no longer relevant – at least for the portions of the market that we believe will be the fastest growing portion of the market, which will be the cloud players. And more of the enterprise workloads are moving to the cloud, as many of you know. There is a lot of good growth that is coming into this market, and it is right in the sweet spot of where we are focusing our development.”


This $18 billion market is worth chasing, particularly with client devices under such pressure and upgrade cycles lengthening, as they inevitably do. Aberle said that Qualcomm was focusing its server efforts on companies in the United States and China, which are the number one and number two markets, respectively, and that China is by far the fastest growing, even with a slowdown in its economy.

So why does Qualcomm think it can succeed in selling chips for servers and other gear in the datacenter? Aberle laid it out this way: “Not only is there a large desire for more competition and more innovation in the datacenter from the companies that we talk to, but Qualcomm is really uniquely positioned to win. It is not all about one metric versus another, but offering a balanced solution across performance, power, and cost. One of the things that gives me confidence is that we have been actively involved with a number of tier one cloud players from really early on, helping us define our product and make sure it is targeted at the sweet spot of the market and their needs. We are not doing this on our own and showing up to realize it is not the right product for the market.”

Deep SoC experience and being on the leading process node for both client and server chips aren’t going to hurt, either. A partnership with the government in China specifically aimed at server infrastructure, which The Next Platform will talk about separately, will help boost Qualcomm’s server prospects, too.



2 Comments

  1. I give Qualcomm a far higher chance to succeed than AMD. Simply because Qualcomm has the financial resource to stay in the game and compete against Intel. While AMD constantly is on the edge of going bust. Also Qualcomm needs new markets as mobile is saturating quickly and IoT there is little bang for the buck to be made.

  2. “While AMD constantly is on the edge of going bust.”

    For how many years has AMD been rumored to go bust, and has it happened YET! So what about AMD’s K12 sharing the same style execution engine as its x86 Zen counterpart(?). Will AMD’s K12 be the first custom ARM micro-architecture with SMT capabilities or will K12 be just another run of the mill ARM based SKU! AMD has its Interposer technology and is expected to replace its reference ARM core design A57 “Seattle” based SKUs with something more wider order superscalar with its custom K12 ARM cores!

    It appears that AMD’s business financials take precedence to all discussion about any of AMD’s new technology, including AMD’s exascale proposal for an powerful APU on an interposer replete with HBM and a fat GPU accelerator, all wired up with some very wide Channels/Busses CPU to GPU! We are talking about HPC systems on an interposer and their future, and AMD even has a patent application for adding some FPGA compute directly to the HBM stack/s in-between the bottom logic die and the HBM memory stacks above, for some in HBM memory compute, and all we will hear about is AMD’s poor financials.

    I’m hoping that with the Custom ARM SKUs coming to the server room that we may be able to see just how many CPU core execution resources these custom ARM cores have! Things like the number of instruction decoders per core, and the reorder buffer size, as well as the total numbers of execution pipelines(FP/INT/etc). Apple had a fairly wide order superscalar design with its A7(cyclone), and 6 instruction decoders is pretty wide, twice as wide as any of Arm Holdings’ reference cores, with the A7 having a reorder buffer the same size as Intel’s Haswell core i series, among other execution resources that made Cyclone more like a desktop SKU than a phone/tablet SKU! There is very little specific information relating to Apple’s A8, or A9 SKUs, owing to the lack of competent technology reporters outside of a pay-wall!

    The technology discussion when AMD is mentioned invariably changes to one of business financials, giving an unwelcome twist to the discussion, but Qualcomm has how much history designing CPU cores relative to AMD, or others, and hopefully the discussion will move to focusing on the relevant CPU technology, and not simply the business financials! Here is to expecting some more thorough analysis of custom ARM CPU core technology without all the digressions! The custom ARM RISC designs have every bit as much potential performance as the Power8 RISC design, depending on how the chief CPU architect and his team designs things, and that includes the custom ARMv8A running APUs from AMD, and the custom ARMv8A running SOCs from Qualcomm, and others! Let’s talk instruction decoders, FPUs, Integer units, SIMD units, other units, reorder buffers, Caching algorithms, and interposers with CPU cores more directly wired to GPU accelerators on an interposer package and sharing HBM.

    It’s the custom ARM cores that will make the most of the server room, and not so much the ARM holdings reference designs that are used mostly in phones and such, with very little of the transitional reference design ARM holdings cores that are currently in some of the server SKUs, those reference designs will be very quickly supplanted in the server room by the custom ARM SKUs that are only making use of the ARMv8A ISA!
