We have often opined that ARMv8 processors would struggle to meet Intel Xeon chips head-on until they got a few microarchitecture revisions under their belts to improve per-core performance, and until they closed the manufacturing gap by moving to 14 nanometer or 16 nanometer processes, or perhaps even 10 nanometers.
But it looks like ARM server chip maker Applied Micro is aiming to do just that with its X-Gene 3 chip, which we profiled last November when its architecture was announced. Applied Micro has reached for this lofty goal before with its X-Gene 1 and X-Gene 2 processors, but it appears that its third generation of custom-core system-on-chip processors will deliver performance that is "well within range of Xeon E5 products," according to a recently released report by the Linley Group, publisher of The Microprocessor Report.
This announcement tees up two important topics. First, will X-Gene 3 really deliver Xeon-class performance? And second, does it matter? That is, do ARM server chip makers really need to chase Intel’s Xeons, or is there a market for less powerful Atom-class SoCs in the datacenter?
Going Head-To-Head With Xeon?
Implemented in the 16 nanometer FinFET process from Taiwan Semiconductor Manufacturing Corp, the X-Gene 3 will sport 32 single-threaded, quad-issue cores and eight DDR4 memory controllers (yes, eight), and it will run at speeds up to 3 GHz. If Applied Micro can deliver on the 3 GHz promise, and probably even if it misses the clock speed target by a couple hundred megahertz, this chip looks to be a contender in the fight for Xeon dollars, at least for the midrange of the Xeon line (call it roughly 600 SPECint_rate). The decision to dedicate die space to the extra memory controllers is particularly welcome news, since many applications suffer from "stranded cores" that sit idle for lack of memory bandwidth. Data from our friends over at the Linley Group shows that X-Gene 3 will have both per-socket and per-thread (but not per-core) performance that is in the right zip code:
Figure 1. Comparison of server-processor performance. X-Gene 3 delivers better per-thread performance than any other ARM server processor and matches the newest Xeon E5 products in per-thread and total performance. *SPECint_rate2006 (base) for GCC; all ICC scores reduced by 15%. † at maximum thread count. (Source: The Linley Group)
From my experience, you need to be close to 20 SPECint_rate per core to be a player in general-purpose cloud compute infrastructure at this time, so this looks pretty good. But to temper your excitement just a tad, look closely at the data and note that it compares a 14-core Xeon, with two threads on each core, to a 32-core, single-threaded X-Gene 3. Not all applications can run in multi-threaded mode, or run well in it, and the efficiency of multi-threading varies, so for those applications a core-to-core comparison would be more meaningful, and it would significantly favor Xeon. Also, X-Gene 3 will come out in late 2017, assuming the schedule holds firm. Intel is likely to be shipping the "Skylake" Xeon E5 v5 chips in the same timeframe, which will be a major uplift from the current generation and will raise the ante yet again with more cores, faster cores, the Omni-Path interconnect, perhaps PCI-Express 4.0, and more memory controllers than the "Broadwell" Xeon E5 v4 chips. Finally, these figures come from simulations only, and they assume that Applied Micro reaches 3 GHz for the X-Gene 3, which is not an easy task for a 32-core SoC. These caveats notwithstanding, Applied Micro has certainly put Intel on notice that ARM can and will come after the heart of its profitable datacenter monopoly. And ARM chips do not need to match Intel's top-bin parts to have an impact.
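To make the per-core versus per-thread caveat concrete, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that both chips post the same socket-level score of about 600 SPECint_rate (the midrange figure mentioned earlier); these are not measured or published results for either part. The hypothetical helper `per_unit` simply divides a socket score across cores or hardware threads.

```python
# Back-of-the-envelope sketch. ASSUMPTION: both sockets score the same
# ~600 SPECint_rate; this is illustrative, not measured data.

def per_unit(socket_score: float, units: int) -> float:
    """Hypothetical helper: spread a socket-level score across cores or threads."""
    return socket_score / units

SOCKET_SCORE = 600.0  # assumed equal socket throughput for both chips

chips = {
    # name: (cores, hardware threads per core)
    "Xeon E5 (14 cores, 28 threads)": (14, 2),
    "X-Gene 3 (32 cores, 32 threads)": (32, 1),
}

for name, (cores, threads_per_core) in chips.items():
    threads = cores * threads_per_core
    print(f"{name}: {per_unit(SOCKET_SCORE, cores):.1f} per core, "
          f"{per_unit(SOCKET_SCORE, threads):.1f} per thread")
```

Under that equal-socket assumption, the two chips land close on a per-thread basis (about 21 versus about 19), but the Xeon delivers more than twice the throughput per core (about 43 versus about 19). That gap is what single-threaded or poorly threaded applications would feel.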
Is More And Faster Always Better?
These nits may not matter as much in the future as they appear to today. While public clouds require excellent per-core and per-socket performance to handle the wide variety of workloads run by their enterprise customers, high-end processors are overkill for many of the fastest-growing workloads. There may be a large opportunity for smaller, more efficient processors that are "right-sized" to match the needs of the IoT and other segments.
A few examples may help illustrate the point. Facebook's massive server farms do not require absolute screaming performance; they depend instead on Xeons with lower clock speeds and fewer cores, and more recently on the new Xeon D in the "Yosemite" microserver, which delivers sufficient single-socket performance and enough DRAM at lower prices and in lower power envelopes. (See our recent coverage from the Open Compute Summit on Facebook's server configurations.)
Similarly, content delivery networks do not require brawny cores, and this market will experience hyper-growth as virtual reality games begin shipping in earnest and as video continues to displace old-fashioned still photos in advertising and communications. Content delivery networks need good I/O and networking bandwidth, but they don't require a lot of computational performance to pick up data off a disk drive and put it on the (right) virtual wire. Finally, in computationally intensive applications like deep learning, where the compute is all on a GPU or other accelerator, the CPU is such a small part of the equation that a more modest CPU, such as an ARM server chip, could deliver better economics.
In short, we are finally gearing up for a wave of competitive ARM-based server SoCs as 14 nanometer and 16 nanometer manufacturing comes online and becomes more affordable. From Applied Micro's X-Gene 3 to a Snappier Dragon (sorry, they still haven't given us a name) from Qualcomm, these parts will find a welcoming market both in the traditional datacenter, where Xeon performance is de rigueur, and in the new datacenters that are evolving to support fast-growing workloads and that value good-enough performance at lower cost and power.
Karl Freund has been an executive in the server and processor business for over 35 years and is a frequent speaker at technology and investment conferences. He has been an outspoken advocate for alternative computing technologies such as ARM chips and GPUs, and is the author of the armservers.com site. Freund holds a bachelor's degree in applied mathematics from Texas A&M University and a master's degree in computer science from the University of North Carolina at Chapel Hill.