This is not the first time that Big Blue has found itself the underdog in the datacenter, and it probably will not be the last time, either. But the ubiquity of the Xeon processor in the server racks of the world presents IBM with a tough challenge as it tries to position its own Power platforms as well as those that will be coming soon from its OpenPower Foundation partners. Perhaps as early as next month, in fact.
The stakes are different this time, and one could argue that Intel and its legion of systems and storage partners are more insurmountable than any foe IBM has faced, including the Antitrust Division of the US Department of Justice back when the government was less inclined to tolerate unregulated monopolies. But by selling off its System x division to Lenovo last fall, IBM made it clear that it believes it can create an ecosystem of Power partners that can present a credible alternative to the Xeon platform in the datacenter, particularly for workloads that like lots of threads and memory bandwidth.
It is important to not underestimate any large corporation that has demonstrated longevity. Apple, for instance, had certainly run its business up on the rocks in the 1990s, barely maintaining its fanboi PC business, until it hit reset with the iPod in 2001, moved its PCs from PowerPC chips to Intel Core and Xeon processors in 2006, and followed gracefully with the iPhone in 2007 and the iPad in 2010 to create a monster company that now dwarfs IBM in terms of revenues, profits, and market capitalization.
The Past Is Prologue
IBM has a history of rebuffing waves of competition, often with the spectre of antitrust lawsuits hanging over its head, from clone mainframe makers such as Amdahl and Hitachi and proprietary minicomputer makers such as Hewlett-Packard and Digital Equipment decades ago. When the Unix revolution took over datacenters in the late 1980s and early 1990s, IBM’s RS/6000 Unix business was late and did not present much of a threat to Unix leaders Sun Microsystems and HP for the better part of a decade. It is not a coincidence that Unix took off in the wake of the 1988 recession and that IBM’s rise in Unix starting in 2001 came during the recession that followed the dot-com bust. This is an illustrative moment, so forgive us if we digress a bit before getting to some comparisons between modern Xeon E5 v3 and Power8 platforms running a mix of Spark in-memory analytics and machine learning algorithms.
IBM fielded the Power4 processor back in October 2001, which was the first RISC/Unix processor to have two cores and to break the 1 GHz clock speed barrier. It had 125 GB/sec of memory bandwidth from the chop caches into the processor cores, making it applicable not only for commercial workloads such as relational databases and application servers, but also as the CPU in supercomputer nodes. At the time, the Unix business accounted for around 45 percent of all system revenues worldwide and the proprietary minicomputer and mainframe businesses were much larger than they are today and the X86 server business, while still representing the majority of shipments, was not yet driving the majority of revenues. Our point is, there was a lot of money at stake. And IBM was able to be extremely aggressive with its Unix sales because it had very profitable iSeries (AS/400) midrange and zSeries (System/390) mainframe businesses and could offer cutthroat pricing on pSeries (RS/6000) systems to expand its footprint in the Unix business.
Just how aggressive was IBM? At the time, we were involved in consulting engagements with some major banks looking at replacing their Sun Solaris systems with alternatives, including compatible Sparc machines from Fujitsu as well as pSeries gear from IBM. In November 2001, when it came to relational databases, the safe bet was a Sun-Oracle combination, and Sun Microsystems had just gotten its UltraSparc-III processors out the door, too. A prior generation Enterprise 10000 system using 64 UltraSparc-II processors – the NUMA system design that Sun bought from Cray that gave it a datacenter play at all – could handle about 185,000 transactions per minute using 464 MHz processors at a list price of $26 per TPM. After discounts on the hardware and software that were prevailing on the street at the time, the price dropped to $18 TPM. With the Sun Fire 15000s, a single machine with 72 processors could deliver about 335,000 TPM (almost twice the work), but cost $3.36 million at list price and about $2 million after discounts; that worked out to just under $19 per TPM at list price and about $12.50 per TPM after discounts on the hardware and database.
Fast forward two years, and Sun ramped up the UltraSparc-III clock speed in the Sun Fire 15000s to 1.2 GHz, goosed the performance to around 475,000 TPM for a machine with 72 processors, and got the price after discounts down to $9 per TPM.
Ahead of the Power4 launch, IBM was already being pretty aggressive with the “Pulsar” PowerPC chips, which ran at 450 MHz and 600 MHz, and a top-end pSeries 680 machine with 18 processors could handle 175,000 TPM at a list price of $8.45 per TPM and a discounted price of around $5.70 per TPM. With a 24-core pSeries 690 using 1.3 GHz Power4 chips, IBM nearly doubled its performance to around 340,000 TPM at a cost of $6.35 per TPM, but with very aggressive 45 percent discounts on the hardware and standard Oracle volume discounts as on the Sun Machines, IBM pushed that down to $4 per TPM. Two years later, when the Power4+ chip came out at 1.7 GHz, IBM pushed the performance up to 764,000 TPM on a 32-core machine and held price/performance more or less steady. By then, Sun was struggling, HP and Intel were wrestling with Itanium, and IBM was getting traction. So IBM could back off from providing a factor of 3X price/performance advantage at street pricing to a mere 2X.
This is a pattern we have seen with IBM again and again – boosting performance and holding price/performance steady when it can, thereby boosting revenues as machines get more powerful and as more customers buy them. The other pattern we have seen for the past decade and a half is that IBM likes the beefiest cores possible because most system software – including its own – is based on per-core pricing. The fewer cores IBM has, the better pricing advantage it has on software, unless companies like Oracle compensate for it with their own scaling factors, chopping prices for its own X86 and Sparc systems and leveling the playing field a bit. If such scaling factors were available in 2001 and 2003, the gap would have closed a bit between Sparc and Power systems.
How Aggressive Is IBM With Power Today?
It is with all of this in mind that we consider a set of benchmark tests that IBM ran recently on its own entry two-socket Power8 systems, stacking them up against two-socket server nodes based on Intel’s “Haswell” Xeon E5 v3 processors. To put it bluntly, this is not the 2000s Unix market, and IBM cannot cut nearly as deeply as it has in the past to win deals and still have a profitable Power Systems business. But the company says that it can demonstrate bang for the buck advantages over workhorse two-socket Xeon machines, and that is something for a market that talks like it wants an alternative but sometimes behaves like what it really wants is lower pricing on Xeon gear.
As we previously discussed, IBM has been positioning its four-socket and eight-socket Power8 machines, launched back in May, at Intel’s Xeon E7 line, and it has been also working with SAP to get the HANA in-memory database running on Power8 machinery as an alternative to running it on Intel Xeon E7s, which for the past several years has been the only platform certified to run HANA. SAP’s Business Warehouse data warehouse software tuned for the HANA database on Power8 machines just started shipping two weeks ago and is getting plenty of interest, Doug Balog, general manager of IBM’s Power Systems division within its Systems Group, tells The Next Platform. Balog adds that the combination of its DB2 BLU in-memory database and its Cognos analytics software is also driving revenues for the Power8 line, mainly because on this software IBM can show about a 2X performance advantage over comparable Intel Xeon iron running that stack.
IBM has just shipped its BigInsights V4 Hadoop distribution, which is the first Hadoop stack from IBM that is based on the Open Data Platform effort with Pivotal and Hortonworks and the first from IBM to have the Spark in-memory processing capability added to it. IBM is very keen on riding up the popularity of Spark in the datacenter, which is arguably much easier to work with than MapReduce and HBase with the core Hadoop stack and which many companies (including Cloudera) think will, over the long haul, become the dominant way that companies engage with information stored in the Hadoop Distributed File System. Back in June, IBM committed to have 3,500 researchers and engineers work on Spark-related projects and donated its own SystemML machine learning algorithms to the Spark community, and the company is counting on the uptake of BigInsights and Power8 iron within its enterprise customer base to help the Power Systems business grow.
Spark is a key part of this growth strategy. Raj Krishnamurthy, chief architect for analytics stack design and performance in IBM’s Systems Group, and Randy Swanberg, a distinguished engineer in IBM’s Power Systems division, published a report on the relative performance of the Power8 and Xeon E5 v3 nodes running a variety of Spark tests. IBM ran a suite of tests called SparkBench, developed by IBM Research and available on GitHub, on the two workhorse machines using different aspects of the Spark stack.
The IBM machine used in the tests is a Linux-only Power S822L server with two twelve-core Power8 chips running at 3.02 GHz; the machine had 256 GB of main memory and three 300 GB disks. The Power8 chip has eight threads per physical core, or 192 threads across the two-socket machine. The Intel machine was not identified by brand, but had two twelve-core Xeon E5-2690 v3 processors, which clock at 2.6 GHz and which have two threads per core for a total of 48 threads across the box. The Intel machine was configured with 256 GB of memory and the same three 300 GB disk drives, too. Both machines ran the Ubuntu Server 15.04 variant of Linux from Canonical, the OpenJDK 1.8 Java stack, and the latest Spark 1.4 version.
IBM acknowledges that it has tuned the Spark code to take advantage of the memory bandwidth, cache structure, and threads of the Power8 system and does not say anything about how the Xeon E5 v3 machine was tuned. We present this data as-is and are confident that Intel might choose a processor with more cores (it could do as many as 18 cores per socket if it wanted to) and do some tuning of its own to boost the performance of Spark running on Xeons.
There are ten workloads that IBM tested within the SparkBench suite. There are four outlined in the chart below, showing the relative performance of the Power8 system in a single-node compared to a single Xeon E5 v3 node. (The Xeon performance is the red line, normalized to 1.)
In this chart, the Twitter streaming component of the SparkBench test calculates the most popular Twitter tag coming off of the data feed from Twitter every 60 seconds. The Pageview streaming test has synthetic clickthrough stream and does page view and user counts every 60 seconds on this stream. The two SQL tests use Spark’s native relational database query as well as Hive with Spark doing orchestration of the queries, which are doing SELECT, AGGREGATE, and JOIN functions during the test. It is here that the larger number of threads on the Power8 system really help boost performance, according to Krishnamurthy and Swanberg. Pageview Streaming also benefitted from the higher thread count, too.
On the Spark machine learning algorithms, the benefits of the Power8 processor compared to the Xeon E5 v3 processor were not quite as dramatic, and it is here that the cache size and memory bandwidth come into play. Spark normally stores intermediate results to disk as it is doing machine learning calculations, but IBM pushed it out to a RAM file system, which helped boost the Power8 side. (It is not clear if IBM did the same trick on the Xeon machine, but it should have.) Allocating the right number of threads to each Spark worker was a bit tricky for the Logistic Regression, Support Vector Machine, and Matrix Factorization algorithms – which are commonly used for recommendation systems as well as for data classification and predictive work – and again, we are not sure if IBM did the same tuning for the Xeon machines.
The last three tests in the SparkBench suite that IBM ran on the two machines were related to graph analytics jobs using the Spark GraphX features, specifically PageRank, Singular Value Decomposition, and TriangleCount. PageRank ranks a web page based on the number of links feeding into it (a method invented by Google), SVD++ is used to do quality recommendations, and TriangleCount finds relationships like those between people on social networks. The data used for these set of tests is the Google Web Graph dataset, and the test has 875,713 nodes and 5,105,039 edges in its graph database. Once again, the jobs that were most sensitive to memory bandwidth saw the most improvement.
IBM did not provide pricing information for the SparkBench report, but after The Next Platform pressed for it, IBM came back with some pricing so we could do some price/performance analysis. (We would point out that we believe in running benchmarks across a wide variety of machines, with vendors doing their own tests and providing system pricing, and having it all peer reviewed before publication. But in the early days of any market, this is not usually the way it works. Enough vendors have to test others’ machines before they come together and establish a proper methodology, and honestly, the community selling new analytics tools has not gotten with this program yet.) In any event, IBM says that the Power S822L system that was used in the test has a list price of $34,929, which is 50.8 percent more expensive than the Xeon E5 v3 system with the same core count, memory, and disk. Moreover, on average across the ten SparkBench workloads, the Power8 system delivered 2.32X the performance as the Xeon E5 v3 machine. When you do the math, that is a 54 percent price performance advantage in this case – it is a lot higher for some workloads, and nil for others.
The obvious question is this: Is a this price/performance advantage compelling enough to make companies think about switching to Power machines from Xeon machines?
“Power8 and everything we have done to design that machine for data continues to resonate well with clients and with ISVs,” explains Balog. “The threads, the caches, the memory bandwidth, the I/O bandwidth, the little endian support – all that good stuff, including CAPI for future innovation, leaves them with a platform that delivers at least 2X performance gains over Intel. If you can get something that delivers 2X the performance for nearly the same price, it is pretty compelling and this is where we continue to see the strong adoption of Power8.”
IBM does not break out revenue figures for its Power Systems division, but Balog said that at constant currency, the business had 1 percent growth in the first quarter and 5 percent growth in the second quarter, which is a lot better than the declines that IBM has seen in this business over the past several years. For a while, IBM grew its Unix sales as HP and Sun/Oracle slipped, but eventually the Great Recession hit the Unix business hard. IBM has concentrated boosting its Linux-on-Power business, and this effort is one reason that the scale-out two-socket Power8 machines have seen double digit revenue growth in the first half of 2015, according to Balog. The Linux portion of the Power Systems business had “high double digit growth” in the first two quarters of the year, but Balog concedes it is against a very small base.
What IBM needs is a watershed event, a big hyperscaler that commits to using Power chips in its datacenters. It could yet happen with Google, which is one of the founding members of the OpenPower Foundation and which has confirmed that it would switch to the Power architecture to get even a 20 percent advantage. Google can build a Power system specifically for its needs and for a lot less money than IBM can do, and we presume that the OpenPower server partners will be able to do that as well. In fact, we expect for IBM to resell OpenPower machinery for Linux workloads to drive the price down and better compete with the Xeon server suppliers.
The real battle might be Power8+ versus “Broadwell” Xeon E5 v4, which are both coming in 2016. And, quite possibly, the “Knights Landing” Xeon Phi as a tag team, now that we think about it. For many workloads, particularly those that like threads and memory bandwidth, Xeon Phi might be a better answer.