Oracle Takes On Xeons With Sparc S7

It is an accepted principle of modern infrastructure that at a certain scale, customization like that done by Google, Amazon Web Services, Microsoft, or Baidu pays off. While Oracle is building its own public cloud, it does not have the kind of scale that these companies do, but it does have something else that warrants customization and co-design up and down its stack: more than 420,000 customers who generate $38.5 billion in sales.

This, in a nutshell, is why Oracle continues to invest in its Sparc processors even though many of its customers deploy Oracle’s middleware, database, and application software on X86-based systems created by itself and its rivals in the systems racket. And that is why it is explicitly taking on Intel-based platforms – including its own – with the new “Sonoma” Sparc S7 processor and the systems that have just debuted employing it.

The Oracle enterprise base is broad and deep, and utterly dwarfs the one that AWS has created in its first decade, and rivals that which Microsoft has created for the Windows Server platform and which is fueling its own Azure cloud. While Oracle has spent more than six years building up its 5,000-strong customer base for its Exadata, Exalogic, and Exalytics engineered systems (which are based on Xeon processors), its software base is orders of magnitude larger, and if Oracle can get even a modest attach rate on its Sparc systems in this base, it has an opportunity to radically expand its hardware business. Look at the numbers: Oracle has 310,000 database customers, 120,000 Fusion middleware customers, 105,000 application customers, and over 75,000 companies that have bought its Sparc and X86 systems. Most of them have Sparc machinery, even though the business for these systems, like all other RISC/Unix iron, has been in decline since reaching a peak in the dot-com boom nearly two decades ago.

Say what you will about Oracle co-founder and CTO, Larry Ellison, but he has made good on his word to create a Sparc processor and system roadmap and, more importantly, stick to it and deliver it on time. The “Sonoma” Sparc S7 processor, which was unveiled at last year’s Hot Chips conference, takes the S4 core that was originally used in the Sparc M7 chip that Oracle started shipping in big NUMA systems late last year and creates a chip that is more suitable for the scale-out two-socket servers that dominate the datacenters of the world. The S7 was not on the original Sparc roadmap that Oracle put out in 2010 after it acquired Sun Microsystems, but it is logically consistent with the two-tiered approach that most server chip makers have had in recent years to cover scale-out and scale-up systems.

The S4 core has a dual-issue, out-of-order execution unit and has dynamic threading that ranges from one to eight threads per core. Each S4 core has 16 KB of L1 data cache and 16 KB of L1 instruction cache. The M7 and S7 chips organize the cores into clusters of four, and to a certain way of thinking the eight-core S7 is a quarter of a 32-core M7 with InfiniBand and some other features integrated onto it that could not be crammed into the M7 and that might make it onto the future M8. Each set of four cores on the S7 chip has a 256 KB L2 instruction cache, and each pair of S4 cores also has a 256 KB writeback data cache. Each of these L2 caches provides up to 500 GB/sec of bandwidth. The four-core block on the S7 die has an 8 MB L3 cache, and sports two DDR4 controllers that can support memory running at 2.13 GHz and 2.4 GHz clock speeds. Using 64 GB memory sticks, a two-socket S7 system can have up to 1 TB of main memory and has a peak memory bandwidth of 77 GB/sec. There are also two PCI-Express 3.0 peripheral controllers on the S7 die.
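To make the scale of those figures concrete, here is a minimal sketch of the arithmetic in Python. The inputs are the numbers quoted above; the DIMM counts are derived rather than stated by Oracle, and they assume the 1 TB maximum is reached purely with 64 GB sticks split evenly across the two sockets.

```python
# Figures quoted in the article for the Sparc S7.
CORES_PER_S7 = 8
CORES_PER_M7 = 32
MAX_THREADS_PER_CORE = 8
SOCKETS = 2
DIMM_GB = 64
SYSTEM_MEMORY_GB = 1024  # 1 TB maximum for a two-socket system

# Hardware threads when every core runs its maximum of eight threads.
threads_per_socket = CORES_PER_S7 * MAX_THREADS_PER_CORE
threads_per_system = threads_per_socket * SOCKETS

# The "quarter of an M7" comparison, by core count alone.
s7_fraction_of_m7 = CORES_PER_S7 / CORES_PER_M7

# Derived (assumption: memory is split evenly across both sockets).
dimms_per_system = SYSTEM_MEMORY_GB // DIMM_GB
dimms_per_socket = dimms_per_system // SOCKETS

print(f"threads per socket: {threads_per_socket}")
print(f"threads per system: {threads_per_system}")
print(f"S7 cores as a fraction of M7 cores: {s7_fraction_of_m7}")
print(f"64 GB DIMMs needed for 1 TB: {dimms_per_system} ({dimms_per_socket} per socket)")
```

By this reckoning a fully threaded two-socket box presents 128 hardware threads to Solaris, and hitting 1 TB takes sixteen 64 GB sticks.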

Marshall Choy, vice president of product management for systems at Oracle, says that the memory latency with the S7 is a little bit lower than for the M7 because the memory controllers are now integrated on the processor. With the M7, the processor had a controller but the buffer chips that actually ran the DDR4 protocol were external to the processor and put on the memory cards (just like Intel does with the Xeon E7 and IBM does with the Power8). As IBM will be doing with the scale-out version of the Power9 chip next year and as Intel does with Xeon E5s already, the S7 will use stock DDR4 memory without buffer chips because the memory controller is wholly on the die. Moreover, because the processor is smaller and Taiwan Semiconductor Manufacturing Corp has matured the 20 nanometer process that is used to make the S7, its clock speed, at 4.27 GHz, is a little bit higher than that of the M7, which cycles at 4.13 GHz. Both have fairly high clock speeds compared to Intel Xeons and are in the same ballpark as the IBM Power8. The S7 and M7 cores both support eight threads per core, same as the Power8, and four times what Intel supports per core with the Xeons. So the S7 should, core for core, outperform the M7 and, more importantly, give both the Power8 and the Xeon E5 a run for the money in the scale-out datacenter.
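How much is "a little bit higher"? A quick sketch in Python, using only the clock speeds and thread counts quoted above. The Xeon figure of two threads per core is inferred from the "four times" comparison, and clock speed alone says nothing about delivered performance.

```python
# Clock speeds quoted in the article.
S7_CLOCK_GHZ = 4.27
M7_CLOCK_GHZ = 4.13

# Threads per core: eight for Sparc S7/M7 and Power8 per the article;
# two for Xeon is inferred from "four times what Intel supports".
SPARC_THREADS_PER_CORE = 8
XEON_THREADS_PER_CORE = 2

# Percentage clock advantage of the S7 over the M7.
clock_uplift_pct = (S7_CLOCK_GHZ / M7_CLOCK_GHZ - 1) * 100

# Ratio of hardware threads per core, Sparc versus Xeon.
thread_ratio = SPARC_THREADS_PER_CORE / XEON_THREADS_PER_CORE

print(f"S7 clock advantage over M7: {clock_uplift_pct:.1f} percent")
print(f"Sparc threads per core vs Xeon: {thread_ratio:.0f}x")
```

The clock bump works out to a bit over 3 percent, so any larger core-for-core gain over the M7 would have to come from the lower memory latency rather than from frequency.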

“The goal here is to bring down the economics of the system, whether we are talking about servers or engineered systems,” Choy tells The Next Platform, “and what we have achieved here is commodity X86 cost points but also bringing in added enterprise functionality specifically around software in silicon. This is not just a pricing exercise for us, in terms of getting these X86 price points, but an engineering exercise in reducing our overall costs. Our margin model is preserved across S7 and M7.”

That on-chip acceleration includes accelerators for security, encryption, database processing, and analytics that were first etched into transistors with the S4 cores in the M7 processors for high-end Sparc M7 systems. One important feature in the software-in-silicon stack from Oracle is the ability to transmit fully encrypted virtual machines in Solaris when live migrations occur between two physical machines, and another is the Data Analytics Accelerator, which is an offload engine built onto the chip that does certain kinds of SQL processing common in databases (from Oracle and others) and speeds up the SQL queries by a factor of 10.

“We have seen on the X86 side for the past couple of generations now per core performance has stagnated a little bit, but with Sparc we continue to increase per core performance,” brags Choy. “Depending on the workload, we will provide 50 to 100 percent improvement in per core performance compared to the latest generation Broadwell Xeon cores. Obviously we have a great story around efficiency, with the offload, and much more bandwidth.”

(We will be drilling down into the performance of the S7 systems versus Xeon E5 and Power8 machines separately.)

The one thing that made the S7 chip particularly interesting was the integrated InfiniBand host controller that provides two x4 lanes that run at 56 Gb/sec speeds and that provide a total of 28 GB/sec of bandwidth out of each Sonoma chip. In a two-socket configuration, both InfiniBand ports can be active, providing dual rail configurations with redundancy and load balancing to get around traffic snarls in the network. For reasons that Oracle is being a little bit vague about, these InfiniBand ports are not activated on the S7 chip, which could mean all of the quirks might not be worked out of them yet. As we have previously reported, Oracle is working on its own 100 Gb/sec EDR InfiniBand leaf and fabric switches, based on its own silicon, not that of partner Mellanox Technologies. These are expected to ship around now, but have not as far as we know. Oracle has a roadmap to develop its own 200 Gb/sec HDR InfiniBand switch ASICs, too. InfiniBand is at the heart of the engineered systems that Oracle sells, and the company no doubt wants to use InfiniBand to build its public cloud and the clusters that its customers put on site.

It is not clear when Oracle will fire up those InfiniBand ports on the S7, but it may be concerned about an impedance mismatch with 56 Gb/sec ports on the S7 server chips and 100 Gb/sec ports on its impending switches. Perhaps Oracle is taking a bit of time to crank up the clocks on the ports for a next-rev on the chips?
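The InfiniBand figures quoted above can be reconciled with a little arithmetic, sketched here in Python. The article does not say how the 28 GB/sec total is counted; the assumption in this sketch is that it sums both directions across both 56 Gb/sec ports, using nominal rates with no encoding or protocol overhead.

```python
# Figures quoted in the article.
PORTS_PER_CHIP = 2
PORT_RATE_GBITS = 56      # Gb/sec per port
SWITCH_PORT_GBITS = 100   # Gb/sec EDR switch ports
BITS_PER_BYTE = 8

# Nominal bandwidth in one direction across both ports.
one_way_gbits = PORTS_PER_CHIP * PORT_RATE_GBITS
one_way_gbytes = one_way_gbits / BITS_PER_BYTE

# Assumption: the quoted total counts traffic in both directions.
both_ways_gbytes = one_way_gbytes * 2

# Size of the speed gap between an EDR switch port and an S7 port.
switch_to_chip_ratio = SWITCH_PORT_GBITS / PORT_RATE_GBITS

print(f"one direction: {one_way_gbits} Gb/sec = {one_way_gbytes} GB/sec")
print(f"both directions: {both_ways_gbytes} GB/sec")
print(f"EDR switch port vs S7 port: {switch_to_chip_ratio:.2f}x")
```

Under that assumption the numbers line up: 14 GB/sec each way, 28 GB/sec in total, with each switch port running roughly 1.8 times faster than the port on the chip.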

The S7 Systems

The S7 processors, like other Sparc chips, run Oracle’s Solaris Unix variant, which is well-regarded but nowhere near as mainstream as Linux or Windows Server is these days – with Linux dominating a lot of the back-end and front-end at many large enterprises where Oracle derives the bulk of its revenues. The S7 machines support Solaris 11.3 to be specific. Solaris is bundled with the system, as is the case with all Sparc machines from Oracle, just as Linux is prebundled on its X86 iron. The systems can support Oracle 11g Release 2 or Oracle 12c in the Enterprise Editions of the company’s database.

There are two rack-based machines that employ the S7 processor, plus a preconfigured MiniCluster system that falls short of a full-on engineered system but is more like a converged system complete with servers, storage, and switching.

The Sparc S7-2 server is a 1U pizza box machine that has two processor sockets, and customers can buy it with either one or two CPUs installed. The base system comes with a four-port 10 Gb/sec Ethernet controller, like other Sparc and Xeon servers from Oracle, and has room for three 2.5-inch drives, which can be 600 GB or 1.2 TB SAS disks, 400 GB SAS flash drives, or 3.2 TB NVM-Express flash drives. The Sparc S7-2L server comes in a 2U form factor with two processors configured in the box from the get-go and room for 24 2.5-inch drives or 15 3.5-inch drives. (Oracle supports 8 TB SAS-3 disk drives in the 3.5-inch bays.) The 24-bay setup can have up to a dozen NVM-Express devices.

The MiniCluster S7-2 is just what the name suggests: a baby cluster built from two S7-2 servers and an Oracle disk array. It has a total of 32 cores and 1 TB of memory as configured (using 32 GB memory sticks and fully populating the machine, so be careful) plus 16.8 TB of flash storage and 48 TB of disk storage.
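The MiniCluster totals follow directly from the building blocks, as this short Python sketch shows. The core count uses the eight-core S7 in two-socket servers described earlier; the DIMM counts are derived, assuming the 1 TB is spread evenly over both servers.

```python
# Building blocks quoted in the article.
SERVERS = 2
SOCKETS_PER_SERVER = 2
CORES_PER_SOCKET = 8
TOTAL_MEMORY_GB = 1024  # 1 TB as configured
DIMM_GB = 32

# Core count across the whole MiniCluster.
total_cores = SERVERS * SOCKETS_PER_SERVER * CORES_PER_SOCKET

# Derived (assumption: memory is split evenly across both servers).
dimms_total = TOTAL_MEMORY_GB // DIMM_GB
dimms_per_server = dimms_total // SERVERS

print(f"total cores: {total_cores}")
print(f"32 GB DIMMs in the cluster: {dimms_total} ({dimms_per_server} per server)")
```

Sixteen 32 GB sticks per server means every memory slot is taken, which is why the warning matters: adding memory later means swapping sticks, not adding them.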

The S7 systems are available now; pricing for configured machines was not available at press time. Oracle is also selling dedicated instances based on S7 nodes on its Oracle Cloud public cloud.

The real question, then, is when does Oracle switch its Exadata platform from Xeon to Sparc? Choy says that Oracle will continue to sell X86-based engineered systems, as it has been doing, and is mum about any specific future plans for the S7 systems. But clearly if Oracle really believes that Sparc machines are better, then there should be a variant of Exadata, Exalogic, and Exalytics – you do the naming, and all you gotta do is add an S in front – based totally on Sparc that shows these machines outrunning Xeons.

10 Comments

  1. Oracle talks out of both sides of their mouth. For years they have touted performance based on the total sum of all cores in a socket, as they always had weak cores but a lot of them. Now they cut an S7 by 75%, improving thermal dynamics and reducing the plumbing contention that the 32c chip has, and now it's all about core strength. Interesting, given that Intel is now delivering 22 cores (EP) & 24 cores (EX) respectively, so having an 8 core chip seems perplexing if they are competing against them.

    • When we talk about the best performing cpu, the only interesting thing is the performance of the… cpu. Not core. In this case, who has the fastest core does not matter. Neither does it matter who has the lowest latency, fastest ALU, etc – or other tiny parts of a cpu. We are comparing cpus here: POWER vs SPARC vs x86. You can not extrapolate from benchmarking a small part of the cpu, to conclude which cpu is fastest. For instance, SPARC M7 has 32 cores, and if you bench one core, you are benching 1 / 32 = 3% of the SPARC M7, and from that tiny number you are trying to reach a conclusion about the whole SPARC M7 cpu. You can not draw a conclusion by looking at less than 3% of the SPARC M7 cpu. That is wrong to do.

      Which car is fastest? Corvette, because “it has faster pistons in the engine”. Never mind the top speed of Corvette is 240 km/h, and Lamborghini has 320 km/h. Corvette will still say “our car is the fastest, because our piston is fastest on the market”. This is weird and faulty logic. Some would even call it FUD. If Lamborghini runs 320km/h the car is faster. No matter how fast the Corvette pistons move. This is exactly the situation with IBM POWER. IBM claims their POWER cores are faster so the entire POWER cpu is faster – never mind SPARC cpu scores higher in the benchmarks. Here is an example:
      https://ibmadvantage.com/2013/04/29/weblogic-12c-on-oracle-sparc-t5-8-delivers-half-the-transactions-per-core-at-double-the-cost-of-the-websphere-on-ibm-power7/
      “…[Oracle SPARC] being “fastest processor in the world” means that such processor must be able to handle the most transactions per second per processor core……..IBM POWER produced the world record result in terms of EjOPS per processor core – truly a measure of the fastest processor known to men……Since Oracle knew they can not produce the most efficient result in terms of cost or transactions per second, the only way for them to claim world record was to throw large hardware at it and produce the biggest total number of EjOPS. Not a very useful metric I must admit….”

      And if you really want to talk about core vs core, SPARC M7 cores are typically 2-3x faster than x86 cores, all the way up to 11x faster. And as we know, x86 cpus and cores are faster than POWER cpus and cores – this must mean that SPARC M7 cores are faster as well. Here are several benchmarks for this new SPARC S7 sonoma scale-out cpu, compared to x86 and POWER8.
      https://blogs.oracle.com/BestPerf/entry/20160629_jbb_sparc_s7_2

      And if you really need maximum raw floating point crunching power, the SPARC M7 is much faster than both the x86 and the POWER8. SPARC M7 is basically four times faster than this SPARC S7 sonoma cpu. SPECint2006 benchmarks can be found in the same link above.

      • CORRECTION: the last paragraph is wrong, where I wrote “SPARC M7 cores are typically 2-3x faster than x86 cores, all the way up to 11x faster”. This is not true. I wrote that paragraph late and was tired. I can not edit so I post an addendum here. The correct version is:

        “…And if you really want to talk about core vs core, SPARC M7 cores are typically 1.5 – 2.0x faster than x86 cores. This is proven by looking at all the benchmarks below, where one single 32 core SPARC M7 cpu typically is twice as fast as two E5-2699v3 with 36 cores.

        And as we know, x86 cpus and cores are faster than POWER cpus and cores – this must mean that SPARC M7 cores are faster than POWER8 cores, as well. Here are several benchmarks for this new SPARC S7 sonoma scale-out cpu, compared to x86 and POWER8. And if you dig a bit, you can also find 30ish benchmarks of SPARC M7 vs x86 or POWER8…”
        https://blogs.oracle.com/BestPerf/entry/20160629_jbb_sparc_s7_2

  2. There is a SPARC Exalytics.. and SuperCluster is basically a SPARC superset of exalogic and exadata.

  3. Show me the SQL benchmarks, as there is no other reason for this chip other than to run the Oracle database.

    • Performance is of course somewhere between similar to and a few times better than the best 22 (24) core Xeon E5/E7 V4 processors (depending on the benchmark) … at least according to Oracle data.

      Plus, the S4 core has a big advantage in accelerated encryption: it can run a fully encrypted system with minimal performance penalty.

      But of course, in other situations that depend on theoretical raw floating point performance, SPARC does not make any sense. In those situations, a Xeon E5 or Xeon Phi (or GPU) will be the better choice.

      • For raw floating point performance, SPARC M7 holds the world record with 832 SPECfp2006, vs IBM POWER8 reaching 468 SPECfp2006. Of course, x86 reaches 474 SPECfp2006 making IBM POWER8 the slowest in class.
        https://blogs.oracle.com/BestPerf/entry/201510_specpu2006_t7_1

        Here are 40ish different world records SPARC M7 holds, other than SQL benchmarks. For instance, neural network, big data, SPECcpu 2006, graph traversal, STREAM ram bandwidth, etc etc etc. And regarding business enterprise software such as SAP, databases, virtualization, SPARC M7 crushes, being up to 11x faster than the fastest x86 or POWER8 cpu.

        I don't see that SPARC is only fit for running SQL benchmarks. It is fastest in every single benchmark I have ever seen. I have never seen SPARC M7 being slower in any benchmark. Typically it is 2-3x faster, all the way up to 11x faster.

    • No worries. Sparc machines have several different kinds of virtualization technologies, but VMware is, with the exception of some stuff they did for mobile phones, restricted to the X86 instruction set. Earlier Sparc systems had dynamic domains, a kind of hardware partition that was reconfigurable on reboot. Solaris, of course, supported containers (often called zones), but these are really a type 2 hypervisor and not a container like those used with Docker or LXC on Linux. (Heavier, with a full shared file system and kernel, but distinct Solaris runtimes that magically looked like full operating systems to the applications.) Sparc T series and I presume S series processors also support LDoms or logical domains, which are akin to VMware ESXi virtual machines but are in no way compatible with ESXi.

  4. When does Oracle switch its Exadata platform from Xeon to Sparc? The SPARC version of Exadata is called SuperCluster M7. The SPARC version of Exalogic is called SuperCluster M7. So as you can see, both Exadata and Exalogic, but running SPARC, are integrated into SuperCluster M7. SuperCluster M7 allows you to run any workload from database to applications, middleware, and even real-time analytics and big data workloads.
