Oracle Aims “Sonoma” Sparc At Scale Out Clusters
August 25, 2015 Timothy Prickett Morgan
The vast majority of the so-called “engineered systems” that Oracle sells into datacenters are based on Intel Xeon processors. This is not at all unusual, given the dominance of Xeons in the modern datacenter, particularly when the software giant and hardware player is able to get customized silicon from its chip partner.
But one of the reasons why Oracle shelled out $7.4 billion to acquire Sun Microsystems, in a deal that closed in January 2010, was so it could create highly tuned system stacks, from the processor up through the operating system, databases, and middleware, and all the way out to the applications.
Optimizing engineered systems should mean aligning the software and the hardware very closely, and it logically follows that the best engineered systems should, in theory at least, be based on Oracle’s own Sparc processors, not those from Intel.
That is precisely the mission of the forthcoming “Sonoma” Sparc processor that the company unveiled this week at the Hot Chips 27 conference in Silicon Valley. The Sonoma chip has just about everything needed to make a two-socket server on its die, and Oracle says it will be deployed in low-cost, dense servers aimed at workloads that are scaled out rather than scaled up like its current Sparc M6 and future Sparc M7 systems. Interestingly, Sonoma will sport on-chip InfiniBand adapters. Oracle is a big user of InfiniBand in its engineered systems, preferring it over Ethernet because of the low latencies it offers for clustered servers and storage.
Oracle co-founder Larry Ellison, now executive chairman and chief technology officer, took a shine to hardware in the year it took to close the Sun Microsystems deal, and the company continues to invest in the Sparc line of RISC processors, which run the Solaris Unix platform. Solaris rode the commercial Internet wave up during the dot-com boom (there was no Amazon Web Services back in the mid-1990s, and the default system that every startup bought with its venture money was a rack of Sun servers backed by Oracle databases), but it has largely been replaced by Linux running on X86 iron as the default development and deployment platform. It seems highly unlikely that Sparc will ever again reach the shipment and revenue heights that once gave Sun Microsystems a market capitalization of over $200 billion. But the Sparc architecture is still good, and with Oracle investing in it and tuning its hardware and software together for mutual benefit, there is a chance that Oracle can foment a kind of Sparc revival, at least among a subset of its customer base.
At last year’s Hot Chips 26 conference, Oracle rolled out its high-end M7 processor, which has 32 of the S4 cores, and talked a bit about the Sparc M7 big iron systems that would use it. This week we get a glimpse of Sonoma, presumably to be called the Sparc T7 when it is eventually put into Oracle systems. The current entry and midrange Sparc processor, based on the S3 cores developed by Oracle, is called the T5. There was no companion T6 chip launched alongside the high-end Sparc M6 chip that Oracle unveiled in the summer of 2013. But given that the M7 chip and the Sonoma chip are both based on the same S4 generation of cores created by Oracle, it is reasonable to expect the company to get its naming conventions back in sync.
The Sparc M7 systems are not yet shipping as far as we know, and at the unveiling of Sonoma at Hot Chips, Oracle executives gave no hint whatsoever as to when the Sonoma chip might appear in Oracle systems, whether in free-standing Sparc systems or in Sparc SuperClusters, the engineered systems variant that Oracle sells based on its Sparc platforms.
Sonoma Is All About Integration
As with most modern processors, Oracle is using the transistor budget freed up by a process shrink to pull onto the die features that would otherwise sit outside the chip. This integration improves performance by lowering latencies between the CPU cores, the memory system, and the integrated features, although Basant Vinai, senior principal engineer at Oracle, and his colleague Rahoul Puri, architect of the integrated networking on the Sonoma chip, did not quantify the latency advantages of the integration.
The Sonoma Sparc T series chip is implemented using 20 nanometer processes from Oracle’s fabrication partner, Taiwan Semiconductor Manufacturing Corp. The Sonoma chip has a total of eight of the S4 cores, which are organized on the die in two blocks of four. The S4 core that debuted with the M7 last year and is used in Sonoma this year has a dual-issue, out-of-order execution unit and dynamic threading that ranges from one to eight threads per core. This is the same level of threading that IBM offers in its Power8 chip, although Sun and then Oracle have offered this high degree of threading for many years because it suits the database and middleware workloads that Sparc machines target. (Databases and Java middleware just love threads.)
The S4 core has a number of elements, including two Arithmetic Logic Units (ALUs), one Branch Unit (BU), one Stream Processing Unit (SPU), and one Floating Point Unit (FPU). Each S4 core has 16 KB of L1 data cache and 16 KB of L1 instruction cache. Both the M7 chip and the Sonoma chip organize the cores into clusters of four, so you can think of Sonoma as a quarter of an M7 with InfiniBand and some other features integrated onto the die that would not fit onto a 32-core monster like the M7. Each four-core cluster shares a 256 KB L2 instruction cache, and each pair of S4 cores within it also shares a 256 KB writeback L2 data cache. Each of these L2 caches provides up to 500 GB/sec of bandwidth. Each four-core block on the Sonoma die also has an 8 MB L3 cache.
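To make the thread and cache figures above easier to tally, here is a small Python sketch that encodes the per-core, per-pair, and per-cluster resources listed for the S4 cores and sums them for one chip. It assumes the 8 MB L3 figure applies to each four-core block; the `SonomaChip` class and its method names are illustrative only, not anything Oracle publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SonomaChip:
    """On-die resources of one Sonoma chip, per the figures in the article."""
    cores: int = 8                  # eight S4 cores
    cores_per_cluster: int = 4      # organized as blocks of four
    max_threads_per_core: int = 8   # dynamic threading, one to eight per core
    l1d_kb_per_core: int = 16
    l1i_kb_per_core: int = 16
    l2i_kb_per_cluster: int = 256   # instruction cache shared by a four-core block
    l2d_kb_per_pair: int = 256      # writeback data cache shared by a core pair
    l3_mb_per_cluster: int = 8      # assumption: 8 MB L3 in each four-core block

    @property
    def clusters(self) -> int:
        return self.cores // self.cores_per_cluster

    def max_threads(self, sockets: int = 1) -> int:
        """Hardware threads with every core running eight threads."""
        return self.cores * self.max_threads_per_core * sockets

    def l1_kb(self) -> int:
        return self.cores * (self.l1d_kb_per_core + self.l1i_kb_per_core)

    def l2_kb(self) -> int:
        pairs = self.cores // 2
        return self.clusters * self.l2i_kb_per_cluster + pairs * self.l2d_kb_per_pair

    def l3_mb(self) -> int:
        return self.clusters * self.l3_mb_per_cluster
```

Under those assumptions, one Sonoma chip tallies 64 hardware threads (128 in a two-socket box), 256 KB of L1, 1.5 MB of L2, and 16 MB of L3. Passing `cores=32` scales the same cluster layout to the M7's core count, which is the "quarter of an M7" relationship in numeric form.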
Oracle has not said what the clock speed will be for either M7 or Sonoma, but we have been told that the M7 will run at a higher clock speed than the 3.6 GHz used with the M6 chips, and there is every reason to believe that Oracle will try to push the clock speed as high as it can with Sonoma.
The Sonoma chip has two DDR4 memory controllers on the die, which together drive four memory channels with up to two memory sticks per channel. Oracle is supporting DDR4 memory running at 2,133 MT/sec and 2,400 MT/sec, and main memory will top out at 1 TB using 128 GB DIMMs. Such fat memory sticks are not widely available and will be very pricey indeed when they are, so the practical memory limit on a Sonoma server will be more like 256 GB. That is a pretty memory-heavy configuration for most datacenter workloads, but 1 TB would be better for some of the in-memory database jobs that Oracle has in mind for Sonoma. Vinai explained that the Sonoma memory subsystem will have 77 GB/sec of peak bandwidth, which is pretty respectable for an eight-core chip, and that the memory controller has a speculative memory read feature that will significantly reduce latencies. How much, Oracle is not saying. The interesting bit is that Sonoma servers will not use buffered memory (as the Sparc M6 and M7 do, and as the Xeon E7 and the Power8 do), but rather directly attached DDR4. This again lowers the cost of the system because it can use stock DDR4 memory.
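A quick back-of-the-envelope check shows which memory configuration the quoted 77 GB/sec and 1 TB figures correspond to. The sketch below assumes standard 64-bit (8-byte) DDR4 channels and two DIMMs per channel; the function names are hypothetical helpers for the arithmetic, not anything from Oracle.

```python
BYTES_PER_TRANSFER = 8  # one standard 64-bit DDR4 channel moves 8 bytes per transfer


def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    return channels * BYTES_PER_TRANSFER * mega_transfers_per_s * 1e6 / 1e9


def max_capacity_gb(channels: int, dimms_per_channel: int, dimm_gb: int) -> int:
    """Total memory with every DIMM slot populated."""
    return channels * dimms_per_channel * dimm_gb


for speed in (2133, 2400):
    print(f"DDR4-{speed}, 4 channels: {peak_bandwidth_gb_s(4, speed):.1f} GB/s")
print(max_capacity_gb(4, 2, 128), "GB with 128 GB DIMMs")
print(max_capacity_gb(4, 2, 32), "GB with 32 GB DIMMs")
```

Four DDR4-2400 channels give 76.8 GB/sec, matching the 77 GB/sec Oracle quoted, and eight 128 GB sticks give the 1 TB ceiling; the 256 GB practical limit corresponds to 32 GB sticks in the same slots. Eight channels in total would double both numbers, to roughly 154 GB/sec and 2 TB.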
The Sonoma chip is designed for two-socket machines and has four of the coherence links that are used in Sparc M6 and M7 systems to create massive NUMA clusters. (The “Bixby” interconnect created by Oracle for the Sparc M6 systems could in theory scale to a massive 96 sockets and 96 TB of memory using 64 GB memory sticks.) The coherence links signal at 16 Gb/sec in each direction, and together the four links provide 128 GB/sec of bidirectional bandwidth. Each die has two PCI-Express 3.0 x8 peripheral controllers and, importantly, an integrated InfiniBand host controller with two x4 ports running at 56 Gb/sec, which together provide 28 GB/sec of bidirectional bandwidth out of each Sonoma chip. In a two-socket configuration, both InfiniBand ports can be active, providing dual-rail configurations with redundancy and load balancing to get around traffic snarls in the network.
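The link figures above are aggregates, and a few lines of arithmetic show how they break down. This sketch works from the raw signaling rates in the article and ignores protocol overhead by default; the optional `efficiency` parameter is an illustrative knob for line-encoding losses, and the function names are hypothetical.

```python
def gbits_to_gbytes(gbits: float) -> float:
    return gbits / 8


def infiniband_bidirectional_gb_s(ports: int, port_gbits: float,
                                  efficiency: float = 1.0) -> float:
    """Bidirectional bandwidth in GB/s across all InfiniBand ports.

    efficiency scales the raw line rate, e.g. to account for line encoding.
    """
    per_direction_gbits = ports * port_gbits * efficiency
    return gbits_to_gbytes(per_direction_gbits) * 2  # traffic flows both ways


def per_link_gb_s(total_gb_s: float, links: int) -> float:
    """Split an aggregate bandwidth figure evenly across links."""
    return total_gb_s / links


ib_total = infiniband_bidirectional_gb_s(ports=2, port_gbits=56)  # 28 GB/s
coherence_per_link = per_link_gb_s(128, links=4)                  # 32 GB/s
print(ib_total, coherence_per_link)
```

Two 56 Gb/sec ports carry 14 GB/sec each way, hence 28 GB/sec bidirectional; passing `efficiency=64/66` for FDR InfiniBand's 64b/66b line encoding trims that to roughly 27 GB/sec of payload. Each coherence link works out to 32 GB/sec bidirectional, or 16 GB/sec per direction. If the 16 Gb/sec figure is a per-lane rate, that would imply eight lanes per link, though the article does not give the lane count.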
Oracle has not said whether it designed the InfiniBand ports itself or licensed them from a third party, but the odds are that they are licensed from Mellanox Technologies, which is a connectivity partner of Oracle and in which Oracle owns a 10 percent stake. Oracle has bought InfiniBand switch ASICs and adapters from Mellanox in the past, and there is no reason to believe the two are not working together. The only other InfiniBand supplier is Intel, with the True Scale products it acquired from QLogic, but these are being transitioned to Omni-Path, an amalgam of Cray Aries interconnect hardware and QLogic InfiniBand stacks that creates an InfiniBand-compatible fabric.
The Sonoma chip shares some features with the M7, including a database query offload engine called Database Accelerator, or DAX for short, which can run database operations on in-memory columnar vectors and operate directly on both compressed and decompressed columnar formats. DAX can also do in-memory format conversions, value and range comparisons, and set membership lookups, and it can decompress data inline alongside these query functions to goose performance. Sonoma also sports a circuit called the Application Data Integrity accelerator, which guards against buffer overflows as well as malicious attacks that exploit memory bugs, such as Heartbleed. Like other Sparc T and M series chips, Sonoma supports a slew of encryption and hashing algorithms, including AES, DES, 3DES, Camellia, CRC32c, MD5, RSA, DH, DSA, ECC, SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512.
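DAX's programming interface is not described here, so the following is only a software sketch of the kind of work the paragraph says the engine offloads: a value-range comparison and a set-membership lookup over a column, each producing a bit vector of matching rows, with a simple run-length-encoded column decompressed inline during the scan. The run-length format and every function name are hypothetical illustrations, not Oracle's DAX formats or API.

```python
from typing import Iterable, Iterator

# Hypothetical compressed column format: (value, run_length) pairs.
RleColumn = list[tuple[int, int]]


def decompress(column: RleColumn) -> Iterator[int]:
    """Expand (value, run_length) pairs lazily, as an inline decompressor would."""
    for value, run in column:
        for _ in range(run):
            yield value


def range_scan(values: Iterable[int], lo: int, hi: int) -> int:
    """Return a bit vector (as an int) with bit i set when lo <= values[i] <= hi."""
    bits = 0
    for i, v in enumerate(values):
        if lo <= v <= hi:
            bits |= 1 << i
    return bits


def set_membership_scan(values: Iterable[int], members: set[int]) -> int:
    """Return a bit vector with bit i set when values[i] is in members."""
    bits = 0
    for i, v in enumerate(values):
        if v in members:
            bits |= 1 << i
    return bits


def matching_rows(bits: int) -> list[int]:
    """Translate a bit vector back into row numbers."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


col: RleColumn = [(5, 3), (12, 2), (7, 1)]  # decompresses to 5, 5, 5, 12, 12, 7
in_range = range_scan(decompress(col), 6, 12)
in_set = set_membership_scan(decompress(col), {5, 7})
print(matching_rows(in_range), matching_rows(in_set), matching_rows(in_range & in_set))
```

Running it prints `[3, 4, 5] [0, 1, 2, 5] [5]`: predicates evaluated separately can be combined with a cheap bitwise AND, and because the scan consumes the decompressed stream directly, the expanded column never has to be materialized in memory first.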
Add all of these features up, and a Sonoma system could offer significantly more performance than the current Sparc T5 machines that Oracle sells. Here is the initial set of thread-level performance comparisons that Oracle has released:
And here is what T5 to Sonoma comparisons look like at the core level:
While the Sparc-to-Sparc comparisons are useful, it would be interesting to see how the Sonoma Sparcs compare to the current “Haswell” Xeon E5s and their successors running the same Oracle workloads.
With all of these features on the die, Oracle is keen on creating a denser server node that has fewer parts and, hopefully, a lower bill of materials cost that will help it better compete with Xeon platforms. Intel has integrated its Omni-Path host adapters into a variant of the impending “Knights Landing” Xeon Phi massively parallel processor aimed at HPC workloads and has promised that it will deliver Omni-Path links on a future Xeon processor manufactured using its 14 nanometer processes. (It is our understanding that the Omni-Path links are in the package with Knights Landing, but not actually on the die.) We do not think Omni-Path will be integrated with the “Broadwell” Xeon E5 processors that are expected in the first quarter of 2016, but we know for a fact that the “Skylake” Xeon E5s will support Omni-Path links running at 100 Gb/sec. It is not clear if this will be in the chip package or on the die. It is hard to say who will actually get InfiniBand on the server chip to market first, Oracle or Intel. Technically speaking, and provided that you consider Omni-Path a variant of InfiniBand (as you could argue), Intel will be first with Knights Landing aimed at HPC, but Oracle could be first with Sonoma aimed at database and other clustered workloads.
The interesting thing to ponder is how else Oracle might use the Sonoma Sparc chip. With integrated InfiniBand, lots of cache, lots of threads, and respectable floating point performance, the machine might make an interesting HPC server node – provided the price is right.
Remember that Solaris was a preferred platform for certain simulation and modeling workloads back in the day, but Sun Microsystems started losing that business to Linux clusters in the late 1990s, and shortly after buying Sun, Oracle by and large walked away from HPC because the margins are too low. If the margins were better, and Oracle controlled the whole stack as it can, Ellison might think a bit differently about the possibilities. Oracle has its own variant of Red Hat Enterprise Linux and could relatively easily port it to Sparc engines. The HPC possibilities (and we are using the term in its broadest sense, not just modeling and simulation) depend on how cheaply Oracle can make and sell the Sonoma servers and InfiniBand switches. For prospective HPC workloads, it might have been better if Oracle had put 100 Gb/sec InfiniBand ports on Sonoma, considering that Mellanox launched Switch-IB products running at 100 Gb/sec last November. But for many database workloads, the Remote Direct Memory Access (RDMA) feature and the low latency it enables matter more than raw bandwidth, although Mellanox has also been pushing latency down with its 100 Gb/sec products.
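The latency-versus-bandwidth trade-off in that last paragraph can be made concrete with the usual first-order model for moving a message across a link: time equals a fixed latency plus the message size divided by the link bandwidth. The latency values below are hypothetical placeholders, not measured figures for any Mellanox or Oracle product; the point is only the shape of the result.

```python
def transfer_time_us(size_bytes: int, latency_us: float, link_gbits: float) -> float:
    """First-order message time: fixed latency plus serialization time."""
    bytes_per_us = link_gbits * 1e9 / 8 / 1e6  # Gb/s -> bytes per microsecond
    return latency_us + size_bytes / bytes_per_us


# Hypothetical (bandwidth in Gb/s, latency in microseconds) pairs,
# chosen only to illustrate the trade-off.
LINKS = {
    "56 Gb/s, 1.0 us": (56.0, 1.0),
    "100 Gb/s, 1.0 us": (100.0, 1.0),
    "100 Gb/s, 0.7 us": (100.0, 0.7),
}

for size in (256, 8 * 1024, 1024 * 1024):
    row = ", ".join(
        f"{name}: {transfer_time_us(size, lat, gbits):.2f} us"
        for name, (gbits, lat) in LINKS.items()
    )
    print(f"{size:>8} bytes -> {row}")
```

For a 256-byte message, the kind of small exchange common in clustered database traffic, moving from 56 Gb/sec to 100 Gb/sec saves only a few hundredths of a microsecond, while shaving latency saves far more. For a 1 MB transfer the bandwidth term dominates, which is why HPC workloads that stream large blocks would benefit more from the faster ports.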