SGI may be talking about its efforts to push into the enterprise space with special versions of its UltraViolet shared memory systems that are tuned up to run the SAP HANA in-memory database, but the supercomputer maker has not forgotten its core HPC customer base.
SGI rolled out new UV iron at the opening of the ISC 2015 supercomputing conference in Frankfurt, Germany, including an upgrade to the UV 300 series of machines that are tailored for SAP HANA as well as for generic big iron workloads that run on Linux (and someday run on Windows Server on top of the KVM hypervisor if the rumors we hear are correct). The upgrade to the UV 300 was somewhat unexpected, and in fact, the launch of the high-end UV 3000 clusters was not anticipated until September according to the scuttlebutt we have been hearing.
But, as SGI president and CEO Jorge Titinger explained recently to The Next Platform, the company is keen on ramping sales of the UV systems and wants the product line to drive at least 10 percent of its revenues this fiscal year. Getting new iron into the field as quickly as possible helps SGI address a wider range of customer needs.
In terms of shared memory systems, SGI is particularly interested in keeping the scale of its systems well ahead of Hewlett-Packard’s “DragonHawk” Superdome X systems, which are based on Intel’s Xeon E7 processors just like the UV 300 systems are. Just as SGI has a special version of the UV 300, called the UV 300H, that is configured and certified to run SAP HANA, HP has a version of the Superdome X machine, code-named “Kraken” and sold under the Converged System 900 brand, that runs only HANA. The two are in a pitched battle to win the top slice of HANA systems, and SGI and Dell have just partnered to chase that opportunity together against HP.
Dell has the ability to sell any and all SGI technology, but has not divulged its plans when it comes to selling SGI gear outside of the HANA space. Dell’s HPC business was running at something like $500 million a year in revenues before it went private, but the HPC market is somewhere between $9 billion and $10 billion a year in revenues and projected to grow to over $15 billion by 2019, according to IDC. That kind of growth is hard to ignore, and an aggressive HP could cause Dell to be even more aggressive. In fact, Dell could go all the way and snap up SGI, which has a market capitalization of around $200 million against around $600 million in annual revenues. This would be a good fit, but SGI no doubt would prefer to grow its business – and its stock price – through partnerships with other system makers who lack big shared memory iron.
With the UV 3000 upgrade, SGI is shifting to the new “Haswell-EP” Xeon E5-4600 v3 processors that Intel launched in June. The E5-4600 processors are special variants of the E5-2600s that have their QuickPath Interconnect (QPI) point-to-point interconnects tweaked so they can support four-socket configurations rather than the two-socket configurations based on the Xeon E5-2600s. The bandwidth between the sockets is somewhat diminished by doing this, but Intel can supply a four-socket system with a fairly large amount of memory at considerably lower cost than with its high-end Xeon E7-4800 and E7-8800 processors, which are aimed at four-socket and eight-socket systems, respectively, using Intel’s own glue chips. SGI and Hewlett-Packard offer their own chipset extensions to create NUMA and SMP machines that scale beyond eight sockets.
The UV 1000s from five years ago lashed together Xeon 7500 processors – the predecessors to the Xeon E7 family – using SGI’s own NUMAlink 5 interconnect. The nodes were linked together in a fat tree network topology and offered up to 256 sockets and 16 TB of shared memory; the “Westmere-EX” Xeon E7 processors were eventually added as a processor upgrade to the UV 1000s. Three years ago, SGI took a different approach, moving away from the Xeon E7-class to the lower-cost “Sandy Bridge-EP” Xeon E5-4600s and linking them together into a 256-socket system using its NUMAlink 6 interconnect. Intel had boosted physical memory addressing to 46 bits and virtual addressing to 48 bits with the Sandy Bridge processors, and that allowed SGI to boost the shared memory on the UV 2000s to a maximum of 64 TB. The topology on the UV 2000 system was a bit different: the eight two-socket system boards inside a rack were connected using an all-to-all interconnect, and a 3D enhanced hypercube topology across racks pushed the single system image up to 256 sockets. The average distance between nodes was 1.4 hops using this hybrid topology. The NUMAlink 6 interconnect, at 6.7 GB/sec, delivered roughly 2.5 times the bi-section bandwidth of the NUMAlink 5 interconnect, which meant the UV 2000 did a better job keeping all of those cores fed with data.
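The 64 TB ceiling on the UV 2000 follows from the 46-bit physical addressing described above. A minimal sketch of that arithmetic in Python, assuming byte-addressable memory and binary terabytes (2^40 bytes); the function name is hypothetical and used only for illustration:

```python
def max_addressable_tb(physical_address_bits: int) -> int:
    """Largest byte-addressable memory, in binary terabytes (2**40 bytes)."""
    bytes_addressable = 2 ** physical_address_bits
    return bytes_addressable // 2 ** 40


# 46 physical address bits, as on the Sandy Bridge generation
print(max_addressable_tb(46))  # 64
```

Each additional physical address bit doubles the ceiling, which is why the memory limit tracks Intel's addressing changes rather than anything in the interconnect.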
With the UV 3000 upgrade, SGI is moving up to the E5-4600 v3 processors from Intel, which have the Haswell cores. Generally speaking, the E5-4600 v3 processors have a higher core count and, thanks to AVX2 vector math units, support twice the floating point processing capacity per clock of the earlier Sandy Bridge and Ivy Bridge cores. The Haswell chips also sport larger caches, have better power management and energy efficiency, deliver higher instructions per clock across integer workloads, and support DDR4 main memory, which runs faster than DDR3 main memory and also consumes less energy. SGI is supporting E5-4600 v3 processors in the UV 3000 systems with 6, 10, 12, or 16 cores, with clock speeds ranging from 2 GHz to 2.9 GHz; system sizes range from 4 to 256 sockets, and max out at 8,192 threads across a fully configured machine. The UV 3000 supports DDR4 memory sticks with 8 GB, 16 GB, 32 GB, or 64 GB capacities. The system is still maxed out at 64 TB of capacity because Intel has not boosted the physical and virtual addressing on the Xeon E5 or E7 chips, and likely will not during the “Broadwell” generation, either. The 10U chassis for the UV 3000 is essentially the same as with the UV 2000, and supports up to 32 PCI-Express 3.0 peripheral slots. Companies can plug Nvidia Tesla GPU coprocessors or Intel Xeon Phi coprocessors into the machines if they wish.
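The 8,192-thread figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes a fully populated 256-socket machine using the top-bin 16-core parts and two hardware threads per core (Intel HyperThreading enabled); the function name is hypothetical:

```python
def total_threads(sockets: int, cores_per_socket: int, threads_per_core: int = 2) -> int:
    """Hardware threads in a single system image (assumes 2 threads per core by default)."""
    return sockets * cores_per_socket * threads_per_core


# Fully configured UV 3000: 256 sockets of 16-core E5-4600 v3 processors
print(total_threads(256, 16))  # 8192
```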
The UV 3000s are, as far as we know, using the same NUMAlink 6 interconnect as the UV 2000s, by the way. We were expecting a variant of the NUMAlink 7 interconnect, which has even lower latency than the NUMAlink 6 interconnect, to be used on the UV 3000s, but that did not happen. The NUMAlink 7 interconnect has a 7.47 GB/sec bi-directional peak bandwidth, which is 11.5 percent higher than the networking oomph that NUMAlink 6 has.
We also hear from some techies we have talked to on the show floor that the processor upgrade with the UV 3000s makes them nominally more expensive than the UV 2000s, but not by very much. What is different is that companies can now add up to 25 percent more cores (which do considerably more integer and twice as much floating point work each) to the shared memory system, bringing more cores to bear against that shared memory. SGI is not withdrawing the UV 2000s, by the way. Some customers will need a different balance of compute and memory, and retaining this product line makes that possible. (It is not clear that you can mix and match Xeon E5-4600 processors of the v2 and v3 vintages inside the same system.)
The UV 3000s will be available in the third quarter of this year, according to SGI, which is consistent with the expected September launch we had originally heard about.
All-To-All For Easier Programming
The NUMAlink 7 interconnect is at the heart of the UV 300 machines, which made their debut last fall and which include general purpose as well as SAP HANA variants. With the extended UV 300 line, which will be available late in the third quarter, SGI is making the “Haswell-EX” Xeon E7-8800 v3 processors available in the machines with 4, 10, 16, or 18 cores per processor with clock speeds ranging from 2.3 GHz to 3.2 GHz. SGI says that customers with UV 300-class systems based on the Xeon E7-8800 v2 processors can still add processors and enclosures to stretch their systems, but because of the timing differences they cannot mix enclosures with v2 and v3 variants of the E7 chips in the same shared memory system.
The UV 300 machines based on the Xeon E7 v3 processors now scale up to 64 sockets, up from 32 sockets with the original machines that came out last November. This may seem peculiar in some respects because Eng Lim Goh, chief technology officer at SGI, told The Next Platform back in March that the all-to-all topology made possible with the NUMAlink 7 interconnect could only stretch to 32 sockets before the latencies became too great and it would become a NUMA system, with local and remote main memory and data placement issues for applications, instead of what looked like a tightly coupled SMP setup to a Linux operating system. Uniform latencies between processors mean the programming model is straightforward – a UV 300 essentially looks like a big, wonking workstation as far as the operating system is concerned, albeit in this case one that can have 576 cores and 1,152 threads across 32 sockets. (The core count topped out at 15 per socket with the E7-8800 v2 processors that were used in the original UV 300 machines announced last fall, so this represents a 20 percent increase in core and thread counts.)
The memory on the UV 300, now that 64 GB DDR4 memory sticks are supported and the socket count is scaled up to 64, scales up to 64 TB. (The plan had been to double the memory to 48 TB and keep the socket count at 32 with the UV 300 systems, as far as we know.) Now, SGI is saying it can deliver SMP-like uniform memory addressing across 32 sockets in the UV 300 machines, and a “near all-to-all” network topology with 64 sockets. The machines are moving from Ivy Bridge-EX processors to Haswell-EX processors, announced in May of this year, so the newer UV 300s will have slightly better integer performance per clock and twice as good floating point performance per clock.
By pushing the UV 300 up to 64 sockets and 64 TB of memory, SGI is blurring the lines between the UV 300 and UV 3000 systems – and it is doing this on purpose. The UV 300 chassis is effectively a UV 3000 chassis cut in half, although using radically different processors because the all-to-all topology requires the extra QPI ports that come on an E7-8800 v3 processor compared to the E5-4600 v3 processor. Customers that want the maximum 64 TB of addressable memory that is the top of the Xeon line today as well as the lower latency and higher bandwidth of the NUMAlink 7 interconnect can now get this, and they can get 1,152 cores and 2,304 threads to chew on that 64 TB of memory. The core-to-memory ratio is lower with the extended UV 300s than with the UV 3000s – 18 to 1 instead of 64 to 1 – and for certain workloads this is a better tradeoff. Even if there is a little NUMA tax to pay.
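The 18-to-1 and 64-to-1 figures are cores per terabyte of shared memory at maximum configuration. A short sketch of the comparison, assuming the top-bin parts cited in this article (18-core E7-8800 v3 in the UV 300, 16-core E5-4600 v3 in the UV 3000) and the 64 TB memory ceiling on both machines; the function name is hypothetical:

```python
def cores_per_tb(sockets: int, cores_per_socket: int, memory_tb: int) -> float:
    """Cores available per terabyte of shared memory."""
    return sockets * cores_per_socket / memory_tb


uv300 = cores_per_tb(sockets=64, cores_per_socket=18, memory_tb=64)
uv3000 = cores_per_tb(sockets=256, cores_per_socket=16, memory_tb=64)

print(uv300)   # 18.0
print(uv3000)  # 64.0
```

Put the other way around, each core in a maxed-out UV 300 has roughly 3.5 times as much memory behind it as a core in a maxed-out UV 3000.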
The UV 300 and UV 3000 systems support SUSE Linux Enterprise Server 11 and 12 and Red Hat Enterprise Linux 6 and 7. SGI did support Windows Server on selected models of the prior generation of UV machines, but has not done it with this round because the support matrix is very difficult and Windows-related sales are small. It is far easier to do a custom support engagement at this point and work, as we have heard it is doing, to get the KVM hypervisor running on UV iron and then run Windows and its applications in virtual mode. SGI has shown off this capability, as we have previously reported, but has not talked about when or if it might be productized.