The hardware part of the high performance computing market is somewhere around $10 billion, depending on how you want to count it and whom you ask, and Bill Mannel, vice president and general manager of a combined HPC and Big Data group within the newly constituted Hewlett Packard Enterprise half of the former Hewlett-Packard, reckons that his employer has north of a third of the business. But that is not enough.
Supercomputing and its analogs in data storage and analytics are growth areas for HPE, and the company is continuing to make investments here to try to get an increasing share of the HPC pie. As far as Mannel can tell from the data, what is now called HPE has increased its share in this space in each of the first three quarters of 2015 and has grown that share by four points over the past year. Importantly, HPE’s share of the business at the high end is lower than you might expect – somewhere around 8 percent to 9 percent of the Top 100 supercomputer rankings, depending on how the list is moving, Mannel tells The Next Platform.
Given its share, its global reach, and its diverse product line, HPE is a pretty good bellwether for what is happening in the HPC space in a broader sense. The company’s water-cooled Apollo 8000 machines set the bar for energy efficiency, and the Apollo 6000s are kickers to the previous SL 6500 series that have been aimed at customers who need the highest compute density. Both of these debuted last year and were among the big attractions at the ISC14 supercomputing conference. In May of this year, HP gussied up the SL 4500 with shiny new “Haswell” Xeon E5 v3 processors and aimed them at storage-heavy workloads like Hadoop analytics, while at the same time launching the new Apollo 2000 series, which is also aimed at hyper-dense datacenters but which has a different form factor from the Apollo 6000s. And then, of course, there are plain old two-socket ProLiant DL series rack servers in various flavors that customers can rack and stack as they see fit for their HPC clusters.
Getting a bigger chunk of high end supercomputer sales is one reason HP Enterprise signed up to be a partner in the Scalable Systems Framework initiative from Intel back in the summer, collaborating with the chip, networking, and storage provider to build systems from its components and filling in some of the vacuum created when IBM stopped selling its BlueGene parallel machines and sold off its System x division to Lenovo. “That’s definitely a goal of ours, and we are definitely competing there with the Apollo 6000 and Apollo 8000,” says Mannel.
Some Big Deals With The Ink Still Wet
HPE has, in fact, just installed a system called “Hikari” at the Texas Advanced Computing Center at the University of Texas that is the first of its Apollo 8000 machines to employ 100 Gb/sec EDR InfiniBand from Mellanox Technologies as the interconnect between the nodes. The Hikari system comprises 216 of the ProLiant XL730f server sleds, for a total of 432 server nodes. Each Apollo sled holds two dual-socket Haswell Xeon E5 server nodes plus their memory, limited local storage, and warm water cooling apparatus.
The Hikari system has three Apollo 8000 compute racks, each of which supports 72 of these server sleds, for a total of 144 nodes in a rack; it also has two racks of what HP calls the Intelligent Cooling Distribution Unit (iCDU), which is the water cooling part of the machine. It will have over 10,000 cores and deliver over 400 teraflops of peak double precision number crunching performance.
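The rack and flops arithmetic here hangs together, and a quick back-of-the-envelope check shows it. The processor SKU and clock speed below are our assumptions, not published specs – we are guessing at 12-core Haswell Xeon E5 parts at around 2.6 GHz, each core doing 16 double precision flops per clock via its two AVX2 fused multiply-add units:

```python
# Back-of-the-envelope check of the Hikari configuration.
# ASSUMPTIONS (not confirmed by HPE or TACC): 12-core Haswell Xeon E5
# parts at ~2.6 GHz, 16 DP flops per core per clock (2 x AVX2 FMA units).
racks = 3
sleds_per_rack = 72
nodes_per_sled = 2          # each XL730f sled holds two dual-socket nodes
sockets_per_node = 2
cores_per_socket = 12       # assumed SKU
ghz = 2.6                   # assumed clock
flops_per_cycle = 16        # assumed AVX2 FMA throughput

nodes = racks * sleds_per_rack * nodes_per_sled
cores = nodes * sockets_per_node * cores_per_socket
peak_tflops = cores * ghz * flops_per_cycle / 1000

print(nodes, cores, round(peak_tflops, 1))
```

Under those assumptions the machine works out to 432 nodes, a bit over 10,000 cores, and a bit over 400 teraflops peak – consistent with the figures TACC is citing.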
Interestingly, TACC is also deploying 380 volt power into the rack and is testing out higher voltage power distribution as a means to conserve energy. (Japanese telecom giant NTT is also testing this out with its own Apollo 8000s, by the way, and is partnering with TACC to do this.) The Hikari system has just been driven from the Houston factory and installed in the TACC center in the past several days, and it also includes a mix of ProLiant DL360 and DL380 nodes that are used as login, head, router, and data transfer nodes.
The Pittsburgh Supercomputing Center has also installed a new hybrid machine called Bridges, which uses a mix of the Apollo 2000s, fat-memory four-socket ProLiant DL580 rack servers, and Superdome X NUMA machines. The Bridges machine is deploying Intel’s Omni-Path Series 100 networking gear to link all of this machinery together. The Apollo 2000s are used for parallel compute and Hadoop data analytics and have 128 GB of memory each. The ProLiant DL580s, equipped with 3 TB of memory, are used for visualization and analytics where heavier nodes help, while the cluster of Superdome X machines with 12 TB of memory is used for genomics, machine learning, graph analytics, and other jobs where having terabytes of main memory matters.
The precise configuration of Bridges is as follows: four Superdome X machines, 42 of the DL580s, and 800 of the Apollo 2000 nodes. PSC is putting two Tesla K80 accelerators from Nvidia in 48 of the Apollo 2000 nodes, with future Tesla coprocessors also expected down the line. The whole shebang will have local storage as well as a flash-based array for speeding up data access for analytics jobs like Hadoop; the GPUs and flash go into the Apollo 2000 nodes only. The Bridges system cost $9.65 million and was paid for by the National Science Foundation. This is precisely the kind of mixed platform that The Next Platform spends a lot of time thinking about. It will be operational in January next year.
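The memory tiers underline just how mixed a machine Bridges is. Tallying up aggregate main memory from the node counts and per-node capacities cited above (simple arithmetic on the published figures, nothing more):

```python
# Aggregate main memory across the three Bridges tiers,
# using the node counts and per-node capacities cited above.
tiers = {
    "Apollo 2000": (800, 128),       # (nodes, GB per node)
    "DL580":       (42, 3 * 1024),   # 3 TB per node
    "Superdome X": (4, 12 * 1024),   # 12 TB per machine
}
total_tb = sum(n * gb for n, gb in tiers.values()) / 1024
print(round(total_tb, 1))
```

The skinny Apollo 2000 tier, for all its 800 nodes, holds only 100 TB of the machine’s roughly 274 TB of main memory; the 46 fat nodes hold the rest, which is the whole point of the design.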
Ghent University in Belgium has also upgraded its existing SL 6500 cluster with a new Apollo 6000 series machine that has 200 ProLiant XL230a compute nodes running Red Hat Enterprise Linux 7; it is unclear what fabric it is using. The cluster at Ghent University also has sixteen ProLiant DL380s crammed with disk and flash that act as a high speed data store for the compute cluster.
HPE has also set up its own cluster in its Houston datacenter based on an Apollo 2000 cluster using Omni-Path interconnects for customers to come play with to test their applications, which will help drive future sales of Apollo machinery.
Taking A Look At The Trends
Given how many customers prefer traditional rack-based systems, it is reasonable to ask if all of the engineering that went into the Apollo systems is paying off. In a world where hyperscalers have shown the benefits of simplicity, uniformity, and scale, such engineered clusters as the Apollo line (and indeed, the Moonshot hyperscale systems not particularly aimed at the HPC crowd but sometimes used by them as cluster controller nodes just the same) stand a bit in contrast and would seem to buck the trends. But not so. The future of HPC is going to be dominated by machines that are precisely tuned for specific workloads – and in many cases, like the Bridges cluster, spanning multiple architectures, each handling a different portion of an overall HPC workflow.
Look at the trends. Mannel tells The Next Platform that a couple of years ago, when the SL machines that evolved into the Apollo line were shipping and the Apollo 8000 and 6000 machines were in their pre-launch phase, these HPC-engineered machines accounted for somewhere between 20 percent and 25 percent of HPE’s overall HPC system sales. The remaining revenue was driven by traditional ProLiant rack machines and BladeSystem blade servers, split roughly half and half, and that split had been consistent.
Last year, in fiscal 2015 (which ended in October of that year), somewhere around 40 percent of HPE’s HPC system sales were driven by Apollo machines, with the remainder split between blades and racks. General purpose machines were on the wane, and they are slipping further here in fiscal 2016. Mannel is projecting that Apollo machines will comprise somewhere between 50 percent and 60 percent of HPC system revenues, with the remainder again being pretty evenly split between racks and blades, and with perhaps a slightly less steep decline in blades.
“I am happy when I actually see this, but customers are starting to move toward more special-purpose architectures rather than having one set of infrastructure,” says Mannel. “This shows that IT is trying to be more responsive to the requirements of their user constituency instead of being focused on IT for its own sake. Some large customers who were traditionally blade users, for instance, are sampling from the menu, and I am seeing that more and more. Some of this is driven by the customer base, but HPE and other vendors are also doing a better job of providing a consistent management interface across the different kinds of infrastructure.”
HPE is also seeing some interesting trends in processors and networking.
As is usually the case, some HPC shops can wait until the next generation of compute engines is available, and some cannot. Intel launched the Haswell Xeon E5 v3 processors in September 2014 and is expected to launch the “Broadwell” Xeon E5 v4 chips in the spring of 2016 (perhaps in February or March, we hear). The “Knights Landing” Xeon Phi processors are going to be ramping through the end of the year and into early next year, too, which complicates planning a bit even if it does give customers an interesting – and, we think, appealing – option in a parallel X86 architecture with pretty beefy floating point performance. Interestingly, federal customers putting together RFPs in recent months are specifying that they want Xeon processors, either Haswell or Broadwell, but that the infrastructure they propose has to also be able to support Knights Landing. (We are not surprised by this.) And we do not think that companies will want to use Knights Landing as a coprocessor, but rather as a standalone processor just like a Xeon.
HP will be supporting Knights Landing processors in its Apollo 6000 line to start, followed by the Apollo 2000 and Apollo 8000 after that. The Apollo 6000 line will get the Knights Landing motors as close to Intel’s formal launch as HP can manage. (The Apollo 4000 is aimed at data analytics and may not make sense hosting Knights Landing chips.)
The choice between Haswell and Broadwell depends on the circumstances for customers who have been buying this year and into early next year. Some are waiting for Broadwell, period. Others who are more sensitive to clock frequencies or other factors, or who do not want to go through the hassle of recertifying their software to run on Broadwell, are sticking with Haswell – or are waiting for a third-party vendor to do that recertification if they buy rather than build their code.
On the networking front, there are some changes underway, too.
In the oil and gas industry, for instance, Mannel says that Gigabit Ethernet and sometimes 10 Gb/sec Ethernet has been the mainstay for a long time. But surprisingly, a lot of these customers are thinking about InfiniBand and Omni-Path, and some are starting to migrate some of their workloads to these higher bandwidth, lower latency interconnects.
In general in the HPC space, HPE is seeing customers begin the transition from 56 Gb/sec FDR InfiniBand to 100 Gb/sec EDR InfiniBand from Mellanox, and Omni-Path is starting to get some traction. “We had lots of customers who were saying in their RFPs to give them InfiniBand, and now more and more customers are being more flexible and are asking for an interconnect that provides them with a certain level of performance and not specify the technology. We are also seeing customers asking directly for Omni-Path, which is interesting versus a year ago where it was heavily InfiniBand for most HPC stuff.” Mannel has not seen any demand for 100 Gb/sec Ethernet in the HPC base.
The one question we had is what customers are doing with regard to the mix of new technologies. Knights Landing will eventually be available with integrated Omni-Path as well as in a freestanding version that lets customers plug in whatever interconnect they want using PCI-Express cards. Are people going to be willing to change their processor architecture, their network fabric, and their server form factor all at the same time?
“They are diving in,” says Mannel emphatically. “We have some customers that will stick with InfiniBand, but they want Knights Landing as an option. Not all customers will do this, but let’s say a good half of them will take the leap and integrate all of it. It is going to be an exciting year ahead.”