We all spend a lot of time talking about the massive supercomputers, precisely because they cultivate new architectures, but it is the more modest cluster that makes use of these technologies years later that actually sustains a healthy and vibrant HPC market.
Lenovo picked up a substantial HPC business when it acquired IBM’s System x server division two years ago and also licensed key software, such as the Platform Computing stack and the GPFS file system, to drive its own HPC agenda. The Sino-American system maker has been buoyed by higher volume manufacturing, thanks to the combination of Lenovo’s existing ThinkServer systems business and the System x business from IBM, and by greater pricing leverage, thanks to the substantial PC business that Lenovo built up after buying IBM’s PC business a decade ago. That means it can compete with Hewlett Packard Enterprise, enlarged by its recent acquisition of SGI, and Dell, embiggened by its recent acquisition of EMC, in the race for HPC dollars.
Lenovo has its share of high-end, capability class supercomputers. These include a big system named “Marconi” at Cineca in Italy, based on Intel’s “Knights Landing” Xeon Phi processors and Omni-Path networking, that will be upgraded with “Skylake” Xeon E5 processors next year. Lenovo has also just announced that it is part of the MareNostrum 4 procurement at the Barcelona Supercomputing Center being spearheaded by IBM. But the company does a lot of its deals at more modest and more mainstream HPC centers.
“We are having pretty decent success at the high end, which you would expect since most of our team came from IBM and we did a lot of its big, complex clusters,” says Scott Tease, executive director of high performance computing at Lenovo. “But even more important for me is that we are seeing a lot more smaller systems. We have had so many wins, and even academic systems that I thought would make the Top 500 list in November didn’t make it because they had under 350 teraflops of performance. We are succeeding with workhorse systems in the public and commercial sectors.”
One of the key factors driving the Lenovo HPC business is that it is more geared for high volume manufacturing than IBM was, and that the company is explicitly building an organization that can help push HPC to the masses and then benefit as the use of simulation, modeling, analytics, and machine learning all grow on those systems. Once companies get a taste for HPC, they keep coming back to the buffet table to load up.
“Our bread and butter HPC business is for clusters that cost under $1 million,” Tease tells The Next Platform. “One of the tough things we had when we were part of Big Blue is that the IBM machine was really geared to do big, complicated, high touch things really well. They were not really good at doing the run rate, low touch deals. This is where Lenovo really excels because it owns its supply chain and the manufacturing lines, and it is a much easier company to do business with for these smaller deals. Our sweet spot deal is from $600,000 to $800,000, and we are doing a lot of those deals. And then we try to pick a few of these halo deals, like Cineca and BSC, just to make sure that people understand that we are capable of taking on any challenge. We have a balanced business, and we are growing revenue and we are profitable, which are all good things.”
Eking out profits in the HPC sector has rarely been an easy thing, for reasons that often confound supercomputing suppliers and their investors. Ironically, competitive bidding and budget caps on capability systems, which vendors chase as much for pride and the engineering challenge as for the revenue, limit the profits at the high end of the business. It is just as tough to make a buck on the hyperscalers, by the way. Vendors crave the volumes, but they wish they could make more than a penny or two on the dollar for all that effort. Profits come from smaller customers who don’t have the same leverage, at least as long as no one starts an HPC price war, and there is a good chance of one with Lenovo, Dell, and Hewlett Packard Enterprise all being aggressive in the midrange and low end of the HPC market.
Balancing these two kinds of deals is not always easy, and all HPC vendors wrestle with this issue.
“The team always wants to do the big deals because they are big elephant hunters,” says Tease with a laugh. “There is a lot of glamour involved in it. But it is hard to make money on those really big deals, and they force you to take a lot of risk and you do unique things that end up costing a lot of money. They are good to do, and we are proud to do them, but you can’t do all of your deals like that.”
The good thing about this high-middle strategy that Lenovo has is that it keeps the HPC team busy. Lenovo is not yet seeing a slowdown in the number of clusters it is passing through its factories, nor is it yet seeing the impact of the future “Skylake” Xeon E5 v5 processors from Intel, which are due in the middle of next year. The exact timing is not known for the Skylake Xeons, even inside of the server industry, and so for now at least sales of systems based on “Broadwell” Xeon E5 processors have been holding up. But we have also heard rumblings of a server slowdown among enterprises in the third quarter that could bleed over into HPC, as much because of the political and economic climate of the world as because of any server chip roadmap issues.
“We have not seen a slowdown yet, but we may see one ahead of Skylake,” concedes Tease. “We are still seeing a lot of Broadwell wins coming in, even into the first quarter of next year. I do suspect, and we are planning for, our fourth quarter, which runs from April to June in 2017, to be a relatively light quarter from a revenue perspective ahead of Skylake, but the good news is that we have those big wins at Cineca and BSC that will keep our teams busy doing installations.”
Skylake, says Tease, is going to be a big deal, and this is something that we wholeheartedly agree with. (We would put Skylake second only to the “Nehalem” Xeon 5500 processors that came out in the Great Recession during early 2009 and that pretty much vanquished AMD from the datacenter.)
Lenovo has Skylake processors running in the labs, and with the Cineca deal, it is preparing to build out a third phase of the Marconi cluster, based on dual-socket nodes using Skylake chips, that will add 11 petaflops to the system by the summer of next year. The initial phase of Marconi had 1,512 nodes based on Broadwell Xeons with 1.7 petaflops of sustained Linpack performance, and the second phase, which was completed just in time for the November 2016 Top 500 supercomputer rankings, adds 3,600 Xeon Phi 7250 nodes to the 100 Gb/sec Omni-Path fabric linking all the nodes, adding an extra 6.2 petaflops sustained to the hybrid machine. (The whole Marconi shebang weighs in at 12 petaflops peak.) The third phase of the project, based on Skylake Xeons, is anticipated to add another 7 petaflops of compute to the system and drive the aggregate peak performance to close to 20 petaflops.
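For readers who want to check the Marconi arithmetic, here is a minimal sketch that tallies the figures quoted in the paragraph above. It assumes that the "another 7 petaflops" cited for the Skylake phase is a peak number, which the text implies but does not state outright. The variable and function names are purely illustrative.

```python
# Petaflops figures as quoted in the paragraph above.
SUSTAINED_PF = {
    "phase 1 (Broadwell Xeon)": 1.7,
    "phase 2 (Xeon Phi 7250)": 6.2,
}
PEAK_PHASES_1_AND_2_PF = 12.0  # "12 petaflops peak" for the current machine
PEAK_PHASE_3_PF = 7.0          # assumption: the Skylake figure is peak, not sustained


def marconi_totals():
    """Return (sustained Linpack for phases 1 and 2, projected peak for all phases)."""
    sustained = round(sum(SUSTAINED_PF.values()), 1)
    projected_peak = PEAK_PHASES_1_AND_2_PF + PEAK_PHASE_3_PF
    return sustained, projected_peak


if __name__ == "__main__":
    sustained, projected_peak = marconi_totals()
    print(f"Sustained Linpack, phases 1 and 2: {sustained} petaflops")
    print(f"Projected peak, all three phases: {projected_peak} petaflops")
```

On those assumptions, the projected peak comes out just shy of the "close to 20 petaflops" cited for the finished system.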
“With the AVX-512 instructions, the amount of floating point performance we are getting out of Skylake is approaching what we are getting out of a Knights Landing Xeon Phi,” says Tease. “We are pretty excited about what it means, and as we go down the Top 500 list and see the systems we are building, Skylake is really going to change up the whole list because of the performance of AVX-512.”
At the SC16 conference, Intel confirmed, of course, that it would be adding the AVX-512 instructions, which debuted with the Knights Landing manycore Xeon Phi chips, to the Skylake Xeon E5s.
While Lenovo has some research projects based on ARM processors in the field, notably at the Hartree Centre in England, Tease does not expect ARM chips to be something that it will widely deploy in 2017.
“Two years ago, we started building prototype machines based on Cavium’s ThunderX, and we continue to do prototyping for clients,” says Tease. “They all have different hypotheses about what ARM is going to do for them, but we are just not seeing a broad use case develop where ARM has advantages over Intel. We went into this thing thinking we were going to see great power efficiency gains, and what we have come to believe is that the reason your cell phone battery lasts so long is because your ARM processor doesn’t do a lot. Everything is offloaded to graphics processors and wireless devices and so forth. The processor does a great job of power and battery management, but in a server environment, you need cores to run all the time. We are not seeing the efficiencies we expected to see. I think if ARM is going to find a home somewhere, it is probably going to be in storage or in Hadoop or Spark. But I think 2017 is just a little bit early for widespread adoption of ARM.”
On the OpenHPC front, it has been a year since Lenovo contributed the xCAT/Confluent cluster management and provisioning tool to the open source HPC software stack effort being championed by Intel. At SC16, the company rolled out a new graphical user interface called “Antilles” for the HPC tools that Lenovo uses in its clusters, one that gives that stack an enterprise fit and finish. Interestingly, even though Lenovo is a member of the OpenHPC consortium, it has not agreed to distribute and support Intel’s HPC Orchestrator commercial variant, as have Dell, Fujitsu, and HPE, nor has it decided to roll its own, which would include Platform LSF and GPFS for sure as well as other IBM tools, like xCAT, that it inherited as part of the System x deal.