Not long ago, we learned that a new Cray XC40 supercomputer would be installed in the coming months for Norwegian marine seismic company Petroleum Geo-Services (PGS). The five-petaflops system will likely take a spot among the top 20 publicly listed HPC systems on the planet, but according to PGS (and almost every other enterprise HPC entity we’ve spoken with in the last few years), that ranking matters far less than the fact that the machine was designed to make full, unified use of its compute power.
The Top 500 list and the LINPACK benchmark behind it, which measures raw floating point performance among the world’s top supercomputing sites, are utterly useless metrics for most seismic applications, says Sverre Brandsberg-Dahl, chief geophysicist and head of data processing and technology at PGS.
“It’s no more than a pissing contest,” Brandsberg-Dahl tells The Next Platform. “This is especially true for us working with single-precision codes. It has no bearing on how we evaluate new architectures.”
What actually matters to PGS, and a growing number of companies and research institutions going forward, is the ability to have a large amount of memory, a strong interconnect, and the ability to scale the problem size in what looks, to the application, like a single machine.
This might explain Cray’s recent success with very large supercomputers, where quite often the overriding message from buyers is that they need more oomph from the interconnect than from raw compute horsepower. After all, if that weren’t the case, there would be no reason why Dell, Hewlett-Packard, Lenovo, and others wouldn’t rule the HPC space completely (although, as it stands, they already account for the vast majority of Top 500 systems). But as problem sizes grow, more demand is placed on both memory capacity and moving data around, and that is not something other ways of speeding up machines, such as adding GPU accelerators as so many of the fastest systems have done, can fix.
Even though PGS is currently experimenting with GPUs, the organization is a long way from putting them into production, and while PGS sees opportunities with FPGAs as well, those are even farther off. What will really change the equation, says Brandsberg-Dahl, is when accelerators are on the same die, something he and his teams are watching closely with both the “Knights Landing” Xeon Phi chip from Intel and the Power8-based OpenPower efforts on the horizon. The obvious question, then: if this is a barrier, why not just wait for something like Knights Landing, which is coming to market in the second half of this year as a server processor in its own right, with a high-speed interconnect mated to the chip?
“We would love to wait for the next best thing, but this needs to go immediately into production,” explains Brandsberg-Dahl. “We’re under a tight deadline. HPC is not a hobby or an experiment; we’re not the national labs or universities. We have to start working with the data from a very large seismic survey from the Gulf of Mexico immediately, but we’re always evaluating what is next.”
On the accelerator front, he says GPUs are still only experimental, in part because of the homegrown Fast Fourier Transform (FFT) code the teams use. “These are spectral codes, like in weather, for instance. The operators we use are global, it’s all-to-all communication. That means network and backbone infrastructure are critical, so for us, it’s not about the processor, it’s about the interconnect.”
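To make the all-to-all point concrete, here is a minimal sketch of how a distributed FFT is commonly structured, assuming a simple slab decomposition. It is not PGS’s homegrown code, which the article does not describe. For brevity it performs a 2D transform over a few simulated ranks in pure Python: the hypothetical `dft` helper is a naive transform standing in for an optimized FFT, and the hypothetical `all_to_all` function stands in for a real collective such as MPI’s `MPI_Alltoall`. The middle stage, where every rank sends a block to every other rank, is the global transpose that puts the interconnect on the critical path.

```python
import cmath


def dft(x):
    """Naive O(n^2) 1D DFT; a stand-in for an optimized FFT in this sketch."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def all_to_all(send):
    """Hypothetical collective: send[src][dst] goes from rank src to rank dst.

    Returns recv, where recv[dst][src] is what rank dst got from rank src.
    """
    p = len(send)
    return [[send[src][dst] for src in range(p)] for dst in range(p)]


def distributed_fft2(x, p):
    """2D DFT of an n x n matrix whose rows are split across p simulated ranks.

    Returns, per rank, the b = n/p columns of the result that rank owns,
    i.e. the output lands in a transposed (column-slab) layout.
    """
    n = len(x)
    assert n % p == 0, "this sketch assumes n is divisible by p"
    b = n // p
    # Each rank owns a slab of b consecutive rows.
    slabs = [x[r * b:(r + 1) * b] for r in range(p)]
    # Stage 1: purely local 1D transforms along every row of the slab.
    slabs = [[dft(row) for row in slab] for slab in slabs]
    # Stage 2: global transpose. Each rank cuts its slab into p column blocks
    # (b x b each) and sends block d to rank d: every rank talks to every rank.
    send = [[[row[d * b:(d + 1) * b] for row in slab] for d in range(p)]
            for slab in slabs]
    recv = all_to_all(send)
    # Stage 3: each rank stitches full columns together (in global row order)
    # and transforms them locally.
    out = []
    for dst in range(p):
        cols = [[recv[dst][src][i][j] for src in range(p) for i in range(b)]
                for j in range(b)]
        out.append([dft(col) for col in cols])
    return out


def dft2_direct(x):
    """Reference 2D DFT computed directly from the definition."""
    n = len(x)
    return [[sum(x[a][c] * cmath.exp(-2j * cmath.pi * (r * a + k * c) / n)
                 for a in range(n) for c in range(n))
             for k in range(n)]
            for r in range(n)]


# Usage: 4 x 4 input spread over 2 simulated ranks.
n, p = 4, 2
b = n // p
x = [[complex(i * n + j) for j in range(n)] for i in range(n)]
out = distributed_fft2(x, p)
ref = dft2_direct(x)
for d in range(p):
    for j in range(b):
        for r in range(n):
            # Column j on rank d is global column d * b + j of the result.
            assert abs(out[d][j][r] - ref[r][d * b + j]) < 1e-9
print("distributed result matches direct 2D DFT")
```

A 3D transform with a slab or pencil decomposition repeats the same pattern, with one or two all-to-all transposes per transform. Because each rank exchanges data with every other rank, the transpose tends to be limited by the network rather than by the processors, which is the dependence on the interconnect described above.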
Brandsberg-Dahl has watched several trends in HPC play out for seismic applications. Since joining PGS in 2008, and before that at oil giant BP, where he performed many of the same roles, he has been at the helm of key architectural decisions for a large number of systems. “In the early days, with the machines we worked with, CPU and memory were all working at about the same speed. So you could just roll data through the system. But as the CPU kept getting faster and you wanted to solve bigger problems by loading things in and spreading them around something like a NUMA-based system, we were bumping up against the limits of what we could get in a box.”
Although LINPACK might not be a useful benchmark for PGS and some other seismic companies, PGS is religious about building and running its own benchmarks to gauge the performance of new architectures. As Brandsberg-Dahl explains, these look more like a “what if” scenario built around a 3D FFT on a 3,000 × 3,000 × 3,000 grid. “We require one contiguous memory space where you have to run a global transform and touch on all of those points. Today, two terabytes won’t fit in a 1U server. We already run big datacenters with lots of well-interconnected blade-based systems, but those max out at 128 or 256 gigabytes.”
“We have been forced to scale down our problems to fit the platform we have, because when you put the problem in and try to run our distributed FFT, your performance drops by orders of magnitude the minute you go outside the motherboard in these machines.”
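A back-of-the-envelope calculation shows why a problem this size cannot stay on one motherboard. The sketch below assumes 4 bytes per single-precision real value and 8 bytes per single-precision complex value, uses decimal gigabytes, and treats the out-of-place buffer pair as a hypothetical illustration; the article does not break down what makes up the two-terabyte figure, so that number is used only as quoted. The helper names `grid_bytes` and `nodes_needed` are illustrative.

```python
GB = 10**9  # decimal gigabytes keep the arithmetic easy to check by hand


def grid_bytes(n: int, bytes_per_point: int) -> int:
    """Memory needed to hold one n x n x n grid."""
    return n ** 3 * bytes_per_point


def nodes_needed(total_bytes: int, node_mem_gb: int) -> int:
    """Lower bound on node count, assuming perfect partitioning and ignoring
    the OS, communication buffers, and any FFT workspace (integer ceiling)."""
    per_node = node_mem_gb * GB
    return (total_bytes + per_node - 1) // per_node


n = 3000
cases = {
    "one real SP grid": grid_bytes(n, 4),              # one 4-byte float per point
    "one complex SP grid": grid_bytes(n, 8),           # two 4-byte floats per point
    "complex SP, out-of-place": 2 * grid_bytes(n, 8),  # hypothetical input + output
    "2 TB working set (quoted)": 2000 * GB,
}

for label, nbytes in cases.items():
    for node_gb in (128, 256):
        print(f"{label:27s} {nbytes / GB:6.0f} GB -> "
              f">= {nodes_needed(nbytes, node_gb):2d} nodes of {node_gb} GB")
```

A single single-precision complex grid of 3,000³ points is about 216 GB on its own. The quoted two-terabyte working set needs at least 8 blades with 256 GB or 16 with 128 GB even under these generous assumptions, so the FFT’s all-to-all traffic necessarily leaves the motherboard.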
These are not new trends or requirements for these codes, but as the need to fit more of the problem into memory increases, the list of suitable vendors and architectures is narrowing. Brandsberg-Dahl has watched the pendulum swing for 20 years, from working with FFTs on large SGI shared-memory systems at the university level to moving into distributed-memory FFT-based processing. And while he says there will never be enough memory and interconnect grit to contain ever-expanding seismic applications, there have been some promising developments, especially the upcoming Knights Landing and OpenPower architectures, which aim to create a true unified super-system.
As one might imagine, PGS is not new to Cray: it already owns a smaller XC30 cluster in its R&D division, which helped it prove out the architecture ahead of the larger XC40 purchase. PGS expects the XC40 to serve its needs, in particular for the Gulf of Mexico seismic survey work, for the next two years, after which it will reassess what systems are available. Brandsberg-Dahl says PGS is already very keen on the promise of Knights Landing, but it is also watching the OpenPower collective (IBM Power processors, Nvidia Tesla coprocessors, and Mellanox InfiniBand networking, all tightly coupled) closely.
“We have an organization inside the company that evaluates emerging technologies, and there are very slim pickings in terms of network topologies in these very scalable systems. And the thing is, we’re not in the datacenter business. We need machines that can be run as one, which limits our choices—and that’s a very big problem for us.”
The interesting thing is that if IBM had kept its massively parallel, Power-based BlueGene system alive, this contract could very well have gone Big Blue’s way—and provided more competition for Cray. PGS has done a great deal of work with IBM over the last several years, but the feeling inside of PGS is that IBM has lost its roadmap for HPC—even if there is OpenPower on the horizon. Brandsberg-Dahl said that the BlueGene architecture sang with their code, but without a clear direction and follow-on system, PGS was left in the cold. OpenPower holds promise and PGS has some Power8 systems for testing, “but there are some price point and scalability issues there now,” he says.
The decision-making process for oil and gas companies like PGS goes well beyond hardware, however. Modernizing codes to take advantage of all the leaps in hardware is one of the most pressing problems, a point the Department of Energy has made loudly, especially in connection with the pre-exascale systems it has purchased for delivery between 2016 and 2018.
The architectural considerations are coupled with another “simple” but pervasive problem HPC users are encountering across the board: the code itself requires tuning to take advantage of all the potential improvements in performance. “This is really the crux, the conundrum,” Brandsberg-Dahl emphasizes. “The people we employ are less and less sophisticated as computer scientists. They’re great mathematicians and geophysicists, but they learned MATLAB and they live in that world; that’s a huge challenge for us.” He credits the generation that was in graduate school in the 1980s, who had to program at a low level, with driving the real wave of HPC innovation in oil and gas, and he notes that compilers and software tooling have become a critical new element that PGS and other companies with big HPC machines have to evaluate.
In fact, he said, one of the strengths of the Cray XC40 system is the software stack. “The more specialized the hardware gets the longer you can get your applications to run. Robustness and resiliency are important, and we have to keep porting and micro-optimizing for an ever-changing architecture set—it’s a lot to keep pace with for this newer generation.”
It will be interesting to see how the competition among top supercomputers from Cray, SGI, and others plays out. And whether some like it or not, the stage where it plays out is the twice-yearly Top 500 announcement. Even though PGS places no emphasis on the benchmark internally, Brandsberg-Dahl says PGS might run it, in part because it will be part of the acceptance testing, and also because some of the in-house engineers think it would be entertaining.
On that note, though, he says the new SGI ICE X cluster at Total, which has a peak theoretical performance of 6.7 petaflops, is not quite in the same league as a big Cray XC40 like the one PGS is getting. “Since there is no way that peak can be achieved at that number for a single large application, it does not mean as much,” Brandsberg-Dahl says.
Either way, the next rankings will have some new additions. But it is when the new architectures start appearing in systems that the Top 500, which has remained rather stagnant in terms of big additions or trends, will get a new lease on life, whether one agrees with it or not.