Energy Giant Takes the AI Supercomputer Route

We have followed a fair number of system trends for the oil and gas industry over the years. This is one of a few areas where extreme scale supercomputing is the norm as companies refine seismic exploration opportunities.

While the systems and architectures might change in extreme scale computing, codes at the commercial HPC sites are not so quick to keep pace. This has meant that buying the latest and greatest Intel CPU has been how many of the majors have kept their seismic modeling edge for the last several decades. It is hard to say that is going to change dramatically, but there are hints that the largest oil and gas exploration efforts could get an accelerated edge. GPU software work on complex legacy modeling and simulation codes is no simple task, but it is clear some of the majors are making strides, if not leaps.

In the hyper-competitive world of oil and gas exploration we know two things. First, if we hear about momentum with one key technology (in this case GPU work extensive enough that a production system was procured around such devices) then we know it’s old news: the company has likely been working with GPUs for several years. Second, we know that if Total is revving up with GPUs on a massive mission-critical system there’s something to this acceleration benefit for staid codes. And actually there’s a third thing. Perhaps AI in large-scale production environments is not as far off as we think (but it’s probably not even close to cooked today).

These are just general trends, based only on what we can glean. The large HPC systems at the heart of the majors are often the subject of competitive secrecy. However, once in a while we are able to get a peek inside how some of the majors consider system building at scale and how new aspects of workloads drive change. That is the case today as we learned how French energy giant Total has made some shifts from an HPC system perspective.

Total has announced a massive upgrade: a machine capable of 25 peak petaflops, dubbed Pangea III, outfitted with Power9 host processors and Nvidia Volta GPUs, linked with NVLink and EDR InfiniBand (HDR was not available in time for the procurement). The design replicates that of the top two most powerful systems on the planet, the Summit and Sierra supercomputers.

Pangea III is now #11 on the Top500 as compiled this week. It is one of only three purely commercial listings in the top 50, and one of those spots is for another system at Total (the other is for Petroleum Geo-Services with the Abel machine).

The nodes are based on the IBM Power AC922 node architecture with 6 GPUs and two IBM Power9 processors per node. By our approximation the system has around 570 nodes (this figure has not been released yet; it is just based on working the math backwards). This is a significant reduction in the number of nodes from the predecessor machine, which had 4,608 nodes. As one might imagine, there is an energy consumption advantage as well. Total says Pangea III will consume 1.5MW, a vast improvement over its predecessor's 4.5MW.
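For readers who want to check the back-of-envelope figures, here is a minimal sketch using only the numbers quoted above. The 570-node count is our estimate rather than a published figure, and the function and variable names are ours. Note that the per-GPU figure is only a ceiling, since it credits the GPUs with every peak flop in the node.

```python
# Figures quoted in the article. EST_NODES is an estimate, not a released spec.
PEAK_PFLOPS = 25.0      # Pangea III peak petaflops
EST_NODES = 570         # estimated node count
GPUS_PER_NODE = 6       # Volta GPUs per AC922 node
PREV_NODES = 4608       # predecessor node count
POWER_MW = 1.5          # Pangea III power draw
PREV_POWER_MW = 4.5     # predecessor power draw


def per_node_tflops(peak_pflops: float, nodes: int) -> float:
    """Peak teraflops each node must supply to reach the system peak."""
    return peak_pflops * 1000.0 / nodes


node_tf = per_node_tflops(PEAK_PFLOPS, EST_NODES)
# Upper bound per GPU: assumes the CPUs contribute nothing to the peak.
gpu_tf_ceiling = node_tf / GPUS_PER_NODE
node_reduction = PREV_NODES / EST_NODES
power_reduction = PREV_POWER_MW / POWER_MW
pflops_per_mw = PEAK_PFLOPS / POWER_MW

print(f"Peak per node:        {node_tf:.1f} TF")
print(f"Ceiling per GPU:      {gpu_tf_ceiling:.1f} TF")
print(f"Node count reduction: {node_reduction:.1f}x")
print(f"Power reduction:      {power_reduction:.1f}x")
print(f"Peak per megawatt:    {pflops_per_mw:.1f} PF/MW")
```

Under these inputs each node has to deliver roughly 44 peak teraflops, the node count falls by about a factor of eight, and power falls by a factor of three.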

We estimate that in excess of 90% of that peak performance is being driven by the GPUs. However, remember that not all oil and gas codes have been ported and not all will take advantage of this acceleration. This is changing as more codes make their way to GPU and we expect it will be an ongoing project at Total and the other majors. Still, Pangea III is a marked improvement in performance and efficiency with much of the jump coming from the Volta GPUs.
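The GPU share of peak can be sketched the same way. The per-GPU double precision figure below is an assumption on our part (about 7 teraflops per Volta; the real number depends on the exact part and clocks), as is the roughly 570-node count, so treat the result as illustrative rather than a published breakdown.

```python
def gpu_share(peak_pflops: float, nodes: int,
              gpus_per_node: int, gpu_tflops: float) -> float:
    """Fraction of system peak attributable to the GPUs."""
    gpu_pflops = nodes * gpus_per_node * gpu_tflops / 1000.0
    return gpu_pflops / peak_pflops


# ASSUMPTION: ~7 TF double precision peak per Volta GPU (not from Total or IBM).
ASSUMED_GPU_TFLOPS = 7.0
# ASSUMPTION: ~570 nodes, an estimate worked backwards from the 25 PF peak.
ASSUMED_NODES = 570

share = gpu_share(25.0, ASSUMED_NODES, 6, ASSUMED_GPU_TFLOPS)
print(f"GPU share of peak: {share:.1%}")
```

With those inputs the GPUs account for well over 90 percent of peak. Nudging the assumed per-GPU figure up or down moves the share by a few points but does not change the conclusion.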

Total has been public in HPC since the beginning of its Pangea “line” of systems, which we first learned about in 2013 and which underwent a capacity upgrade in 2015. This was an all-CPU SGI (now part of HPE) machine originally capable of 2.3 peak petaflops and, post-upgrade, 4.4 petaflops, which earned the system a slot in the top 25 rankings of the most powerful supercomputers. It was one of only a handful of commercial supercomputers that ran the HPL benchmark to make the top 50.

As we noted back in 2015 following the upgrade, Pangea I and the Pangea II upgrade both had the same number of Xeon cores, but what a difference two generations of Xeon chips makes. By moving to twelve-core Haswell Xeon E5s, Total could get 1.9X the raw peak double precision performance in a third fewer processor sockets, which meant one-third fewer nodes and racks. While the Haswell Xeon E5 had a 12 percent higher list price (no one knows what Total paid for the processors), at list price the processor costs dropped by 25 percent thanks to the node count shrink, and the total watts consumed by the chips (as gauged by the thermal design power, or TDP, rating from Intel) dropped by 30 percent. The result is that the cost per petaflops, when measured at just the CPU level, dropped by 61 percent. This is a big price/performance jump.
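The price/performance arithmetic is easy to reproduce from the ratios quoted above. This is a rough sketch at list price and at the CPU level only; the variable names are ours and nothing here reflects what Total actually paid.

```python
# Ratios of the upgraded system to the original, as quoted in the article.
PERF_RATIO = 1.9        # 1.9X raw peak double precision performance
SOCKET_RATIO = 2 / 3    # a third fewer processor sockets
PRICE_RATIO = 1.12      # 12 percent higher list price per processor

# Total processor spend scales with socket count times price per socket.
cpu_cost_ratio = SOCKET_RATIO * PRICE_RATIO
# Cost per petaflops is total processor spend divided by performance.
cost_per_pflops_ratio = cpu_cost_ratio / PERF_RATIO

cpu_cost_drop = 1 - cpu_cost_ratio
cost_per_pflops_drop = 1 - cost_per_pflops_ratio

print(f"Processor cost drop:     {cpu_cost_drop:.1%}")
print(f"Cost per petaflops drop: {cost_per_pflops_drop:.1%}")
```

The two results land at roughly 25 percent and 61 percent, in line with the figures cited above.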

It is likely that the Total system will do more than just GPU accelerated HPC. Summit, Sierra, and other similarly architected systems are being hailed as AI supercomputers, which can do double duty on AI training and inference as well as traditional applications.

In a conversation about the Total machine with IBM’s VP of Accelerated Systems, Dave Turek, we learned that AI is still in a kicking the tires phase in oil and gas but many conversations IBM has with oil and gas customers revolve around its potential. In addition to handling some of the pre- and post-processing for traditional reservoir simulation workloads, Turek says that there are other use cases emerging, including using GIS image data to help oil exploration target areas of geological interest more accurately. “These are entirely new areas emerging that might have simply been based on geological intuition in the past,” Turek tells us.

He also says IBM is seeing real traction in the coming year for this architecture at large oil and gas companies. “These are pieces of CORAL,” he says, and IBM expects the AI supercomputer concept to keep finding a path into more diverse HPC sites in both commercial and academic spheres.

While it is not difficult to track vendor system and performance shares for all of supercomputing via the Top500, getting a grasp on who ranks in key industries is far more challenging since relatively few of the largest companies report their results. We do watch trends, and while in some areas, like weather forecasting, Cray has pulled ahead in recent years, oil and gas is still a mixed bag in terms of vendors. Then again, it has often been a relatively easy decision between the big box vendors since most oil and gas systems were built with just CPUs.

That is changing now and we expect that to continue with more seismic codes being pushed to GPU. The integration of AI into existing HPC workflows in this custom code-centric industry (million+ lines of code all developed over many years) will be slow as well, but it is on the horizon. Since avoiding wasted expense drilling in unyielding locations can make or break the fortunes of the majors, the competitive edge is in HPC. If AI yields higher accuracy in location identification or as part of seismic modeling efforts, the industry will make it a point to follow that path in a hurry.

As with all things HPC, things take time to catch on. Turek says the AI supercomputer is indeed capturing attention as more research and industrial areas see the opportunity, even if they are not quite sure how to value that opportunity or work it into the way things have always been done.
