Inside Exxon’s Effort to Scale Homegrown Codes, Keep Architectural Pace
February 21, 2017 Nicole Hemsoth
Many oil and gas exploration shops have invested many years and many more millions of dollars in homegrown codes. That investment is critical internally (competitiveness, specialization, etc.) but leaves gaps in the ability to quickly exploit new architectures that could lead to better performance and efficiency.
That tradeoff between architectural agility and continuing to scale a complex, in-house base of codes is one that many companies with HPC weigh, and as one might imagine, oil and gas giant ExxonMobil is no different.
The company made news last week when it announced that it had scaled one of its mission-critical simulation codes on the Cray-built Blue Waters supercomputer, but the achievement left us wondering what is on the horizon for co-design challenges in oil and gas as system core counts, memory and networking capabilities, and overall scalability grow along the projected exascale timeline. With these questions in mind, ExxonMobil’s manager for the reservoir function division in the company’s Upstream Research Company tells The Next Platform that his teams are constantly filtering the hardware options against the many application requirements that teams across the company have.
“As is typical, we must have both hardware and software in mind, although we cannot afford to do a full rewrite of something like reservoir simulation to accommodate a new chip or platform every 18 months,” Kuzan explains. The strategy then is to be modular, even if the efficiency suffers because of such a choice. “We structure our algorithms and computations to maximize parallelism, memory use, and parallel I/O; minimize communication overhead; use vector processing or other acceleration as possible; seek intelligent parallel partitioning of the computational domain and do this all in light of complex physics while requiring stability–it is a general use capability, not an expert tool.”
“To achieve strong scalability, keeping the problem size fixed while adding more and more cores exposes the aforementioned challenges—and achieving strong scalability is key to maximize efficient utilization of HPC compute resources.”
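To make the strong-scaling idea in that quote concrete, here is a minimal sketch of how speedup and parallel efficiency are commonly computed when the problem size stays fixed and only the core count grows. The function name `strong_scaling` and the timings are hypothetical illustrations, not ExxonMobil code or Blue Waters measurements.

```python
def strong_scaling(timings):
    """Return (cores, speedup, efficiency) rows for a fixed-size problem.

    timings maps core count -> wall-clock seconds for the SAME problem size.
    The smallest core count is used as the baseline run.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]   # how much faster than baseline
        ideal = cores / base_cores             # perfect (linear) speedup
        rows.append((cores, speedup, speedup / ideal))
    return rows


# Hypothetical timings, invented for illustration only.
example = {1024: 800.0, 2048: 400.0, 4096: 250.0}

for cores, speedup, efficiency in strong_scaling(example):
    print(f"{cores:>6} cores  speedup {speedup:.2f}  efficiency {efficiency:.0%}")
```

In this made-up case, doubling from 2,048 to 4,096 cores no longer halves the run time, so efficiency falls below 100 percent. That drop is the usual sign that communication overhead or load imbalance is starting to dominate the fixed amount of work.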
Teams at Exxon are watching the exascale capability transition ahead from both a code and hardware perspective, as there is a wide range of applications that can benefit from a dramatic increase in compute power, assuming the codes can scale to meet the challenge. “We have a diverse portfolio of high performance computing application software for solving a variety of problems in different parts of our oil and gas business, such as exploration, field development and depletion planning, production operations, reservoir surveillance, and supply chain and logistics optimization,” Kuzan explains. “We continue to push the limits, from teraflops to petaflops and more, of the HPC system as we use these applications for solving complex problems.” He says there are some areas in the oil and gas pipeline, including seismic imaging, reservoir modeling, and supply chain optimization, that will drive the company toward exascale computing resources.
Even with that push to scale codes and hardware efficiency to those levels, he says his teams do not view this as just a hardware or software issue—“we continue to drive innovations in our workflows and algorithms for the most efficient use of available HPC systems. We also emphasize fit for risk and fit for purpose models—just because we can build complex models or run on large compute resources doesn’t mean we need to do it for every decision or every problem. We look continuously for ways to enhance our HPC compute capabilities—that means looking at the latest hardware offerings from different vendors that span CPUs to GPUs and other accelerators.”
Scaling codes is one of the big bottlenecks that companies across the commercial HPC spectrum face, whether using commercial or in-house approaches. On the hardware side, Kuzan cites some remaining challenges as well, including the need for sufficient memory to match high core counts. “Continued advances in architecture that would make code less machine dependent for scalability and performance” would be helpful, he notes, adding that it could be an unrealistic demand. Ultimately, on the systems side, he says that Exxon teams evaluate systems on a value versus cost basis: “a subtle point is that a capability machine is something more important to us in oil and gas than straight heterogeneous capacity.”
Kuzan notes that his company has a rigorous internal process to help teams decide whether commercial, in-house, or open source codes are the proper choice, but “it’s not always perfect as we don’t always know what’s on the drawing board outside ExxonMobil.” He does say that their strategy is to “invest in innovative technologies that fundamentally change the way work is done and has a dramatic impact on business decisions” and these are not offered by commercial software.
One forthcoming transformation that might add to the HPC co-design mix for oil and gas is the integration of deep learning into simulation and modeling workloads. Here, GPUs could be more important for training, and the company’s developers will have to find clever ways to integrate this next level of analysis into an existing, hard-won HPC workflow and keep it scaling, or be forced into running these workloads on completely separate clusters. Kuzan says they are weighing the options for how to architect this in, seeing a separate cluster for more advanced data science as more effective than taking over cycles on the supercomputing clusters. “We use many data science techniques today and are working on incorporating deep learning techniques for solving a variety of problems in different parts of our oil and gas business,” Kuzan says. “The connection to HPC is especially relevant because we often deal with dramatic uncertainty, from the price of oil to the subsurface uncertainty in our reservoirs, and often the ‘data’ are in the form of results from many, many scenarios generated via modeling.”
Kuzan contends it might not be as challenging as it sounds to work deep learning into these workflows because there is the ability to separate the straight-up simulation from the data science in a modular architecture. “There are efficiencies to be gained from integrating or refactoring, but we’d want to balance these with just how cumbersome the modular approach is.” He says he wants much of the data science the company does with HPC-spun data to run on smaller machines, even on desktops for some workloads. In essence, the goal for now is to create the big data set on the HPC systems, do some culling and training in HPC, but keep an eye on migrating that capability to a smaller machine.
To put all of this into some perspective, consider the current scalability of native codes on leading-edge supercomputers. The code scalability feat from the news last week yielded a parallel simulation using 716,800 processors, the full capacity of the National Center for Supercomputing Applications’ Cray XE6 machine, to boost reservoir management workflows. Exxon said this represents the largest processor count used in the oil and gas industry to date and is one of the largest simulations reported by engineering disciplines, including aerospace and manufacturing. This highlights Kuzan’s emphasis on the capabilities an exascale-class system might eventually bring, and it reinforces the call to action in oil and gas and other areas: code scalability efforts should begin now and keep scaling.