Although oil and gas software giant Baker Hughes is not in the business of high performance computing, the software it creates for the world’s leading oil and gas companies requires supercomputing capabilities for some use cases. Increasingly, these systems can serve double duty for emerging deep learning workloads.
The HPC requirements make sense for an industry awash in hundreds of petabytes of sensor and equipment data each year, plus many terabytes per day for seismic and discovery simulations. Deep learning is becoming the next best way of building meaning out of so many bytes.
In an effort to bring deep learning into the HPC- and traditional analytics-based oil and gas market, Baker Hughes is using an InfiniBand-connected cluster of Pascal GPU-based Nvidia DGX-1 appliances to support a new wave of deep learning software products for the industry. The use cases where neural networks fit are broadening, touching everything from overall rig optimization to more traditional HPC simulation-based areas in resource discovery and production.
Arun Subramaniyan, VP of Data Science and Analytics for Baker Hughes Digital and GE, which is a majority stakeholder of the oil and gas software company, says that the company’s approach to integrating deep learning into the complex packages it sells to the world’s top oil and gas suppliers will be a hybrid of HPC and AI techniques. “Most of our models are a combination of physics augmented by deep learning or AI that requires input from physics-based models. For example, some of the large-scale multiphase simulations the oil and gas industry has been running for decades, or larger turbulence models, can all benefit from having parts of that workflow replaced by neural networks.”
“We are not necessarily replacing the physics models themselves with neural networks. These models are an important part of the overall oil and gas ecosystem, but there are many pieces of those workloads that can be accelerated through a combination of deep learning and physics models running together,” Subramaniyan says, pointing to the promise of generative adversarial neural networks (GANNs) to potentially overcome the limitations of deep learning in physics-based HPC simulations in the future.
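To make the idea of physics models and learned models running together more concrete, here is a minimal sketch in Python. It is not Baker Hughes' method: the physics formula, the constant, and the data are all hypothetical, and a simple least-squares line stands in for the neural network that would learn what the physics model misses.

```python
from typing import Callable, List, Tuple


def physics_model(flow_rate: float) -> float:
    """Hypothetical simplified physics: pressure drop grows with flow rate squared."""
    friction_coeff = 0.5  # illustrative constant, not a real pipeline parameter
    return friction_coeff * flow_rate ** 2


def fit_linear_residual(xs: List[float], residuals: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of residual ~ slope * x + intercept.

    This linear fit is a stand-in for a neural network trained on the residuals.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(residuals) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, residuals))
    slope = cov / var_x
    intercept = mean_r - slope * mean_x
    return slope, intercept


def build_hybrid(xs: List[float], observed: List[float]) -> Callable[[float], float]:
    """Keep the physics model and learn only the gap between it and the measurements."""
    residuals = [y - physics_model(x) for x, y in zip(xs, observed)]
    slope, intercept = fit_linear_residual(xs, residuals)

    def hybrid(x: float) -> float:
        # Physics prediction plus the learned correction.
        return physics_model(x) + slope * x + intercept

    return hybrid


# Synthetic "measurements": the physics model plus an effect it does not capture.
flow_rates = [1.0, 2.0, 3.0, 4.0]
measured = [physics_model(q) + 0.2 * q + 0.1 for q in flow_rates]
hybrid_model = build_hybrid(flow_rates, measured)
```

The design point is that the physics model stays in the loop and the learned component only corrects it, which matches the spirit of Subramaniyan's description, though the real systems are far more elaborate.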
While GANNs are on the horizon for more complex initiatives, Baker Hughes is using its DGX-1 cluster for more standard convolutional (CNNs) and recurrent (RNNs) neural networks. These support new and updated software packages that take the terabytes of daily sensor and other data streams and turn them into meaningful insight about potential repairs or problems, safety or environmental concerns, and more specific issues like corrosion, which is a problem when moving oil through carrier infrastructure. More traditional HPC problems like oil and gas discovery and extraction have other software requirements, but could benefit from deep learning through the kind of AI and physics-based simulation integration on the horizon.
The complicated part of all of this from Baker Hughes’ side was initially deployment, something that took teams inside the oil and gas software company a good deal of time to work through. The company trains the complex neural networks on its DGX-1 clusters, then pushes them to the cloud as an inferencing service for its clients, with retrained updates pushed through when required. The next step is to present fully trained baseline models to clients, who can then train them on their own data in a cloud environment. All of this is the final model of operation, but going from concept to deployment at scale was the big initial technical hurdle.
“Today, it is a blessing and a curse that people can go build fairly sophisticated models with openly available code on cloud infrastructure. That is the easy part. The big issue is how to take that and roll it into a production workflow and manage the lifecycle of the whole thing,” Subramaniyan explains. “It took a lot of work to make it easier internally to build, deploy, and manage hundreds of thousands of analytics operations at scale deployed across systems connected to data streams with 14,000 or more data elements tied to them in production, and failure of any part of this workflow is highly consequential.”
So how did the company go from the original research models to full-scale deployment across a DGX-1 training cluster tied to HPC infrastructure, all of which branches out to the cloud for inference and delivery?
Deploying AI at scale meant the team had to build much of the deployment framework and infrastructure itself. Cloud and Kubernetes form the base, on a mesh network of the team’s own devising that can run on demand with high availability (HA) and deploy across multiple regions at the same time to mitigate failures.
As one example of the complexity, Subramaniyan says that an early problem was having one model built in TensorFlow, another in Caffe, and another in PyTorch, with each needing to run in inferencing mode in the same overall flow. “Just managing loading, making sure all systems were up and running when we were hitting them, and getting the predictions to work was a massive task.”
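The problem Subramaniyan describes, several models from different frameworks serving predictions in one flow, is commonly handled by putting each model behind a shared interface. The Python sketch below is only an illustration of that general pattern, not the company's framework: `ModelAdapter`, `run_flow`, the model names, and the lambda stand-ins for real TensorFlow, Caffe, and PyTorch models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ModelAdapter:
    """Wraps one model behind a common interface, whatever framework produced it."""
    name: str
    framework: str  # descriptive label only; no framework is imported here
    predict_fn: Callable[[Sequence[float]], float]
    is_healthy: Callable[[], bool]

    def predict(self, features: Sequence[float]) -> float:
        return self.predict_fn(features)


def run_flow(adapters: List[ModelAdapter], features: Sequence[float]) -> Dict[str, float]:
    """Check that every model is up, then collect one prediction per model."""
    unavailable = [a.name for a in adapters if not a.is_healthy()]
    if unavailable:
        raise RuntimeError(f"models not ready: {', '.join(unavailable)}")
    return {a.name: a.predict(features) for a in adapters}


# Stand-in models: plain functions in place of real framework inference calls.
adapters = [
    ModelAdapter("corrosion", "tensorflow", lambda f: sum(f) / len(f), lambda: True),
    ModelAdapter("vibration", "caffe", lambda f: max(f), lambda: True),
    ModelAdapter("anomaly", "pytorch", lambda f: min(f), lambda: True),
]
predictions = run_flow(adapters, [0.2, 0.4, 0.9])
```

In a real deployment each `predict_fn` would call into a loaded framework model and each health check would probe a running service; the flow itself would not need to know which framework sits behind which adapter.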
While it took Baker Hughes some major investment in talent to get the right teams together to deploy AI at scale, the ROI is that oil and gas companies might be spared from having to roll their own in-house software to do some of the AI framework wrangling that leads to better operations. Like oil and gas companies, which often have to reiterate that they do not want to be in the business of infrastructure management, Baker Hughes has had to quickly ramp up expertise, but the benefits get passed through the industry pipeline.
We asked if the addition of AI into the HPC-dominated areas of oil and gas might change the way the industry thinks about investing in HPC hardware, perhaps pushing toward greater adoption of GPUs or even separate networked clusters for AI versus traditional simulations. “No matter what, you have to think about this at a hybrid level. If you’re combining physics simulations with AI, you need latency reduced between the two as much as possible. You need dense GPU systems combined with powerful CPU racks, which is exactly what we are building. Our biggest driver is to get to production optimization and better operations and process management, which also requires a GPU plus CPU mix. There is a lot of data pre- and post-processing with Spark runtimes that run close to our GPU workloads, for instance, where this makes sense.”
Subramaniyan says that the forthcoming Volta GPUs, which will also be packed into an appliance à la the DGX-1, will be worth their weight in gold, a fair thing to ask about since the machines are notoriously expensive. He says the performance jump compared to Pascal is heady enough to justify the investment in new boxes.
As Binu Mathew, VP of digital development at Baker Hughes, GE, concludes, “the oil and gas industry needs exaflop and beyond levels of performance in the coming years. That is just based on what is coming from sensors and doesn’t include the area of imaging, which is also exploding.” He says this, coupled with cloud-based GPU capabilities, will enable his company to push ahead and deliver entirely new ways of operating to the oil and gas industry now and in the years ahead.