DUG, a provider of cloud infrastructure for seismic processing, has enough combined compute power to rank among the top ten systems on the Top 500 list of the world’s most powerful supercomputers, with around 30 petaflops for seismic processing, full waveform inversion, petrophysics, and other HPC applications in oil and gas, much of it running on the company’s own software packages.
Unlike national labs and universities that have to design systems to suit a wide array of workloads, DUG has a firm grasp on what it needs from hardware. The company put a huge fleet of Intel’s “Knights Landing” systems on the floor over the last several years, going to great lengths to optimize for the architecture, which is no longer on the table.
It is no surprise that DUG is digging for a new CPU option. The oil and gas industry could use near-infinite compute but the real bottleneck is memory bandwidth. “Algorithms like full waveform inversion were proposed in the ’80s. We could finally run them in the 2000s and in the 2010s computational complexity was still a limitation. Even though it’s been 30 years, we’re still limited by the amount of compute,” says Dr Stuart Midgley, systems architect at DUG.
On the list of architectures that balance floating point and memory bandwidth performance is Fujitsu’s Arm-based A64FX processor. DUG has not made a sizable investment in the architecture yet, but its early successes and challenges are telling. Midgley has eight nodes in a 2RU chassis, running CentOS 8.2, working through some of the company’s core applications now. These are still early days, but some of the porting hiccups might be worth overcoming for at least some of the key codes.
Knights Landing had 450GB/sec of memory bandwidth, but A64FX has about a TB/sec. The cores are about twice as powerful, even out of the box, Midgley says. The problem is that “out of the box” is not necessarily what DUG needs. Its codes are highly optimized for the architectures it has selected. Knights Landing was the “workhorse,” and if DUG goes with A64FX, it is going to take significant work on the code side.
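To put those bandwidth figures in perspective, here is a back-of-envelope sketch in Python. It assumes a purely bandwidth-bound kernel, so that runtime is simply bytes moved divided by memory bandwidth. The 1TB workload and the function name are hypothetical; only the 450GB/sec and roughly 1TB/sec figures come from the numbers quoted above.

```python
# Back-of-envelope estimate for a memory-bandwidth-bound kernel.
# Assumption: runtime ~= bytes moved / memory bandwidth (compute is never the limit).

KNL_BW_GB_S = 450.0      # Knights Landing figure quoted in the article
A64FX_BW_GB_S = 1000.0   # "about a TB/sec" quoted in the article


def streaming_time_s(bytes_moved: float, bandwidth_gb_s: float) -> float:
    """Seconds needed to stream bytes_moved at the given bandwidth (GB/s)."""
    return bytes_moved / (bandwidth_gb_s * 1e9)


# Hypothetical workload: one pass over 1 TB of data.
WORKLOAD_BYTES = 1e12

knl_time = streaming_time_s(WORKLOAD_BYTES, KNL_BW_GB_S)
a64fx_time = streaming_time_s(WORKLOAD_BYTES, A64FX_BW_GB_S)
speedup = knl_time / a64fx_time

print(f"KNL:   {knl_time:.2f} s")
print(f"A64FX: {a64fx_time:.2f} s")
print(f"Bandwidth-bound speedup: {speedup:.2f}x")
```

Under these assumptions the bandwidth advantage alone is worth a little over 2x. Real codes rarely sit exactly on the bandwidth limit, so this is an intuition for the upper bound rather than a prediction.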
When the DUG infrastructure team first experimented with A64FX, one of its software packages, DUG Wave, which is based on the Devito seismic inversion code, hit bumps in the road with the environment, especially for Python. “We had to patch Devito and Python and we couldn’t get autotuning to work so we had to do that manually. After a couple of days of work we could get it to run 20 percent faster than our Knights Landing where we’ve done a huge amount of optimization,” Midgley says.
He says that since that early experiment, the team has used the 2RU A64FX mini system (single-socket nodes with 48 cores per socket) to get results 50 percent better out of the box, and was up and running the benchmark in 30 minutes. For some of the company’s codes, however, there were challenges.
For the Kirchhoff time migration portion of a large Java and C package, things went smoothly after integration work with some of Fujitsu’s performance libraries. The problem was that it was slow: the team simply was not getting vectorization. They later took the same code without AVX-512 and got double the performance of Knights Landing.
DUG’s FFT-based deblend codes are still a work in progress. The team had to do some porting to get FFT library support on Arm, and once the code compiled, the results were incorrect. “We do think we’ll have problems with this code, we have some running incorrectly and this benchmark needs more RAM than the system has,” Midgley explains. The solution is tacking on an NVMe drive to make up for the 32GB per node allotment on the A64FX system.
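One generic way to let a job work on more data than fits in RAM is to keep the dataset in a file on fast local storage and memory-map it, so the operating system pages blocks in and out on demand. The sketch below illustrates that idea with Python’s standard `mmap` module. It is not DUG’s method (how the NVMe drive is used is not described), and the file location, sizes, and function name are all hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical sizes, kept tiny so the sketch runs anywhere.
N_VALUES = 1_000_000              # number of float64 samples in the dataset
CHUNK_VALUES = 100_000            # samples processed per step
ITEM = struct.calcsize("<d")      # 8 bytes per little-endian float64


def chunked_sum(path: str, n_values: int, chunk_values: int) -> float:
    """Sum float64 values from a memory-mapped file, one chunk at a time."""
    total = 0.0
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for start in range(0, n_values, chunk_values):
            count = min(chunk_values, n_values - start)
            # Only this slice is decoded into Python objects at once;
            # the OS pages the underlying file data in as it is touched.
            chunk = struct.unpack_from(f"<{count}d", mm, start * ITEM)
            total += sum(chunk)
    return total


# Stand-in for a file on the NVMe drive (a temporary file here).
fd, data_path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as out:
    block = struct.pack(f"<{CHUNK_VALUES}d", *([1.0] * CHUNK_VALUES))
    for _ in range(N_VALUES // CHUNK_VALUES):
        out.write(block)

result = chunked_sum(data_path, N_VALUES, CHUNK_VALUES)
os.remove(data_path)
print(result)
```

The trade-off is that every page fault now costs an NVMe read rather than a memory access, so this kind of approach suits codes that stream through data in large sequential blocks.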
Even though there have been some bumps in the road so far, Midgley says 4–5 staff are using A64FX and they forget they’re on Arm at all. They develop on x86 and copy the code across. He adds that it’s great that Fujitsu is supporting LLVM as well, since it makes it easier for his teams to support the broad array of environments DUG needs.
DUG will continue experimenting with its small setup. It has already added a multihost NIC and a fully supported network that can reach up to 100GB/sec.
Midgley adds that DUG is already talking to Fujitsu about taking out the fans and adding immersion cooling once the benchmarking is over.