When you are always looking for what platform architecture will be mainstream, you have to look at what those on the bleeding edge are doing to see what the leading edge might do, which in turn tells you what everyone else might eventually do.
For certain kinds of data analytics and machine learning applications, we look to the hyperscalers and cloud builders for prognostications, and for certain kinds of simulation and modeling in traditional HPC, we look to the national labs, the weather centers, and the oil and gas supermajors to do our auguries. And we also see these advanced HPC centers moving more and more into the kinds of analytics and AI that the hyperscalers and cloud builders have crafted. And, funnily enough, more of the cloud builders are offering HPC-style computing and many of the hyperscalers have consumer electronic design businesses that require HPC.
This is a good thing, we think, and will make technologies and people more fluid across what used to be more isolated domains.
But that is not what we are going to talk about today as we consider the substantial supercomputer upgrade that Italian oil and gas giant Eni will be putting in the field, boosting its aggregate computing capacity to over 70 petaflops peak at double precision when it brings the HPC5 system online in its Green Data Center in early 2020. For a commercial institution, this is a tremendous amount of computing capacity, and the obvious question is what Eni, and its peers that are making similarly large investments as far as we can tell, are going to do with all of this compute. This is on par with one of the national labs in the United States, Europe, Japan, and China, after all.
After a lull in spending on supercomputers a few years back when oil was in the doldrums, Eni has been working steadily on building up its supercomputers in the past six years, and is very much focused on traditional HPC – in this case, the combination of seismic imaging and reservoir modeling. There are a few big changes afoot, and that is perhaps triply so for a company whose logo is a six-legged, fire-breathing wolf.
Next year’s HPC5 system, the fifth in a series of systems that Eni has been building since 2013, will weigh in at 52 petaflops. That is the largest public disclosure of system compute capacity by an oil and gas supermajor to date, and it eclipses the 25 petaflops “Pangea III” hybrid CPU-GPU supercomputer that rival Total announced last year, which was formerly the largest industrial supercomputer that we know about. We are grateful for the insight that Total and Eni provide by talking about their systems as they evolve them, just as we are to Google, Facebook, and Microsoft, who talk about their hyperscale systems.
ExxonMobil, Royal Dutch Shell, Chevron Texaco, and ConocoPhillips don’t really talk about what they are doing, although ExxonMobil did talk to us back in 2017 about how it had scaled its reservoir modeling code across the “Blue Waters” supercomputer at the University of Illinois National Center for Supercomputing Applications. BP doesn’t say much, either, but occasionally it does talk about the configuration of its largest systems. We get some hints here and there at Saudi Aramco, and Total and Eni have been bragging about their HPC largesse in recent years. Eni and Total are a little different from the other supermajors in that they not only have HPC systems to drive their oil and gas businesses, but they are also major retail electric and gas utilities in Europe. We sometimes hear about the systems at Petrobras and Petronas, but China National and Gazprom are pretty secretive, too, when it comes to systems. This is analogous to what some of the hyperscalers and cloud builders do. Amazon Web Services is very secretive, but Alibaba, Tencent, and Baidu give us some glimpses here and there that we can learn from.
Eni is not afraid to spread its money around, and has used a mix of system vendors in the past six years – seven if you count next year’s HPC5 system – to get the machine that it wanted at the time. The HPC5 machine is a cluster based on a hybrid compute architecture, which mixes CPUs and GPUs – in this case a pair of 24-core “Cascade Lake” Xeon 6252 Gold processors and four Nvidia Tesla V100 accelerators that hook to the CPUs over the PCI-Express bus and to each other using the NVLink interconnect. This time around, Dell won the deal with its hyperscale-inspired PowerEdge C4140 servers, machines that were actually launched by Dell two years ago at the SC17 supercomputing conference and which were designed to be upgradeable to Cascade Lake processors. These look like normal 1U pizza box servers, except that there is room for two processors and four GPUs, and one of the reasons this is possible is that each server uses a pair of M.2 flash sticks as boot devices and has no local storage other than those flash sticks. It’s no big deal since local storage is used only for scratch anyway, and most HPC systems have an external parallel file system for storing data. In this case, HPC5 will have a 15 PB parallel file system – what flavor was not divulged, but we do know that it has the capability to read and write data at 200 GB/sec speeds, which is pretty fast. Eni has chosen an InfiniBand interconnect from Mellanox Technologies for lashing together the nodes in the HPC5 system, as it has for the prior four machines, and in this case it will be a 200 Gb/sec HDR InfiniBand interconnect.
Before we get into what Eni is doing with the HPC5 machine alongside its predecessor, the HPC4 hybrid CPU-GPU system that was installed in 2018 and that will continue to be used as well, we did a bit of analysis of the five HPC systems that Eni has put into the Green Data Center since 2013 and found some interesting things. To get started, here are the feeds and speeds of the HPC systems at Eni:
And because memory bandwidth is probably more important to HPC workloads than raw compute, here is the network speed and the memory bandwidth of each system:
In the tables above, we have reckoned the peak double precision floating point performance of both the GPUs and the CPUs in each of the five HPC systems at Eni, and then calculated the total share of the compute that is delivered by the GPUs. That ordering is not backwards. In a way, and particularly with modern codes, it is probably best to think of the CPU as a serial coprocessor for the parallel GPU processor where the seismic imaging and reservoir modeling applications actually run.
We were particularly interested in how the ratio of GPU to CPU compute in the Eni systems has changed over time. The 1,500 node HPC1 system that IBM built for Eni back in 2013 was based on IBM’s hyperscale-inspired iDataPlex DX360M4 nodes, which had a pair of Intel “Sandy Bridge” Xeon E5-2670 processors, each with eight cores running at 2.6 GHz; the system was interconnected with a 56 Gb/sec FDR InfiniBand network. This was a perfectly normal 499.2 teraflops, all-CPU system for the time. But with GPUs on the rise in HPC and Eni in control of its own codes for seismic imaging, the company wanted to radically accelerate these seismic jobs by shifting a lot of the work to GPUs. So in 2014, with the addition of the HPC2 system, Eni had IBM add another 1,500 nodes to its supercomputer center, based on the iDataPlex DX360M4 nodes that had ten-core “Ivy Bridge” Xeon E5-2680 v2 processors running at 2.8 GHz, plus a pair of Nvidia “Kepler” Tesla K20X GPU accelerators. (The iDataPlex nodes were too skinny for the much beefier Tesla K80 accelerators.) On that HPC2 system, those GPUs accounted for 85 percent of the 4.6 petaflops of the machine, and if you add HPC1 and HPC2 together, GPUs represented 77 percent of the total 5.1 petaflops that Eni had installed for HPC workloads in 2014.
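The peak numbers above can be checked with simple arithmetic. The following is a minimal sketch, not Eni's or our own tooling: the function name is hypothetical, and it assumes that a Sandy Bridge core retires 8 double precision flops per cycle (256-bit AVX, one add and one multiply per clock), which is not stated in the text above.

```python
def peak_cpu_teraflops(nodes, sockets_per_node, cores_per_socket,
                       clock_ghz, flops_per_cycle):
    """Peak double precision teraflops for the CPU side of a cluster."""
    gigaflops = (nodes * sockets_per_node * cores_per_socket
                 * clock_ghz * flops_per_cycle)
    return gigaflops / 1000.0

# HPC1: 1,500 nodes, two eight-core Xeon E5-2670 chips per node at 2.6 GHz.
# Assumption: 8 DP flops per cycle per core for Sandy Bridge with AVX.
hpc1_tf = peak_cpu_teraflops(1500, 2, 8, 2.6, 8)

# HPC2 figures as quoted in the article.
hpc2_pf = 4.6
hpc2_gpu_share = 0.85
hpc2_gpu_pf = hpc2_pf * hpc2_gpu_share

# Combined 2014 installation: all-CPU HPC1 plus hybrid HPC2.
combined_pf = hpc1_tf / 1000.0 + hpc2_pf
combined_gpu_share = hpc2_gpu_pf / combined_pf

print(f"HPC1 peak: {hpc1_tf:.1f} teraflops")
print(f"HPC1 + HPC2 peak: {combined_pf:.1f} petaflops")
print(f"GPU share of combined peak: {combined_gpu_share:.0%}")
```

Under that assumption the HPC1 figure lands on the 499.2 teraflops quoted above, and the blended GPU share of the two machines works out to roughly 77 percent of 5.1 petaflops.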
As seismic processing was moved aggressively to GPUs in the next couple of years, Eni held steady with its systems during the oil downturn. When things were looking a little brighter in 2017, the company decommissioned HPC1 and brought the HPC3 system online. It was built by Lenovo using the follow-on to the iDataPlex, called the NextScale nx360M5, a line that Lenovo had acquired from IBM as part of its deal to buy the System x division. While the HPC3 system only had 375 nodes, the CPU part of each node had a pair of 18-core “Broadwell” Xeon E5-2697 v4 processors matched with a pair of Tesla K80 GPU co-processors. This machine was rated at 3.8 petaflops, and 88 percent of the compute (as gauged by double precision floating point) came from the GPUs. Not surprisingly, the memory bandwidth on both the CPUs and the GPUs had been scaling up along with the heat dissipated by the systems – nothing comes free. Eni also scaled up the InfiniBand to 100 Gb/sec EDR to better balance the bandwidth and compute in the box and the need to communicate in bigger chunks and at lower latency across the boxes.
A funny thing happened with the HPC4 system in 2018, which we did not notice at the time. It wasn’t the switch from Lenovo to Hewlett Packard Enterprise as its system vendor, which we certainly did notice, but rather that Eni went with a near-top-bin 24-core “Skylake” Xeon SP-8160 Platinum processor in the nodes to match them up against the pair of “Pascal” Tesla P100 accelerators from Nvidia. Across the 1,600 nodes in the HPC4 system, the GPUs represented 89 percent of the 18.6 petaflops of peak performance at double precision and 89 percent of the 2,531 TB/sec of aggregate memory bandwidth across all of the CPUs and GPUs in the system.
With the move to HPC5 next year, the peak compute will rise by a factor of 2.8X to 52 petaflops compared to HPC4 and the aggregate memory bandwidth of the system will rise by a factor of 2.7X to 6,774 TB/sec. Keeping the memory bandwidth in lockstep with the compute is basically determined by that ratio on the raw Tesla V100 accelerators from Nvidia, which account for 98 percent of the flops and 94 percent of the memory bandwidth across the HPC5 system. You don’t have to do anything special to keep them in lockstep. Nvidia does it for you.
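As a quick sanity check on those scaling factors, here is a small sketch using only the figures quoted above. The dictionary layout and variable names are ours, and the bandwidth values are simply taken in the same unit for both machines, so the ratios do not depend on that unit.

```python
# Peak double precision petaflops and aggregate memory bandwidth as quoted
# in the article. Bandwidth is in the same unit for both systems, so the
# ratio between them is unit-free.
hpc4 = {"petaflops": 18.6, "bandwidth": 2531.0}
hpc5 = {"petaflops": 52.0, "bandwidth": 6774.0}

compute_ratio = hpc5["petaflops"] / hpc4["petaflops"]
bandwidth_ratio = hpc5["bandwidth"] / hpc4["bandwidth"]

# Bandwidth available per petaflops of peak compute on each machine,
# a rough measure of how closely the two stay in lockstep.
hpc4_balance = hpc4["bandwidth"] / hpc4["petaflops"]
hpc5_balance = hpc5["bandwidth"] / hpc5["petaflops"]

# Aggregate peak once both machines are running side by side.
aggregate_pf = hpc4["petaflops"] + hpc5["petaflops"]

print(f"Compute scaling:   {compute_ratio:.1f}X")
print(f"Bandwidth scaling: {bandwidth_ratio:.1f}X")
print(f"Balance, HPC4 vs HPC5: {hpc4_balance:.0f} vs {hpc5_balance:.0f}")
print(f"Aggregate peak: {aggregate_pf:.1f} petaflops")
```

The two ratios come out at about 2.8X and 2.7X, the bandwidth per petaflops barely moves between the two generations, and the aggregate is the "over 70 petaflops" figure cited at the top of this story.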
How you collect that GPU compute together and balance it against the CPU compute is a different matter. With the HPC5 system, Eni is moving away from top bin Skylake parts to middle bin Cascade Lake parts, which offer much better bang for the buck and, frankly, help it buy more Tesla V100 accelerators. The top bin Xeon SP processors are very pricey indeed compared to the compute they offer, and if you don’t need them, then there is no point in buying them – particularly when so much of the code is running on GPUs, as is the case at Eni after a decade of transformation.
Now for some interesting charts. This one plots out the system-level (rather than the datacenter level) peak performance in double precision flops for both the GPU and CPU parts of the systems over time:
Note: Our charts presume that Eni is not going to do an HPC6 system in 2021, but that may not be the case.
As you can see, there is a baby CPU-only supercomputer hidden inside of a GPU supercomputer starting with HPC2 and all the way through HPC5. You might have thought that the amount of compute in the CPUs would scale linearly with the total, with the GPUs always being somewhere around 90 percent to 95 percent of it, but that is not the case. For the HPC1, HPC2, and HPC3 systems, the CPU portion of the compute averaged a little north of half a petaflops, but with HPC4, it went up to around 1.4 petaflops and now with HPC5 it will rise a little bit to maybe 1.6 petaflops. But the one thing that Eni is doing with HPC5 is getting a lot more memory bandwidth across those CPUs. So in a sense, it traded down to a cheaper CPU to get a better ratio of compute to memory bandwidth on the CPU side; it kept the core count the same, the cores spin at the same clock, and with architectural improvements in the Cascade Lake design, the chip should offer a tiny bit more floating point performance per clock. The chart below visualizes this:
That said, this CPU memory bandwidth is just completely and utterly dwarfed by the bandwidth across the HBM2 memory used in the Tesla V100 GPUs from Nvidia, which are delivering 900 GB/sec of peak bandwidth, which is on the order of 9X as much bandwidth per socket as the Cascade Lake processor can do.
It is hard to visualize the big steps in compute and bandwidth that Eni has invested in during the past decade, so these charts will help. This one shows the peak double precision petaflops and the peak aggregate memory bandwidth, adding both the CPUs and the GPUs to the mix:
And this one shows the total HPC capacity across the two current systems in the Eni datacenter over time, which is perhaps a better measure of how much oomph the oil and gas supermajor has to deploy on applications:
It is hard to visualize 9.1 PB/sec of aggregate memory bandwidth. But somehow, it still doesn’t seem like enough. Systems might perform a whole lot better on real-world applications if we could get that memory bandwidth to be considerably higher, but thus far no one has figured out how to break through that memory wall, and certainly not with general purpose compute elements like CPUs and GPUs.
And now that brings us to why Eni needs all of this compute and memory bandwidth. To get a sense of this, we talked to Vincent Natoli, founder and chief executive officer of Stone Ridge Technology, which has developed a new generation of reservoir modeling software, called ECHELON, that was built from the ground up to run on GPUs. We talked to Natoli at length back in 2017 about GPU acceleration in the oil and gas industry, which was one of the more useful conversations we have had at The Next Platform.
Last May, Eni formed a co-development partnership with Stone Ridge to work on ECHELON together, and to get started the two companies loaded up ECHELON on the HPC4 system and ran 100,000 high resolution reservoir model simulation runs, each with tweaks to account for geological uncertainties in the seismic data culled by Eni from a deep water well described by 5.7 million cells. The simulation ran across the entire HPC4 system and took only 15 hours to run. Reservoir engineers typically do their models on CPU-only workstations, and they are lucky to get one simulation done in a few hours.
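To put that ensemble run in perspective, here is a back-of-the-envelope sketch. It uses the 1,600 nodes and two GPUs per node cited for HPC4 above, and it assumes, purely for illustration, that each model run occupies exactly one GPU and that all GPUs stay busy for the full 15 hours. How ECHELON actually maps runs onto GPUs was not disclosed.

```python
# Figures quoted in the article.
runs = 100_000
wall_hours = 15
nodes = 1_600
gpus_per_node = 2

gpus = nodes * gpus_per_node

# Throughput of the whole machine.
runs_per_hour = runs / wall_hours

# Assumption (ours, not Eni's or Stone Ridge's): one run per GPU at a
# time, with every GPU kept busy for the entire 15 hours.
runs_per_gpu = runs / gpus
minutes_per_run = wall_hours * 60 / runs_per_gpu

print(f"GPUs in HPC4: {gpus}")
print(f"Ensemble throughput: {runs_per_hour:.0f} runs per hour")
print(f"Runs per GPU: {runs_per_gpu:.2f}")
print(f"Implied time per run on one GPU: {minutes_per_run:.1f} minutes")
```

Under those assumptions, each GPU works through a little more than 31 models at something under half an hour apiece, compared to the few hours per simulation that is typical on a CPU-only workstation.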
There are a couple of factors that drive such large compute complexes in the oil and gas industry, and it is not just as simple as adding reservoir modeling workloads to the complex that is already doing seismic imaging.
For those who don’t know, seismic imaging involves sending some sort of shock wave through the Earth’s crust and then listening to the echoes across a wide area to basically build a sonar map of what is happening in the rocks to try to find oil and gas deposits. There is a lot of uncertainty in this process, which is why oil and gas companies are willing to spend so much on HPC. Reservoir modeling is about building a 3D model of an oil or gas deposit and then trying to figure out how best to extract the hydrocarbons out of the ground as more wells are added to a field and the field changes as portions of the reserves are depleted. They are two very different workloads, even if they are connected, and seismic imaging is much easier, which is why it was ported to GPUs first. Some of the supermajors have long since developed their own reservoir modeling code, while others use code from Schlumberger, Stone Ridge, or Rock Flow Dynamics.
“There are three reasons why reservoir simulation is the up and coming HPC application in the oil and gas industry,” explains Natoli. “First, the codes are now there running on GPUs, and previously, the codes did not scale that well so there was no point in it. Second, companies want to do bigger models and finer grained models with more cells. They want to capture more details in the geology and they want to do more advanced physics, and more physics means they need more processing cycles to get it done. So that is pushing demand for more compute. And finally, they want to do more ensembles. Ensembles are a big deal. People have realized that the idea that there is this one model that can represent the subsurface of the Earth is just mistaken. There is an ensemble of models that might represent the subsurface with different probabilities. Oil and gas companies want to run hundreds of models – or maybe thousands of models – and get statistically relevant information about production and planning. They have always wanted to do this, but they have been limited by the slow performance of codes.”
And that is why Eni’s aggregate HPC system performance charts are lifting up and to the right so strongly.
Author’s note: For those who want a longer view of Eni’s HPC systems over time, there is a good review at this link, stretching back to the 1960s up through today.