Thinking Through The Cognitive HPC Nexus With Big Blue
June 21, 2017 Timothy Prickett Morgan
There are plenty of things that the members of the high performance computing community do not agree on, but there is a growing consensus that machine learning applications will, at least in some way, be part of the workflow at HPC centers that do traditional simulation and modeling.
Some HPC vendors think that HPC and AI systems have either already converged or will soon do so, and others think that the performance demands (both in terms of scale and in time to result) on both HPC and AI will necessitate radically different architectures and therefore distinct systems for these two workloads. IBM, which has a hand in HPC and AI as well as quantum and neuromorphic computing, has a slightly more nuanced opinion about how these markets are co-existing if not completely converging. To understand IBM’s view of the upper echelon of computing better, we sat down with Dave Turek, vice president of technical computing OpenPower, to talk about how Power9 and the CORAL supercomputing deployments are going and what the future of HPC might be in a cognitive era with a mix of technologies in the toolbox.
Timothy Prickett Morgan: Big Blue has been very open about the Power processor roadmap, which has been extended out to at least 2020 or so with the Power10 processor and possible variants from OpenPower partners in the mix, too. Many years ago, when GPU computing in HPC was taking off and machine learning had not yet burst onto the scene, Nvidia was pretty forthcoming about its general plan and put stakes in the ground for the Maxwell, Pascal, and Volta generations of GPUs. There was some wiggle in there in terms of timing and features, but generally speaking Nvidia did what it said it was going to do.
One of the things that we did not see at the recent GPU Technology Conference is a companion roadmap from Nvidia that outlined the future generations of its GPUs, and obviously we presume there are future GPUs beyond the just-announced “Volta” chip that will help hybrid supercomputers reach closer to the exascale floating point performance range. Those of us outside of the companies do not have as much visibility into the future as we did a few years back. And even for Power, 2020 is not that far away, really.
Dave Turek: That’s because we are at the end of the technology. [Laughter] And that is why we have espoused the notion that it is all about systems design from this point forward and to be as comprehensive as possible so you can pull on as many levers as you can at the same time and not just have it degenerate into talk about microprocessors. And the things that we have outlined in the past are coming to fruition.
So, for example, the data centric computing architecture that we talked about has as one of its components this notion of removing the friction of data movement as well as minimizing data movement, and if you look at the emergence of NVLink and the emergence of CAPI and then OpenCAPI, those are implementations that remove that friction. The second thing that has come to fruition is that programmability becomes important. If you look ahead to the CORAL systems, “Summit” and “Sierra,” later this year, we will have memory coherence between the CPUs and the GPUs and they are effectively coprocessors to one another. But these are related features, since we never got tied into a world of just one kind of accelerator. There has to be an opportunity to use diverse kinds of accelerators. A lot of people are fooling around with FPGAs, and this will all continue to evolve over the course of time.
TPM: No one else is going to have this coherence between the CPU and the GPU, although AMD might be able to do it now that it is delivering both Epyc processors and Radeon Instinct GPU accelerators. It certainly has done some work on coherency in the past with its APUs and that technology could be applied to CPUs and discrete GPUs not on a single die or in a single package. But for now, only Power9 has NVLink and Intel is not offering something that looks like NVLink between its Xeons or Xeon Phis and other types of accelerators. The point is, no one has this yet except IBM and Nvidia.
Dave Turek: Correct. I was down at an exascale conference a couple of weeks ago in Princeton, and Jack Wells, one of the people at Oak Ridge National Laboratory, got up and was talking about progress on the CORAL systems and said that one of the biggest things is going to be this coherency. Forget flops and all of that, the big deal is the coherency. This is a bit under the radar because people are only talking about the technical features of the Summit and Sierra machines and they have not realized the impact of this coherence.
The third thing that is important, which is also slipping under the radar right now and which is also apropos relative to the end of conventional chip process scaling, is the explicit inclusion of cognitive computing into the system as we go forward. We have done a lot of experiments with customers in the world now with regard to cognitive, and for us it intersects with HPC in at least two ways. One, if you think about the classical Watson question-answer system, it becomes a vehicle by which you use cognitive to actually help design simulations and models because you have literature to review and it becomes a modeling assistant, if you will. The second way, which is more machine learning and deep learning, is more the orchestration of ensembles of simulations through time. And what we have found here is that we can get tremendous improvements in terms of time to completion, preservation of information, and increasing the fidelity of the answers. So over time, this becomes a much less expensive way to really begin to tackle these technology limits by actually reducing the amount of computing we need to do because the way we do it is a bit more insightful. We have seen reductions of two-thirds in the number of compute cycles with this approach. We call this cognitive discovery, and we have run tests with customers on three continents so far.
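Editor's illustration: Turek does not describe how cognitive discovery is implemented, so the following is only a toy sketch of the general idea of letting a cheap learned model decide which members of a simulation ensemble are worth running. Everything in it is an assumption made for illustration: the function names, the stand-in "simulation," and the inverse-distance surrogate are hypothetical and are not IBM's method.

```python
import math

def expensive_simulation(x):
    """Stand-in for a costly physics run (hypothetical toy function)."""
    return (x - 0.62) ** 2 + 0.05 * math.sin(12 * x)

def surrogate_predict(samples, x):
    """Cheap surrogate: inverse-distance-weighted average of known results."""
    num = 0.0
    den = 0.0
    for sx, sy in samples:
        d = abs(x - sx)
        if d < 1e-12:
            return sy
        w = 1.0 / (d * d)
        num += w * sy
        den += w
    return num / den

def guided_ensemble(candidates, seed_count=4, rounds=3, per_round=2):
    """Run only the ensemble members the surrogate ranks as most promising."""
    # Seed the surrogate with a few evenly spaced real simulation runs.
    step = max(1, len(candidates) // seed_count)
    seeds = candidates[::step][:seed_count]
    samples = [(x, expensive_simulation(x)) for x in seeds]
    done = set(seeds)
    for _ in range(rounds):
        # Rank the inputs not yet simulated by predicted outcome (lower is better).
        pending = [x for x in candidates if x not in done]
        pending.sort(key=lambda x: surrogate_predict(samples, x))
        # Spend real compute only on the top-ranked candidates.
        for x in pending[:per_round]:
            samples.append((x, expensive_simulation(x)))
            done.add(x)
    best_x, best_y = min(samples, key=lambda s: s[1])
    return best_x, best_y, len(samples)

candidates = [i / 29 for i in range(30)]
best_x, best_y, runs = guided_ensemble(candidates)
print(f"ran {runs} of {len(candidates)} simulations; best input {best_x:.3f}")
```

With these default settings the sketch performs 10 real runs out of 30 candidates. That ratio is a consequence of the chosen parameters, not a reproduction of the two-thirds reduction Turek cites, and a simple surrogate like this one is not guaranteed to find the best ensemble member.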
TPM: What kinds of workloads have been tested with this hybrid AI-HPC approach?
Dave Turek: We have looked at materials, oil and gas, and consumer products. Now, consumer products sometimes gets trivialized, such as figuring out what a tube of toothpaste is going to look like. But we all know that there is chemistry in this stuff, and you have to figure out how to make toothpaste flow, and it turns out to be a laborious undertaking in wetlab chemistry. Now, all of that can be avoided. You do a simulation, which is a step forward, but then you apply cognitive and you can take that to the next level.
So, I think what you will see is the beginning of organizations starting to leverage all of these pieces that we have been talking about for quite some time. You have coherence for easier programming, and a proliferation of different accelerators tied to different needs. We have done a lot with neuromorphic, and there are programs in place at many of the national labs in the United States, but it has not yet risen to the state of the kind of work that we have done with quantum computing, which is open and on the web to build an ecosystem. That first foray into neuromorphic is building a mini-ecosystem with keenly interested parties before it will blossom out. The point is, the groundwork has been laid to bring all of these other technologies into play, too, and we will continue down these paths pretty dramatically.
At the same time, we are revisiting old ideas to see if they have merit going forward. Where we are today with the CORAL and similar systems is really a byproduct of our experience with the “Roadrunner” hybrid system and the ultrascalability of the BlueGene system. But then you look at memory bandwidth and the role it plays, that is why we are doing these other things. This is not just being spun up out of whole cloth; it actually has a historical context. If you talk to Jensen Huang at Nvidia, his view of HPC is substantially colored by the Roadrunner project.
TPM: We have been telling people that story for a long time. You got the ball rolling at the same time that a bunch of Stanford University researchers were noodling around with offloading some parallel routines to GPUs even before the CUDA programming environment was invented. It just turns out that the GPU was a cheaper compute offload engine.
Dave Turek: [Laughter] As you well know, we actually designed our own, but what many people don’t know is that we designed one that was way beyond the Cell game console processors, but we decided that the time and the cost of bringing it to market was just going to be too extreme.
TPM: Hence the seeming inevitability of the GPU accelerator. But the irony, of course, is that the Tesla accelerators and the variant of the GPUs that are used for AI and HPC, and this is especially true of Volta, might as well be a custom part given the innovation. The difference now is that the datacenter is getting the technology first, and it trickles down to the desktop where appropriate. This is the natural order of things, as far as we are concerned.
Dave Turek: Everybody is grounded in where they came from but they are reacting to where they are heading. You start with desktop GPUs and then you design to intercept HPC, AI, and other markets.
We think this is starting to coalesce. The work that we are doing on quantum computing is being done with commercial interests as well. Our quantum designs are different from the annealing kinds of designs, and we do a universal quantum computer because we want to do quantum chemistry, we don’t just want to do optimization in the classical sense. It is all about building the ecosystem, and one thread you see is our focus on building ecosystems through partnerships – OpenPower was first, OpenCAPI followed that, the way we have done quantum, the embryonic efforts around neuromorphic. We understand that there has to be participation from the community in setting the parameters of what is actually getting built. So we have taken co-design experience that we have had from building systems with the US Department of Energy for many years and have elevated this and deployed it at large. It is a radical departure from the rest of the industry, which is doing stuff in secret and then revealing what they have.
TPM: Is all of this effort actually driving revenues? The desire here is to drive some business by helping people solve real problems, not just do science experiments. Business is, after all, your middle name. . . .
Dave Turek: We have seen many customers purposefully delay their RFPs to intercept these technologies. That is in the process of generating a lot of business for us, and a lot of those RFPs are in flight now in the anticipation of this coming forward.
TPM: Did you get lucky?
Dave Turek: No. [Laughter]
TPM: Let’s see. The timing has certainly been good. The CORAL machines, which were designed and requisitioned so long ago and before the hyperscalers even knew they needed something like this to run AI workloads, are coming to market just as the same hyperscalers need this big node iron and HPC applications need it too and, moreover, HPC might need to weave in AI to stretch the flops further with neural network ensembled HPC simulations. And then, given this, everyone else may need such computing too, whether they buy or rent it.
Dave Turek: I would say this. Our designs for exascale starting in 2011 were predicated on the need for accelerators, and at that time it was not going to be Nvidia GPUs but our own processor. Our interception with Nvidia Teslas was not predicated on machine learning and deep learning being the next big thing. We looked at it more from a classical HPC angle, and we didn’t see how following the conventional path with more processors and more nodes would provide the kinds of solutions that people needed.
TPM: That’s my point. You did what HPC needed and it mapped perfectly to what the machine learning crowd didn’t even know they needed yet. [Laughter]
Dave Turek: Exactly. OK, so maybe you do get lucky in that respect.
TPM: Well, it has tripled the business, which is a beautiful thing and which makes highly tuned GPUs for both HPC and AI economically practical. And we are on the cusp of GPU accelerated databases adding to the revenue flow. Now, HPC can ride the coattails of AI for a while, and maybe AI can learn a lesson or two about MPI and scale from HPC.
Dave Turek: There are many times when things like this happen. I remember back in the old days that we did designs for web technology in our System x division with its iDataPlex systems, and it turns out that instead of intersecting Web 2.0 we sold a lot of machines to HPC centers. The initial design was not for HPC.
I think there is a technology thrust going forward, and then there is a market and opportunities perspective which is evolving at the same time in unpredictable kinds of ways. We are doing an announcement this week that is indicative of our luck, as you say, to showcase an opportunity in a segment that we have not been very active in for a number of years, and this is a partnership with the National Center for Atmospheric Research to develop a new community model for weather. We think we have the technical depth of the Weather Company, and we have an idea about how to create value beyond just shipping a machine to run models faster. We are not going to get into details right now because we don’t want to give our competitors an idea of what we are up to.
TPM: What other markets can you chase?
Dave Turek: It is interesting. The imminent arrival of Power9 has opened doors that have been closed for a while. We have seen this particularly in the oil and gas sector showing the efficacy of GPUs, and we have projects with three of the five big majors – I can’t say which ones – right now, and that is using Power8 hardware ahead of when we have Power9 test hardware, which is in a handful of weeks. We are going to ship Power9 to the labs with Summit and Sierra first.
TPM: I have been told to expect a Power9 announcement sometime in the second half of this year, and that AIX and IBM i customers should not expect to see announcements for systems from IBM until early in 2018. I presume that Linux systems from IBM will be announced and available before then but after Summit and Sierra get some nodes.
Dave Turek: The issue with Linux is support for coherence. We will have a release of Linux that will support coherence, but it will not be a general release, but enough to handle a handful of customers at the tail-end of this year. And then this coherence support for Linux will be generally available in the early part of next year. The issue is tweaking the kernel to accommodate this memory coherence.
TPM: Can you or should you stretch the coherence beyond a single node? It won’t be long before we have 200 Gb/sec InfiniBand, and then 200 Gb/sec Ethernet will follow, and within two years or so, and maybe earlier from upstart vendors like Innovium, we can get to 400 Gb/sec interconnects. This is a lot of bandwidth, and in theory that could be used to spread coherence across nodes and further simplify the programming model. Or do you just do MPI and keep all the tight memory coherence in hardware and in one operating system kernel? Would cross-node coherence over InfiniBand or Ethernet just be impossibly hard or stupid?
Dave Turek: It is neither impossibly hard nor necessarily stupid, but there is always this tradeoff in terms of performance and what that manifests itself as, and then you have to consider the volume of work as well.
The other thing I want to stress is that you can absolutely count on the fact that we are on time with CORAL. It doesn’t mean that tomorrow something disastrous doesn’t happen. But I will tell you this: The testing that has gone on to this point at both IBM and Nvidia? No problems. Zero problems. The thing that we were most scared about was coherence, and there have been no problems with that, and that testing has been going on for months now.