Applying Machine Learning At The Front End Of HPC

IBM and the other vendors who are bidding on the CORAL2 systems for the US Department of Energy can’t talk about those bids, which are in flight, and Big Blue and its partners in building the “Summit” supercomputer at Oak Ridge National Laboratory and “Sierra” at Lawrence Livermore National Laboratory – that would be Nvidia for GPUs and Mellanox Technologies for InfiniBand interconnect – are all about publicly focusing on the present, since these two machines are at the top of the flops charts now.

We know they are actually working hard to win the next deal for the exascale successors to these two machines, but when we had a chat at the SC18 supercomputer conference with Dave Turek, vice president of technical computing and OpenPower, we didn’t even bother to bring CORAL2 up. There were other interesting things to discuss.

But as an aside: We did talk to Turek about CORAL2 back in June at ISC18, just after the bids for the systems had been turned in to the Department of Energy, and he couldn’t say much then except that IBM should get credit for delivering Summit and Sierra more or less as planned and that this should mean a lot when it comes to the CORAL2 bids. But maybe it wouldn’t because with each generation of machines, the major labs have to do an architecture survey and take into account any new developments – or lack thereof – that could offer better performance, wider application support, lower prices, or any combination of the above. In a sense, it is always back to square one on these big systems deals, which is good for driving innovation but perhaps something to make the major suppliers a bit testy until they win the deals.

It seems inconceivable that the combination of the IBM Power10 chip and a future Nvidia GPU and possibly 400 Gb/sec NDR or 800 Gb/sec XDR InfiniBand won’t win the CORAL2 bids, but with Cray back in the game with its own Slingshot interconnect, there is a chance that it could win at least one of the three CORAL2 machines. If IBM doesn’t win at least one of these deals, that will take a lot of the development wind out of its Power10 sails, which is not a good thing considering that IBM has to find a new chip fab partner since Globalfoundries has abandoned its 7 nanometer chip manufacturing. (The odds favor it being Samsung, not Taiwan Semiconductor Manufacturing Corp.) Hewlett Packard Enterprise is probably in the exascale running, too, with some of the technologies it created for The Machine, and despite all of its woes, you just can’t count Intel out any more than you can ever count IBM out. If the US government had enough money – call it a cool $2 billion – it could build four exascale machines and have one more than China is planning and give each of these four a run at the money.

But again, that’s not what we talked about. We did, however, talk about some ways in which AI and HPC are being deployed together. This seems to be the hot topic of conversation these days, even if there is a bit of understandable reluctance on the part of HPC traditionalists.

Timothy Prickett Morgan: Summit is a poster child for some of the innovative work that is being done to bring AI to HPC and vice versa.

Dave Turek: Summit is the coming out party for us for how we conceptualize HPC transformation. At one level, it is the elimination of the barriers between HPC and analytics and AI and at another level it is how these things get incorporated and amalgamated to create value.

So a year ago, we were talking about making the AI frameworks available and we followed that up with visualization, and these are really general-purpose approaches to data science. This year, what we are talking about are things much more directed at HPC. For us, the concept of HPC transformation is founded on two different approaches – intelligent simulation and cognitive discovery – that actually coalesce to a certain degree.

Intelligent simulation is the recognition of the fact that people will continue to do things the way that they have always done them in HPC. So we are going to represent physical and natural and behavioral phenomena with theory, based on and encapsulated in partial differential equations, in stochastic systems, and in linear solvers, and so on. This is a mathematical foundation built on algorithms. Intelligent simulation asks the question: How can we help people do that better?

There are many threads to this, and the one that we are showcasing is centered on the idea of applying Bayesian methods to classical ensembles of simulations to produce much more efficient results at much lower costs and at much higher fidelity than ensembles of simulations could achieve on their own. Bayesian methods answer this fundamental question: Given what I know now, what do I do next?

If you think about ensembles of simulations, which I conceptualize as fathoming a solution space of potentially enormous scale, it is a matter of crafting a path through the simulations you run in the most parsimonious fashion, so as to preserve the most information and get to the end result. If you look at the set of possible simulations that anybody could run, well, nobody ever runs them all. And the way they choose not to do certain simulations is by experience. That is inferencing, in the language of AI, and it is quite dependent on the individual. People are subject to bias and to the limits of their know-how, but they will come up with something because, practically speaking, you can’t not have an answer to how you reduce the scope of that problem. What we want to do is help people make those decisions more systematically, and maybe more effectively.

A classic example is a formulation problem that we have run, and this has been done with real customers with real data and real problems. In this particular case, the customer was in consumer products and was looking to explore phase-based changes in chemicals as they are combined – do they stay separate or do they mix, these kinds of parameters. At one level, that problem was nothing more than trying to figure out the optimum mixture of water, methanol, toluene, and some other things. You could conjure up a nearly infinite solution space for this, so you have to cut it down in some way. But even after you cut down the solution space, you are left with the problem of what to simulate. We applied this intelligent simulation technique to the problem and reduced the number of simulations, relative to what the experts thought would need to be run, by two-thirds. It got an optimal result, and therefore in effect preserved all the information, and it improved the fidelity of the answer by 10,000X.

TPM: Give me some insight into the technique used to do this.

Dave Turek: It is called Bayesian optimization, and it does not use neural networks. You define the simulation by a set of input parameters. It is analogous to doing wet chemistry. You mix things in certain proportions, you run the wet lab experiment, and you get a result. Then you think about what the result tells you. It tells you something – that’s the Bayesian thing. Then the question is, what do you do next?

TPM: Do you add more or less of something. . . .

Dave Turek: Exactly. We set up a system where we were doing simulations on a cluster just for that purpose. Each simulation produces a result that is represented by parameters, which are then put into a database shared with our system, which is a couple of servers running software that looks at the results, does some machine learning at extraordinarily fast speed, and then suggests to the simulation cluster the next experiment that needs to be run. This keeps running collectively and iteratively, continually learning from the parameters what the next simulation should be.
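To make the loop concrete, here is a small, self-contained sketch of Bayesian optimization. It is not IBM's Boa, whose internals are not described in the interview. It assumes a Gaussian-process surrogate with a squared-exponential kernel, an expected-improvement rule for choosing the next run, a fixed grid of candidate inputs, and a toy one-parameter objective (`mixture_quality`) standing in for an expensive simulation. All of these names and choices are hypothetical.

```python
import math
import random


def mixture_quality(x):
    """Hypothetical stand-in for an expensive simulation: scores one mixing
    fraction x in [0, 1]; the best value (unknown to the optimizer) is x = 0.3."""
    return -(x - 0.3) ** 2


def rbf(a, b, length=0.15):
    # Squared-exponential covariance between two scalar inputs.
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(m):
    # Plain Cholesky factorization m = L L^T of a symmetric positive-definite matrix.
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(m[i][i] - s) if i == j else (m[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    # Forward substitution: solve L y = b.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    # Back substitution: solve L^T x = y.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(xs, ys, noise=1e-6):
    # Gaussian-process surrogate with a constant prior mean (the sample mean).
    mu0 = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [y - mu0 for y in ys]))
    return xs, L, alpha, mu0


def gp_predict(model, x):
    # Posterior mean and standard deviation of the surrogate at x.
    xs, L, alpha, mu0 = model
    k_star = [rbf(x, a) for a in xs]
    mean = mu0 + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    # Expected amount by which a Gaussian prediction beats the best result so far.
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayes_opt(objective, n_init=3, n_iter=8, seed=0):
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]  # a few initial "experiments"
    ys = [objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]  # candidate next experiments
    for _ in range(n_iter):
        model = gp_fit(xs, ys)
        best = max(ys)
        # "Given what I know now, what do I do next?"
        nxt = max(grid, key=lambda g: expected_improvement(*gp_predict(model, g), best))
        xs.append(nxt)
        ys.append(objective(nxt))
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], len(xs)


print(bayes_opt(mixture_quality))
```

Each pass refits the surrogate to every result so far and runs only the candidate with the highest expected improvement, rather than sweeping the whole grid. Scoring a fixed grid keeps the sketch dependency-free; practical tools typically optimize the acquisition function continuously and over many parameters at once.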

TPM: This sounds a bit like Newton’s method for approximating roots, although I admit to being quite a bit rusty on calculus. This is another example of calculations that converge on an answer from successively better inputs, I guess.

Dave Turek: This is what leads us to the dramatic reduction in the number of simulations, because the machine learning is orchestrating the process of making that choice. And it is a phenomenal result. This implementation of Bayesian optimization is architecturally simple, since it is just passing parameters back and forth through a shared database. It doesn’t change how the simulation cluster runs at all, it just gives it increasingly better sets of input parameters that can solve a problem. We haven’t given it a name formally yet, but internally we call it Boa, short for Bayesian Optimization Accelerator. We have Python, we have Anaconda, and now we have Boa, but marketing won’t let me use that name because it doesn’t precisely convey what it does. Well, it sort of does, actually.

TPM: That means it is a good name, then.

Dave Turek: We are debating that, but we are definitely going to turn it into a product, and we are demoing it now. It has been worked on for more than two years, so it is pretty mature.

What we are saying to customers who have clusters running simulations is that it is so hard to get capital expenditure budgets and get approvals and such. Instead of simulating more, simulate smarter by getting a couple of extra servers and load Boa and go.

TPM: Are there any kinds of simulation that Boa does not lend itself to?

Dave Turek: It is domain agnostic, but we have to represent something to the system that represents the objective of what you are trying to do with the simulations. So in my inelegant language – and the mathematicians will take me to task – you have to specify a descriptive or objective function of what you are trying to accomplish. Now, it turns out that for a domain expert in weather modeling, finite element, chemistry, it takes a couple of hours and that is what you give to the intelligent simulation system and it is in that context that it is interpreting the parameters that are flowing back and forth.

Rather than produce a bunch of elaborate whitepapers, we are going to develop a set of stub functions, by domain, that people can look at for clues on how to do this. It won’t take us long, and it will provide the clue as to where you begin this Bayesian optimization process. We have a team of mathematicians working on this in the United Kingdom and they have run this on pharmaceutical problems, where they have gotten improvements of 90 percent on the definition of the problem.
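The interview does not show what one of these stub functions looks like. A minimal sketch follows, assuming a three-component formulation problem like the water, methanol, and toluene example: the stub declares the search space and turns a simulation result into the single number the optimizer tries to maximize. The `SimResult` fields, the bounds, and the cost weight are hypothetical; a real stub would be written by the domain expert.

```python
from dataclasses import dataclass


@dataclass
class SimResult:
    # Hypothetical outputs a phase-behavior simulation might report.
    separated_fraction: float  # share of the volume that phase-separates
    cost_per_liter: float


# Hypothetical search space the optimizer is allowed to explore.
BOUNDS = {"water": (0.0, 1.0), "methanol": (0.0, 0.5), "toluene": (0.0, 0.2)}


def formulation_objective(params, result, cost_weight=0.1):
    """Higher is better: reward a single stable phase, lightly penalize cost,
    and reject mixtures that are out of bounds or do not sum to 1."""
    for name, (lo, hi) in BOUNDS.items():
        if not lo <= params[name] <= hi:
            return float("-inf")
    if abs(sum(params.values()) - 1.0) > 1e-6:
        return float("-inf")
    return -(result.separated_fraction + cost_weight * result.cost_per_liter)


score = formulation_objective({"water": 0.6, "methanol": 0.3, "toluene": 0.1},
                              SimResult(0.0, 2.0))
print(score)
```

Returning negative infinity for infeasible mixtures is a blunt but simple choice: it tells the optimizer never to spend a simulation there.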

TPM: This is not necessarily about allowing customers to build smaller HPC clusters, but about not wasting time on those clusters simulating phenomena that are not useful. And that frees that capacity back up to be useful for something else.

Dave Turek: You are creating capacity, and now you have it. You have broken free a little bit from the capital acquisition cycle.

We have also shown this tool working in concert with OpenFOAM for Formula 1 car design. The F1 teams are constrained in the amount of compute capacity they can use. Think about the utility of this tool in that context. I can have a cut-down cluster with this tool and compete against a fully loaded cluster from another team. If you want to get competitive advantage, this is the way to go.

So that is one thread, and it is only one of many ideas we are looking at here. When you think about improving simulation, we are working on a way to actually steer a simulation in real-time, which is the Holy Grail from 25 years ago. And we are also looking at AI techniques that roll over into cognitive discovery. I can make a simulation run well, but how do I know I have a good simulation? This is about working smarter, not faster.

Cognitive discovery, which will also be productized, was motivated by our concern about data and knowledge and how they come into play with respect to deep learning and inferencing. It has been relatively commonplace that when people talk about data science, they get enamored with neural nets and inferencing, and then they realize that most of their time is going to be spent prepping data.

The cognitive discovery effort is an amalgam of a number of different tools that we have been working on for more than two years, and it starts with a very fundamental premise: What do you do about the data?

The objective is to avoid the first-principles model that is characteristic of classic HPC. Can you contemplate displacing the systems of partial differential equations used in everything from classic simulations to Monte Carlo in financial services with a different approach altogether? The database approach is an example, and it was talked about during the summer at Oak Ridge National Laboratory when the initial applications for the Summit supercomputer were unveiled. They talked about incorporating machine learning to attack the workflows, but it was a very data-centered proposition.

We have looked at that, and we have observed the obvious: In most institutions, data is in a chaotic or semi-chaotic state on their side of the firewall, and on the other side of the firewall it is semi-chaotic as well. We have keyword search thanks to Google and others, but that is limited when it comes to HPC. For example, it is fairly hard to do a Google search on an equation or a chemical formula or a table or a spreadsheet or a PDF. The HPC community has data in these and other formats, and they have a lot of stuff in notebooks lying around.

Think back 25 years. The motivation for the ASCI program out of the US Department of Energy was to use simulation as a vehicle through which expertise could be preserved over time. Back then, they were worried about people retiring, dying, going to work in industry, whatever. It was a particular problem for them because a lot of what they have is secret and not part of the public domain, but this is also a problem that exists in general for any enterprise. Employees come and go, and what do they take with them?

TPM: Their brains – processing – and their memories – storage – and their connections to other people – networking.

Dave Turek: True. Sometimes they will leave behind a notebook and you need to figure out how to process it. So we built a tool called Corpus Creation Service that ingests all of these different forms of data, the whole nine yards, and automatically generates a knowledge graph from it, and that graph can be huge. It has natural language processing and automatic annotation, but the objective is to make the transition from data to knowledge. We set it up in such a way that you can do deep search on it – semantics-based search, with complicated questions involving complicated mathematics or science that will produce results based on this knowledge graph. The system provides documented evidence for the conclusions it draws.

That is a profoundly big step forward. It brings order out of chaos in the scientific domain, and it facilitates collaboration because I can take what is behind the firewall and link it to things outside the firewall.

The reason you do this, ultimately, is that you want to find out what you know and what you don’t know. In materials science alone, there are over 400,000 papers published each year. How do you keep up? A smart-aleck would say that they are an expert and they know their field and they know what to read and not to read.
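Corpus Creation Service is not described in enough detail to reproduce, so what follows is only a toy under stated assumptions. Once natural language processing has extracted facts from documents (the hard part, skipped here), storing them as subject-relation-object triples that remember their source documents is one simple way to answer a question and show the evidence behind the answer. The `KnowledgeGraph` class, its methods, and the document names are hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy knowledge graph of (subject, relation, object) triples, each of which
    remembers the source documents that asserted it."""

    def __init__(self):
        self.evidence = defaultdict(set)    # triple -> ids of supporting documents
        self.by_subject = defaultdict(set)  # subject -> triples about it

    def add(self, subject, relation, obj, source):
        triple = (subject, relation, obj)
        self.evidence[triple].add(source)
        self.by_subject[subject].add(triple)

    def query(self, subject, relation=None):
        # Matching facts, each paired with the documents that support it.
        hits = [t for t in self.by_subject.get(subject, ())
                if relation is None or t[1] == relation]
        return sorted((t, sorted(self.evidence[t])) for t in hits)

    def path(self, start, goal, max_depth=3):
        # Breadth-first search for a chain of facts linking two entities.
        frontier, seen = [(start, [])], {start}
        for _ in range(max_depth):
            nxt = []
            for node, chain in frontier:
                for t in sorted(self.by_subject.get(node, ())):
                    if t[2] == goal:
                        return chain + [t]
                    if t[2] not in seen:
                        seen.add(t[2])
                        nxt.append((t[2], chain + [t]))
            frontier = nxt
        return None


kg = KnowledgeGraph()
kg.add("toluene", "miscible_with", "methanol", "lab_notebook_12")
kg.add("toluene", "immiscible_with", "water", "handbook_p42")
kg.add("toluene", "immiscible_with", "water", "survey_2009")
kg.add("methanol", "miscible_with", "water", "handbook_p40")
print(kg.query("toluene", "immiscible_with"))
```

Keeping the supporting documents attached to every fact is what lets an answer come back with its evidence, rather than as a bare assertion.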

TPM: This sounds an awful lot like the way the Watson machine that played Jeopardy worked. How much of this Corpus Creation Service is based on Watson?

Dave Turek: All of the technology we are talking about is new, and part of it will be incorporated into Watson.

There is a series of tools that sit on top of this that automatically create a knowledge graph, facilitate deep search, and automatically generate neural networks based on this data. Testing of those neural network models shows performance, on any standard benchmark, equal to or greater than that of any data scientist on this planet. There is a progressive nature to this that gets you doing machine learning and then inferencing as well.

Here is an example to make it clear. Take the oil and gas industry. The conventional view of HPC with regard to oil and gas is to first make all of the algorithms run faster – Reverse Time Migration, Kirchhoff Prestack Time Migration, Full Waveform Inversion, what have you. That has been the mantra for 30 years or longer. That process is itself contingent on somebody doing some physical process – setting off depth charges in the ocean or drilling a hole in the Earth.

The question is: How do you decide where to do that?

Well, let’s confine ourselves to the land for the moment. You have highly paid geoscientists go out and survey the land and make judgments, or in AI parlance they will make an inference based on their own experience. They have access to geological surveys from years gone by and they will read through these. We have taken these cognitive discovery tools and made them available to these geoscientists to help them do a better job of inferencing. It is an attack on a classic HPC problem, but we are going to the front end. If we can pick the right place to do the seismic sounding, then everything else downstream from that is going to benefit as well. But you have to start at the beginning.

We are doing this with an oil and gas major right now, and this is what we call seismic analogs. We have built a corpus of everything that can be found – seismic surveys around the world that are public – and they, in turn, complement that with their own seismic surveys that they have done privately. We bring it all together and the geoscientists can now use this tool and understand what is the analog to what they have seen in the field and actually make recommendations to drill or not drill.
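One plausible way to frame "what is the analog to what they have seen in the field" is nearest-neighbor search over features extracted from each survey. The sketch below assumes each survey has already been reduced to a numeric feature vector. The feature names, the values, and the use of cosine similarity are illustrative choices, not details of IBM's system.

```python
import math


def cosine(a, b):
    # Cosine similarity of two equal-length, non-zero feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def top_analogs(query, corpus, k=2):
    """Rank surveys in `corpus` (name -> feature vector) by similarity to `query`."""
    scored = sorted(((cosine(query, vec), name) for name, vec in corpus.items()),
                    reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:k]]


# Hypothetical features: (normalized depth, porosity, fault density, amplitude)
corpus = {
    "public_survey_A": [0.80, 0.25, 0.10, 0.60],
    "public_survey_B": [0.20, 0.05, 0.70, 0.10],
    "private_survey_C": [0.75, 0.30, 0.15, 0.55],
}
field_obs = [0.78, 0.28, 0.12, 0.58]
print(top_analogs(field_obs, corpus))
```

Public and private surveys sit in the same corpus here, mirroring the mix described above; the hard work in practice is producing trustworthy feature vectors from heterogeneous survey data.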

Here is another case. There is a smallish company that makes ball bearings, and they end up in wind turbines. The way that contracts work in the wind industry, suppliers of key technologies are exposed to the risk of that wind turbine not working. They have to compensate the power suppliers for time and lost revenue when the turbine is not working. It is a pretty big annual financial risk. This particular company generates less than $1 billion, but way more than $100 million, to give you a sense of scope. They will run into quality problems from time to time with their ball bearings.

So the issue becomes, how do you discern what is wrong with a ball bearing? It turns out that they have a ton of data on this – six million pages of it, to be precise. And you know what form it is in? Hand-written documents, a file here, a spreadsheet there. It is all over the place. So if we take this data and ingest it into Corpus Creation Service, we get a better diagnostic earlier on for what is causing problems with a particular ball bearing. And in doing so, they mitigate their risk. It takes maybe a few weeks to put the data together. They think they are going to save somewhere between $20 million and $30 million a year by doing this.

The important, and last, thing is that both of these threads – intelligent simulation and cognitive discovery – have to be used side by side in most cases to be effective.
