Behind The UK’s Bleeding Edge Supercomputing Plan

Just as the fervor died down around the massive deals for forthcoming pre-exascale supercomputers in the United States following the CORAL procurements (most recently with the announcement of Aurora, the only one of the three CORAL deals not betting its future on IBM OpenPower systems), the supercomputing spark was stoked again, this time from across the pond.

Out of the fastest five hundred supercomputers on the planet, the United Kingdom currently has 30 machines on the twice-yearly list, including its most powerful machine, a Cray XC30 called “Archer” housed at the University of Edinburgh. Archer is followed closely in performance by another Cray system at ECMWF, the European weather forecasting and research center, and by a Blue Gene supercomputer at Daresbury Lab, which will soon be the site of the upcoming machines in question.

The Science and Technology Facilities Council (STFC), the UK government-sponsored research organization that works with national industrial and research partners on systems and software projects and funds the Daresbury Lab, announced a £313 million investment to support a grand OpenPower undertaking that will bring three new systems into play (in stages, as with the CORAL machines in the U.S.). Just as important, it will push new architectural and application-level developments: everything from new ways to harness FPGAs and ARM architectures to deep learning, cognitive computing, and other Watson-driven projects.

In this era of data stealing the show from floating point calculations, the emphasis in recent big procurements is being placed on what IBM is calling data-centric computing. Accordingly, as we have seen with some of the biggest system deals to date, including the Summit supercomputer coming to Oak Ridge National Lab and the Sierra system at Lawrence Livermore, the key elements of these machines are focused on moving data as much as on sheer number crunching. Data-centric computing is the successor to IBM’s supercomputing push in the post-Blue Gene world, and it has caught on for some very big upcoming machines that will rely on the combined force of future-generation IBM Power architectures (Power8, Power8+, and Power9, as roadmaps indicate), Nvidia GPU coprocessors, Mellanox networks, and of course NVLink and CAPI to tie the elements together.

The Next Platform was able to get more details about how this system deal will shape up during a discussion with Cliff Brereton, director of the Hartree Centre, who told us that this is indeed a three-system deal culminating in a 2018 finale that will be far higher performing than anything the lab currently has (which is a tick over a petaflop), though nothing on the order of the 150-petaflop machines that will be installed at national labs in the United States. The differentiating factor is that the leadup to the 2018 system will be used to test out new architectures and approaches to computing, some of which Brereton says are not generally available but could find their way into eventual IBM product lines. This includes everything from finding ways to make use of the hooks OpenPower has in its lineup for FPGAs (a topic the team at his center is already engaged with on an existing Maxeler test cluster) to far deeper exploration of how ARM will fit into the big picture of large-scale computing.

And as mentioned above, the workloads addressed by the OpenPower machines will also differ from some of the large research and scientific projects slated for Summit and Sierra. Since the center provides computing assistance up and down the stack to many UK-based industries as well as to social projects, the architecture and application work that happens – with a great deal of on-site staff from IBM, who will take up residence at the center as part of that multi-million-pound package over five years – will fine-tune existing industry applications for the coming machines. But just as important, as Brereton explains, they will be “taking on areas that are completely new in computer science in the realm of data science and cognitive computing.”

“If you think about a place like the Watson center in the United States, it is that type of environment we will have here, with completely new ideas and applications we will explore, much of which has not appeared yet.”

One of the big priorities (perhaps not surprising given the UK roots) is much greater exploration of ARM. As we have written in the past, the center has been keen for some time on showcasing the validity of ARM for demanding HPC and data-intensive computing, and this new focus will push those limits further under the OpenPower umbrella, although it is not yet clear how the ARM work ties in with the Power8 and Power9 architectures in the forthcoming machines. We will have more on that in a dedicated article in the near future.

At any rate, architecturally speaking, Brereton says they are “placing the bets. It’s a time of incredible change. The architectures of the last fifteen or more years are clearly not going to take us past the next several years, so there needs to be a step change. There are lots of options, but for the workloads we’re looking at ahead, we are confident about this architecture.”

On the OpenPower front, Brereton says what has the team really looking ahead is the addition of both NVLink and CAPI in upcoming systems. He says that although he sees opportunities in the Intel arena to solve the same data movement problems at the silicon level, the center’s partnership with IBM runs deep (although it should be noted the center is also an Intel Parallel Computing Center, with staff on that side of the house as well).

In a chat to explore what architectural differences might be found in these UK systems, which follow the data-centric computing path along the lines of Summit and Sierra, IBM’s general manager of OpenPower Alliances, Ken King, gave The Next Platform a slightly closer look at where the Daresbury folks are heading with their roadmap, along with some previously unknown codenames for the future-generation Power8+ and Power9 nodes that will be found in the Summit and Sierra machines (ostensibly the two unnamed architectural leaps in the roadmap below from when we talked about OpenPower a few months back).

[Figure: OpenPower roadmap]

“What will be in place is going to be very similar to Summit and Sierra from an architectural perspective, incorporating the same elements on the Firestone, Garrison, and Witherspoon nodes we’ll be delivering for those CORAL machines,” said King. That, at the very least, gives us names for the future system designs following Firestone and featuring CUDA 8 and CUDA 9 (it is not clear which name is matched to which upcoming node).

Of these nodes, all King could share is that all three systems would include flash storage, Mellanox InfiniBand, Nvidia GPUs, and Power processors through the generations. “We are leveraging and reusing what we’re doing, so beginning in the 2015 up to 2018 timeframe, but the system grows significantly node-wise through that period.”

King continued, “It will be a very powerful system. I can’t compare it to Summit and Sierra, but it will do what is needed for the cognitive computing solutions running on top of it. It’s going to be leveraging Watson and other new data analytics capabilities for life sciences, oncology, and other business workloads. The ability to get that insight does take a powerful machine.”

2 Comments

  1. I see a huge problem with the current “pre-exascale” systems in the US and Europe. They’re pretty much all CPU-GPU hybrids. The Knights Hill-based Aurora sounds promising.

    In Japan there are already several supercomputers up and running with SPARC XIfx CPUs with 3D HMC RAM and Fujitsu’s PRIMEHPC FX100 architecture, which efficiently scales beyond 100 PFLOPS with 3.8 PB of RAM, 53 PB/s of memory bandwidth, and 11 PB/s of interconnect bandwidth.

    JAXA’s SORA JSS2, Riken’s Great Wave (to become Big Waterfall), and even the Earth Simulator 3 are all up and running, and no one in the West seems to have noticed. They’re far more advanced than these CPU-GPU toys.

    They ALREADY use next-generation technologies and are currently deployed. Looks like Japan will be the first to the exaFLOP, and their architecture for K was 93% computationally efficient at its maximum scale vs. 65% for Cray’s (a rough efficiency calculation is sketched after these comments). I think the Fujitsu is currently 97% or something ridiculous.

    It might be time to rethink all the GPU hybrids.

  2. I probably should have also mentioned that supercomputing shouldn’t just be about running HPL and seeing how high of a score you can get.

    The K Computer is ancient now and it’s still number 2 on the Graph500 because of its clever interconnect (which is replaced with a mostly optical one in the FX100). NEC has also been working on new optical interconnects.

    Being able to do useful work on a supercomputer is definitely dependent on more than just FLOPS. It’s now largely memory and interconnect bandwidth dependent.

    Hopefully Intel can get Knights Hill to communicate mostly optically and extremely efficiently if they want to get that architecture to exascale. I’m not sure how useful 384 GB of far memory is for supercomputing if it cuts into a power budget that’s already going to be VERY limited.

    It might be better spent on having 32 or 64 GB of HMC as far memory and keeping the MCDRAM as is. That would be close to 1 TB/s of combined bandwidth, or almost double the MCDRAM/DDR4 combo (see the rough arithmetic sketched after these comments).

    I don’t know if Nvidia is going to be able to get anywhere near exascale if they don’t integrate silicon photonics into whatever comes after GV100.

    I could be very wrong, but their architecture seems more suited to ultra high end workstations and smaller hybrid systems that actually need to run diverse workloads, with some running better on CPUs and some running better on GPUs.

    That’s probably why the US is getting both Knights and Nvidia supercomputers as well. It’s been a pretty boring time in all computer hardware spaces for a couple of years now. 2016-2020 should be VERY interesting.
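On the efficiency figures in the first comment: HPL “computational efficiency” is simply the sustained Linpack result (Rmax) divided by theoretical peak (Rpeak). Here is a minimal Python sketch using published Top500 figures for the K computer and, as an assumption about which Cray system the commenter had in mind, Titan:

```python
def hpl_efficiency(rmax_pflops, rpeak_pflops):
    """Return sustained-to-peak (Rmax/Rpeak) efficiency as a percentage."""
    return 100.0 * rmax_pflops / rpeak_pflops

# Illustrative figures from published Top500 entries, in PFLOPS (Rmax, Rpeak).
# Titan is our assumption for the Cray comparison point, not the commenter's.
systems = {
    "K computer (SPARC64 VIIIfx)": (10.51, 11.28),
    "Titan (Cray XK7, CPU+GPU)": (17.59, 27.11),
}

for name, (rmax, rpeak) in systems.items():
    print(f"{name}: {hpl_efficiency(rmax, rpeak):.1f}% of peak")
```

Run as-is, this prints roughly 93% of peak for K and 65% for Titan, which lines up with the figures cited above.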
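On the bandwidth arithmetic in the second comment, a rough back-of-the-envelope sketch. Every figure here is an assumption for illustration only (Knights Hill was never publicly specified), with MCDRAM and DDR4 numbers borrowed from Knights Landing-class estimates and a hypothetical HMC far-memory pool standing in for the commenter’s proposal:

```python
# All bandwidth figures below are assumptions for illustration, in GB/s.
MCDRAM_GBS = 490   # assumed on-package MCDRAM (near memory) bandwidth
DDR4_GBS = 115     # assumed six-channel DDR4 far memory bandwidth
HMC_GBS = 480      # assumed HMC far memory bandwidth (the commenter's proposal)

baseline = MCDRAM_GBS + DDR4_GBS   # MCDRAM + DDR4 combination
proposal = MCDRAM_GBS + HMC_GBS    # MCDRAM + HMC combination

print(f"MCDRAM + DDR4: ~{baseline} GB/s combined")
print(f"MCDRAM + HMC:  ~{proposal} GB/s combined "
      f"({proposal / baseline:.1f}x the DDR4 pairing)")
```

The exact ratio obviously depends on the assumed figures; the point is only that swapping the DDR4 far memory for HMC pushes the combined number toward 1 TB/s.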
