Designing the SKA Supercomputer Platform

When the Square Kilometre Array (SKA) is deployed in the next decade, alongside it will be a supercomputer tasked with processing the deluge of astronomical data that the instrument will collect from the skies. And even though the SKA will be the world's most powerful radio telescope, it won't quite need the world's most powerful supercomputer to fulfill its mission.

One reason for that is that the SKA supercomputer, known as the Science Data Processor, or SDP, won't be a typical HPC machine, inasmuch as its primary application set will be real-time data analytics rather than digital simulations. More significantly, the SDP is to be built as an integral component of the SKA instrument – the telescope-supercomputer complex – with both under the control of telescope management software. In that sense, the instrument can be thought of as an extreme-scale appliance that integrates a supercomputer with a radio telescope.

That said, because the SDP will have to crunch nearly a terabyte of data per second streamed from the radio telescope, it will still need to provide a sizable number of flops. According to Maurizio Miccolis, the SDP's project lead, the system is expected to deliver 250 petaflops of performance, which, if installed today, would make it slightly more floppy than Summit, the world's current number one system. But by the time the SDP is deployed in the mid-2020s, its supercomputing peers will have already surpassed the exaflop mark. Even the recently announced Frontier system, which is scheduled for deployment in 2021, will have six times the computational heft of the SKA supercomputer.
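As a quick sanity check on those comparisons, here is a back-of-the-envelope sketch; the Summit and Frontier numbers are assumed from their publicly quoted figures rather than supplied by the SKA project:

```python
# Back-of-the-envelope comparison of the SDP's projected 250 petaflops with
# its peers. Summit and Frontier figures are assumed from publicly quoted
# numbers (~200 peak petaflops and ~1.5 exaflops), not from the SKA project.
sdp_pf = 250          # projected SDP performance, in petaflops
summit_peak_pf = 200  # Summit peak performance (assumed)
frontier_pf = 1500    # Frontier as announced, ~1.5 exaflops (assumed)

print(f"SDP vs Summit peak: {sdp_pf / summit_peak_pf:.2f}x")  # ~1.25x
print(f"Frontier vs SDP:    {frontier_pf / sdp_pf:.0f}x")     # ~6x
```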

The SDP consortium, whose lead member is the University of Cambridge, recently completed its design review of the system in anticipation of that future deployment. And even though the basic architectural elements have been decided upon, the specific hardware implementation and the choice of vendors remain to be determined.

Actually, there will be two deployments of the platform, one in Cape Town, South Africa, and the other in Perth, Australia. The Cape Town installation will be used to process data from the mid-frequency dish array located in South Africa, while the Perth supercomputer will be devoted to the low-frequency aperture array sited in Western Australia. Although the telescope arrays are functionally different, they require basically the same type of computing system since they have the same mission: ingest observational data in real time, extract the salient information from it, and send that pre-digested data to regional centers or store it locally for further analysis – or both.
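To make that mission concrete, here is a minimal sketch of the ingest-reduce-distribute flow in Python. The function names, chunk size, and 100x reduction step are illustrative assumptions, not the SDP's actual pipelines, which involve calibration, imaging, and other radio-astronomy-specific stages.

```python
# Illustrative sketch of the SDP's ingest -> reduce -> distribute mission.
# Function names, the chunk size, and the reduction factor are hypothetical
# stand-ins for the real SKA pipelines.
import io
from typing import BinaryIO, Iterator

CHUNK_BYTES = 1 << 20  # process the incoming stream in 1 MB chunks (assumed)

def ingest(stream: BinaryIO) -> Iterator[bytes]:
    """Pull raw observational data off the real-time stream in chunks."""
    while chunk := stream.read(CHUNK_BYTES):
        yield chunk

def reduce(chunk: bytes) -> bytes:
    """Extract the salient science information and discard the rest."""
    return chunk[: max(1, len(chunk) // 100)]  # placeholder ~100x reduction

def distribute(product: bytes, regional_centers, local_store: BinaryIO) -> None:
    """Send pre-digested data to regional centers and/or keep it locally."""
    for center in regional_centers:
        center.write(product)  # stand-in for a network transfer
    local_store.write(product)

def run_pipeline(stream: BinaryIO, regional_centers, local_store: BinaryIO) -> None:
    for chunk in ingest(stream):
        distribute(reduce(chunk), regional_centers, local_store)

if __name__ == "__main__":
    # Tiny demo with in-memory buffers standing in for the telescope stream,
    # the regional centers, and local storage.
    run_pipeline(io.BytesIO(b"\0" * (4 * CHUNK_BYTES)), [io.BytesIO()], io.BytesIO())
```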

According to Miles Deegan, the HPC and data analysis specialist for the SKA project, while the SDP resembles a supercomputer as far as the underlying technologies are concerned, its design differs from that of a typical HPC machine. "Architecturally, it's a mix of HPC, big data analytics, and cloud technologies," he tells us.

As a result of the emphasis on data analysis work (including graph analysis), rather than scientific simulations, the SDP will be built more like a dataflow machine than a general-purpose HPC system. That led the designers to construct something with more architectural diversity than, say, a Summit or a Tianhe-2 supercomputer.

“The operational model is evolving, but it’s unlikely to be one big monolithic system where each server is essentially identical,” explains Deegan. “We’re going to run different types of workloads and pipelines, so we’ll have some degree of heterogeneity, not just within servers but across servers.”

At the processor level, that heterogeneity will be expressed in the form of CPUs and accelerators of some kind – possibly GPUs, but perhaps FPGAs. FPGAs are a particularly interesting option, since they have proved to be a good choice for the kind of real-time analytics and data reduction that the SDP will be engaged in. In any case, it's unlikely that all SDP servers will be equipped with accelerators, since not all the SDP applications will benefit from them. Likewise, some servers will probably have more memory or more local storage than others.
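To illustrate what that heterogeneity might look like in practice, here is a hypothetical sketch of a handful of server "flavors" and a toy rule for mapping workloads onto them. The flavor names, specs, and routing are invented for illustration; they are not the SDP's actual node types.

```python
# Hypothetical illustration of node-level heterogeneity: a few server
# "flavors" rather than one identical node type. All names and specs
# below are invented for illustration only.
NODE_FLAVORS = {
    "compute":     {"accelerator": None,   "ram_gb": 192,  "local_nvme_tb": 2},
    "gpu":         {"accelerator": "gpu",  "ram_gb": 384,  "local_nvme_tb": 4},
    "fpga":        {"accelerator": "fpga", "ram_gb": 192,  "local_nvme_tb": 2},
    "high_memory": {"accelerator": None,   "ram_gb": 1536, "local_nvme_tb": 8},
    "storage":     {"accelerator": None,   "ram_gb": 192,  "local_nvme_tb": 64},
}

def flavor_for(workload: str) -> str:
    """Toy scheduling rule: map a pipeline stage to a node flavor."""
    routing = {
        "realtime_reduction": "fpga",        # FPGAs suit streaming data reduction
        "imaging":            "gpu",         # compute-heavy, accelerator-friendly
        "graph_analysis":     "high_memory", # memory-bound analytics
    }
    return routing.get(workload, "compute")  # default to plain CPU nodes

print(flavor_for("realtime_reduction"), NODE_FLAVORS[flavor_for("realtime_reduction")])
```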

Since most of the SKA work will be data-bound, the system will need a lot of I/O and memory bandwidth, which implies a heavy reliance on multiple tiers of storage, including burst buffers, as well as multiple tiers of memory. Each of the two telescope arrays is expected to produce about 300 PB per year, over a lifespan of approximately 50 years. Obviously, that means that not every bit that falls out of the sky will be saved. In fact, some data may be stored for as little as six hours.
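Those figures imply an aggressive data reduction step between ingest and archive. A rough back-of-the-envelope estimate, assuming the roughly one terabyte per second quoted above applies to each telescope and a 100 percent duty cycle:

```python
# Rough estimate of the implied data reduction: ingest on the order of a
# terabyte per second, but archive only ~300 PB per telescope per year.
# The 1 TB/s rate and 100% duty cycle are simplifying assumptions.
SECONDS_PER_YEAR = 365 * 24 * 3600        # ~3.15e7 seconds

ingest_tb_per_s = 1.0                     # ~1 TB/s into the SDP (approximate)
raw_pb_per_year = ingest_tb_per_s * SECONDS_PER_YEAR / 1000   # TB -> PB
archived_pb_per_year = 300                # per telescope, per the SKA figures
lifetime_years = 50

print(f"Raw data per year:  ~{raw_pb_per_year:,.0f} PB (~{raw_pb_per_year / 1000:.0f} EB)")
print(f"Archived per year:  ~{archived_pb_per_year} PB")
print(f"Implied reduction:  ~{raw_pb_per_year / archived_pb_per_year:,.0f}x")
print(f"Archive over {lifetime_years} years: ~{archived_pb_per_year * lifetime_years / 1000:.0f} EB per telescope")
```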

The latest prototype SDP system incorporates many of these elements, including standard compute nodes, accelerator nodes, high-memory compute nodes, storage nodes, and OpenStack controller nodes. The prototype is constructed from Dell PowerEdge servers and incorporates a standard array of HPC compute, storage, and networking products. These encompass Intel Xeon CPUs, Nvidia GPUs, EDR InfiniBand, 25G Ethernet, NVMe drives, and SAS and SATA drives (both SSD and hard disk).

In some cases, the production SDP systems may have updated versions of these products, but they may also include others not yet in general use, such as storage-class memory like 3D XPoint DIMMs, Intel's Xe discrete GPUs, cache-coherent interconnects for accelerators (Intel's Compute Express Link, or CXL, or the Cache Coherent Interconnect for Accelerators, known as CCIX), silicon photonics, or some yet-to-be-commercialized high performance technology.

Deegan says the SDP will probably be implemented with commodity components, although the team has not ruled out using custom hardware as well. "Dennard scaling died some time ago and Moore's Law is pretty much dead," he notes. "But that doesn't mean it's the end of innovation. We could see more variety of the hardware that is available to us. We're still sitting on the fence in terms of those selections. Over time we will whittle them down."

The advantage of the mid-2020s deployment date is that the SDP team will benefit from seeing how the first exascale technologies fare in the early 2020s. And by the middle of the next decade, those technologies are likely to be a good deal less costly than they will be at their debut.

The SKA system-level design review is slated to occur by the end of the year, which will be followed by additional prototyping work. In preparation for those efforts, Deegan says the team will be traveling to ISC 2019 in Frankfurt next month to talk with their partners and potential vendors. "We'll be trying to get access to the latest and greatest technologies so we can learn a little more and get closer and closer to selecting the final technologies," he says.
