The (Second) Coming of Composable Systems

The concept of composable or disaggregated infrastructure is nothing new, but with approaching advances in technology on both the software and network sides (photonics in particular) an old idea might be infused with new life.

Several vendors have already taken a disaggregated approach at the storage, network, and system level. Cisco Systems’ now-defunct UCS M Series is one example; Hewlett Packard Enterprise’s The Machine and its Project Synergy are two contemporary ones, and DriveScale, which we covered back in May, is possibly another. But thus far, none of these efforts has found widespread appeal. However, according to Hubertus Franke, a distinguished research staff member focused on such architectures (with an HPC background), the next few years could mark a shift in how rack-scale systems are designed. Two developments drive that view: silicon photonics is on the horizon and will eventually come down in price, and software tools are emerging that make swapping out components seamless.

As it stands now, a refresh cycle means ditching an entire box when even one of its components is no longer enough. Being able to swap out components as application requirements shift makes good sense and leads to better utilization and cost effectiveness over longer time horizons, even if it looks, at the outset, like a more expensive way to buy flexibility. “The refresh cycles on many components are divergent,” Franke tells The Next Platform. “Some need a refresh every two or three years with others much longer but ultimately, you’re throwing away the box when that’s not the only option.”

Traditional systems impose identical lifecycles for every hardware component inside the system. As a result, all of the components within a system, whether it is storage, server, or switches, are replaced or upgraded at the same time. The ‘synchronous’ nature of replacing the whole system at the same time prevents earlier adoption of newer technology at the component level, whether it is memory, SSDs, GPUs, or FPGAs.
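The cost argument behind divergent refresh cycles can be made concrete with a little arithmetic. The sketch below is illustrative only: the component names, relative costs, and refresh cycles are hypothetical values chosen for the example, not figures from Franke or his colleagues, and it ignores the higher upfront cost of a disaggregated design.

```python
# Hypothetical components: name -> (relative cost, refresh cycle in years).
# These numbers are assumptions made up for illustration.
COMPONENTS = {
    "cpu": (4, 3),
    "memory": (3, 6),
    "storage": (2, 4),
    "chassis_and_network": (1, 12),
}


def purchases(horizon_years: int, cycle_years: int) -> int:
    """Initial purchase plus every replacement that falls inside the horizon."""
    return 1 + (horizon_years - 1) // cycle_years


def synchronous_cost(components: dict, horizon_years: int) -> int:
    """Whole box is replaced whenever the shortest-lived component is due."""
    box_cost = sum(cost for cost, _ in components.values())
    shortest_cycle = min(cycle for _, cycle in components.values())
    return box_cost * purchases(horizon_years, shortest_cycle)


def disaggregated_cost(components: dict, horizon_years: int) -> int:
    """Each component is replaced on its own cycle."""
    return sum(
        cost * purchases(horizon_years, cycle)
        for cost, cycle in components.values()
    )


if __name__ == "__main__":
    horizon = 12
    print("synchronous:  ", synchronous_cost(COMPONENTS, horizon))
    print("disaggregated:", disaggregated_cost(COMPONENTS, horizon))
```

Under these assumed numbers the per-component schedule spends less over a twelve-year horizon, because long-lived parts are not thrown away along with the short-lived ones. Whether that saving outweighs the extra upfront investment depends on real prices, which this sketch does not model.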

“The biggest problem now is getting to the data—the latency, and bandwidth is also a problem, especially for the data-driven applications we will be seeing more of. It is a good time to ask if the time is right to rethink how systems are constructed and start disaggregating them. Taking out nodes, certain I/O components, or taking accelerators out and putting them in a different cage and making everything accessible via network access versus a bus. This is all possible and is becoming a smarter way to think about systems,” Franke says.

In rack scale architecture, each of the nodes in a rack specializes, or is abundant in one type of resource (compute, accelerator, memory, storage). These resources use the backplane to talk to each other as a single system, hence the emphasis on that backbone in such a setup.

The goal is to create a flexible, agile infrastructure in which the strong backbone in the rack, rather than the single node, becomes the computer. This has made sense for some time, as most problems have long since outgrown the space of a node. In such a view, optimization should happen at the rack level and the emphasis should be on the low latency backbone (in these pre-photonics days, this is either PCI-Express or top of rack Ethernet switching), since other components can be swapped based on application needs. Both of those options have their advantages and drawbacks; for some recent research, Franke and colleagues built a PCIe based prototype (among others) to test several concepts related to the efficiency, scalability, and cost effectiveness potential of composable systems.

So why is the time right for composable systems to find their way into the mainstream now versus at other points when vendors made a great deal of noise? These approaches were somewhat ahead of their time, Franke says.

“If you really look at the basic parameters as we did, it’s critical to get down the performance of remote accesses to meet the latency requirements of an application. At the time of earlier efforts, the access latency of devices wasn’t there. For example, if you look at an early example like network attached storage, a very early example of a disaggregated approach because you were taking out the I/O component, it worked because the latency for access to a disk typically was around 5 to 10 milliseconds, so network latency didn’t matter much. Now, with SSDs and NVMs remotely attached, the latency goes sub-to-few microseconds and latency on the network is suddenly a big bottleneck,” Franke explains. The point is that the network is now catching up from a latency perspective: with switches in the 850-800 nanosecond latency range, accesses to remote devices can be hidden well.
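A rough calculation shows why network latency went from negligible to dominant. The sketch below computes the share of a remote access spent on the network. The 5 millisecond disk figure and the roughly 0.8 microsecond switch figure come from the discussion above; the 2 microsecond NVM access time and the 100 microsecond legacy network latency are assumptions picked for illustration, and treating a single switch latency as the whole network cost is a deliberate simplification.

```python
def network_share(device_latency_us: float, network_latency_us: float) -> float:
    """Fraction of total remote access time that is spent on the network."""
    return network_latency_us / (device_latency_us + network_latency_us)


# Latencies in microseconds.
DISK_US = 5_000.0          # low end of the 5 to 10 ms disk access cited above
NVM_US = 2.0               # assumption: "a few microseconds" for SSD/NVM
LEGACY_NETWORK_US = 100.0  # assumption: older network round trip
MODERN_SWITCH_US = 0.8     # within the switch latency range cited above

if __name__ == "__main__":
    cases = {
        "disk over legacy network": (DISK_US, LEGACY_NETWORK_US),
        "NVM over legacy network": (NVM_US, LEGACY_NETWORK_US),
        "NVM over modern switch": (NVM_US, MODERN_SWITCH_US),
    }
    for label, (device, network) in cases.items():
        print(f"{label}: {network_share(device, network):.1%} of time on network")
```

With these inputs the network is a rounding error for disk, nearly the entire cost for remotely attached NVM on the older network, and a much smaller share again once switch latency falls below a microsecond.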

Disaggregation architecture applied at the datacenter level (TOR=top of rack).

When photonics arrive and application developers have begun to build more concurrency into their applications, there will be new opportunities—in part because of new possibilities for larger memory access and mounting demands from applications to take advantage of such potential. Even now, data analytics and graph applications have a tremendous appetite for memory—one that hasn’t fit into a single node for some time. The ability to very quickly and easily satisfy these demands outside of the single-box view could be a game changer, especially for cloud providers, Franke argues. “Instead of going to a service provider and getting a general VM, you can request the characteristics you want; it will be far easier for them to configure custom requests for applications. This ability will create more new applications designed because of this potential.”
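The idea of requesting specific characteristics rather than a general VM can be sketched as carving a logical system out of rack-level resource pools. This is a minimal toy model, not any vendor's or provider's API: the `Rack` class, its `compose` and `release` methods, and the resource names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    """Hypothetical rack modeled as pools of free resource units."""
    free: dict

    def compose(self, request: dict) -> dict:
        """Reserve the requested units from the pools, or fail without side effects."""
        short = {
            name: units
            for name, units in request.items()
            if self.free.get(name, 0) < units
        }
        if short:
            raise ValueError(f"insufficient resources for: {short}")
        for name, units in request.items():
            self.free[name] -= units
        return dict(request)

    def release(self, system: dict) -> None:
        """Return a composed system's units to the pools."""
        for name, units in system.items():
            self.free[name] = self.free.get(name, 0) + units


if __name__ == "__main__":
    rack = Rack(free={"cpu_cores": 64, "memory_gb": 1024, "gpus": 4})
    # A memory-heavy request, such as a graph analytics job might make.
    system = rack.compose({"cpu_cores": 8, "memory_gb": 512, "gpus": 1})
    print("composed:", system)
    print("remaining:", rack.free)
    rack.release(system)
    print("after release:", rack.free)
```

The point of the sketch is that the request is shaped by the application (here, a lot of memory relative to cores) instead of by the fixed proportions of a single box. A real system would also have to account for the latency of the fabric connecting the pools.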

Of course, this is not a cheap proposition. Even though the upgrade cycle is changed for the sake of flexibility, the addition of photonics and the new layers required to make this all seamless will take more of an investment than the current one-box approach. Still, Franke says, applications are changing already in a way that makes the cluster an inflexible relic—something that was not designed for applications that require memory, scalability, and compute in the ways we see with legacy codes.

This paper is worth a read in terms of performance of traditional versus composable architectures for data-intensive workloads, especially those that are frequently found in cloud environments (Memcached, Cassandra, NoSQL and more).
