Breaking Memory Free Of Compute With Gen-Z
November 21, 2017 Jeffrey Burt
Servers have become increasingly powerful in recent years, with more processing cores and with accelerators like GPUs and field-programmable gate arrays (FPGAs) being added, and the amount of data that can be processed is growing rapidly.
However, a key problem has been getting interconnect technologies to keep pace with server evolution. It is a challenge that last year spawned the Gen-Z Consortium, a group founded by a dozen top-tier tech vendors including Hewlett Packard Enterprise, IBM, Dell EMC, AMD, Arm, and Cray. The founders wanted to create a next-generation interconnect that can leverage existing technologies while paving the way for new ones, at a time of rapid change in the computing industry when massive amounts of data are being created and processed.
The Next Platform has done a deep dive into the consortium and the Gen-Z technology, detailing the benefits the interconnect promises but noting that those anxious to get their hands on it will have to wait, and for many that wait could stretch well into 2019.
At the recent SC17 supercomputing conference in Denver, Michael Krause, vice president and Fellow Engineer at HPE and one of the creators of the Gen-Z Consortium, gave a high-level presentation about the technology, saying the group’s goal is to create a universal protocol for accessing data that breaks the interlock between the CPU and memory and supports a broad array of media types without being tied to a single one.
The consortium, which now boasts 45 members, wanted to remove “the hardware complexities [and] the software complexities, and get back to what we used to do 60 years ago, where we just had compute and memory itself,” Krause said. “One of the things that really motivated Gen-Z was the fact that the compute and memory balance was getting out of whack. We’re getting to the point where even though we can add more compute to any system, we actually can’t feed it. So we want to be able to provide very high bandwidth and very low latency and restore that balance to our compute systems.”
At the same time, there was the recognition of the ongoing convergence of memory and storage, from rotating media to DRAM to SSDs. Any new interconnect architecture needs to be able to support all of them as well as new media on the horizon.
“One thing we’ve been faced with is DDR is coming up on 20 years old and PCI itself … just celebrated 25 years,” Krause said. “The paradigms really haven’t fundamentally changed. We are sort of locked in a paradigm and we need to break free from those architectural limitations. Gen-Z is our solution.”
Breaking the interlock between the processor and memory is important for a number of reasons, he said. Because the two were so intertwined, a big development in memory technology, such as moving from one version of DDR to the next, was a significant moment and one that didn’t happen very often, so everyone in the industry had to time what they were doing to fall in line with the scheduled development. With the processor and memory broken apart, memory technology can develop at its own pace, though there have to be consistent semantics so that people designing products now can be sure those products will continue to work with interconnect technologies in the future. That called for a simple, common protocol.
Krause outlined the key attributes a new interconnect technology needs. It has to deliver high performance and low latency, and be scalable. Gen-Z operates at speeds in the realm of terabytes per second and is a “lean and thin” protocol that can scale from point-to-point topologies up through switch topologies and into rack scale, and it can do this without adding software complexity. In addition, the system needs to be reliable, with no stranded resources or single points of failure, unlike with DDR, where a failed channel compromises the memory and the system as a whole.
Security is built into the interconnect, including hardware-enforced isolation and strong packet authentication. Starting next year, the consortium will add privacy to the architecture, Krause said. Flexibility is also key, meaning Gen-Z can support multiple topologies, component types (the interconnect supports x86, Power and Arm architectures, as well as GPUs and FPGA accelerators) and use cases, including single enclosures, rack-scale enclosures and eventually clusters.
“We can’t predict which storage-class media will win out,” he said. “We have to be able to support different types of media types and different types of latency that come with those media types. We also have to support different types of topologies because there is no one-size-fits-all for any given type of solution stack.”
Along the same track, flexibility through a universal protocol also means eliminating many of the hard choices customers need to make now that can drive up the overall costs of their solutions.
“Probably one of the most vexing things for vendors and customers alike is trying to figure out tradeoffs – how many DDR channels do you do? How many PCI slots do you do? How many storage devices do you do?” Krause said. “We’ve got a plethora of different types of protocols, different types of interconnects, different types of form factors and we can’t figure out exactly what’s right for every type of customer. As a result, we incur a lot more expenses in many of our solutions than customers might like. This is why you’re seeing a focus increasingly on simplifying the ecosystems.”
Gen-Z also supports the growing trend toward composable infrastructures by teasing apart the memory and processor so they can work and be provisioned independently. The memory can be shared rather than be owned by the processor, and memory can be interleaved across multiple processors and aggregated to improve performance.
“More importantly, we can do that sort of memory-centric architecture or data-centric architecture, and we can put the acceleration as such so that it’s nearer to the data itself,” he said. “If we reduce the amount of data we move and the distance we move it, that’s going to make us more power-efficient and that’s going to improve performance.”
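To make the idea of interleaving memory across several components concrete, here is a minimal sketch of generic address interleaving. It is not taken from the Gen-Z specification: the function name, the 64-byte granule, and the round-robin mapping are all assumptions chosen for illustration. The point is only that consecutive blocks of a flat address space land on different memory modules, so a sequential access stream can draw bandwidth from all of them at once.

```python
def interleave(addr, num_modules, granule=64):
    """Map a flat byte address to (module index, byte offset inside that module).

    Hypothetical round-robin scheme for illustration only; the granule size
    and mapping are assumptions, not Gen-Z-defined behavior.
    """
    block = addr // granule              # which granule-sized block the address falls in
    module = block % num_modules         # blocks are dealt out round-robin across modules
    local_block = block // num_modules   # index of that block within its module
    return module, local_block * granule + addr % granule


# Four consecutive 64-byte blocks spread across four modules.
for a in (0, 64, 128, 192, 256):
    print(a, interleave(a, num_modules=4))
```

With four modules, addresses 0, 64, 128 and 192 map to modules 0 through 3, and address 256 wraps back to module 0 at offset 64.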
The end result will be an interconnect that can vastly improve the performance of HPC and high-end applications, according to the Gen-Z Consortium. That includes running in-memory analytics and similarity search (on modified existing frameworks), 100 times faster large-scale graph inference (through new algorithms), and 8,000 times faster financial models, achieved by enabling organizations to rethink how those models are built.