How – And When – Optical I/O Will Make Disaggregated Systems Better

As many of you know from reading The Next Platform, we are firm believers that eventually we will get disaggregated and composable systems that drive up the sharing of hardware resources across many workloads and therefore drive down the cost of the hardware supporting those workloads. This was always going to be a good thing, but as we come to the end of Moore's Law – with transistors starting to get more expensive and semiconductor packages getting more complex and more costly, too – something has to give.

And that something is the copper wire. And in the long run, one way or another, there will be silicon photonics – very likely co-packaged optics – that reduces the latencies between components in a system and therefore makes everything look as local as the speed of light will allow.

It is an exciting time to be a system architect, to be sure, and the stakes for HPC and AI systems have never been higher. Some of the best brains in hardware and software have been thinking about how disaggregated and composable architectures and silicon photonics can come together to make more efficient and hopefully less expensive distributed and hybrid systems. We have been fortunate enough to gather some of them for in-depth discussions, which were presented in two webcasts hosted recently by The Next Platform, and we thought we would put them together in one place and share them with you.

The first presentation was a broader conversation, looking at the technical issues of disaggregation, particularly with regard to breaking the tyranny of the server motherboard and its relatively static configuration of CPU, main memory, and peripheral I/O. There is a growing consensus that the server rack may be the new motherboard – which probably sounds as weird as it sounds cool.

Webinar: Disaggregated System Architectures for Next Generation HPC and AI Workloads

This first panel discussion on disaggregated and composable architectures included:

  • Ian Karlin, principal HPC strategist at Lawrence Livermore National Laboratory
  • John Shalf, head for computer science at Lawrence Berkeley National Laboratory
  • Vladimir Stojanovic, chief architect at Ayar Labs
  • Josh Fryman, senior principal engineer at Intel
  • Doug Carmean, architect at Microsoft

There is also a growing consensus that we need to sort out the protocols that will be used to disaggregate the components of systems and link them back together over various transports. We have PCI-Express for general peripheral links, and protocols like NVM-Express for linking to flash memory and CXL for linking to compute and memory, plus others such as CAPI, OpenCAPI, CCIX, NVLink, and Gen-Z. All of this needs to get sorted out, and it is getting sorted out, as it turns out – not so much through the I/O wars we have seen in days gone by in the computer business, but through innovation that has led to a kind of détente that encourages collaboration.

In the second panel discussion that we hosted, we drilled down into how CXL over PCI-Express was emerging as the standard for in-node connectivity for accelerators and storage but that the jury was still out for the transport and protocol that may span racks or rows of disaggregated and composable components.

This second panel included:

  • Mike Ignatowski, senior fellow at AMD
  • Hugo Saleh, vice president of marketing and business development at Ayar Labs
  • Ian Karlin, principal HPC strategist at Lawrence Livermore National Laboratory
  • Craig Prunty, vice president of marketing at SiPearl
  • Mark Parsons, director of the EPCC at the University of Edinburgh

Among other topics, we talked about how CXL might be grafted onto silicon photonics transports and the prospects that silicon photonics has for getting rid of most of the electrical I/O signaling inside of server nodes. We do not think that the electrical signaling underpinning PCI-Express can scale forever, any more than we believe that normal DDR DRAM will be able to deliver the bandwidth applications require in the long run. We also talked a bit about a possible bridge between CXL and Gen-Z, which can bring memory semantics and coherency through a different flavor of photonics to bear on system design.

We hope you enjoy both.
