Front Row with Frontera; Spectrum Scale Strategy; Computational Storage Deep Dive; Neural Programming Frontiers; MLPerf for HPC + More…
On this week’s episode, we talk to Dan Stanzione, director of the Texas Advanced Computing Center, to get an update on the “Frontera” supercomputer. Like the “Fugaku” system at RIKEN lab in Japan, Frontera is an all-CPU design. In this case, it has over 8,000 nodes based on Intel’s “Cascade Lake” Xeon SP processors, all lashed together by a 200 Gb/sec InfiniBand interconnect. We spoke to Stanzione about how this system has performed, TACC’s past investment in Knights processors and the Omni-Path interconnect, its work on COVID-19 research, and the process of choosing the architecture for Frontera’s future, as yet undefined successor.
From its video delivery service roots to backing the storage needs of some of the largest supercomputers on the planet, we walk through the evolution and future roadmap for IBM Spectrum Scale with Wayne Sawdon, CTO for IBM Spectrum Scale and Elastic Storage Server at IBM. We look at where it begins to diverge to meet the needs of HPC, AI, and large enterprise simultaneously while absorbing newer technologies like large-scale flash. We also touch on why the hyperscalers are looking at file systems so differently and what lessons might be there. And finally, would it ever make sense to open source GPFS, as we still can’t help but call it on occasion? “It’s a topic of interest,” he says, especially with the Red Hat model in full force. “But so far, IBM doesn’t know how to make money on a service.” Worth a watch…
We also talk to Justin Gottschlich, Principal Scientist & Director of Machine Programming Research at Intel Labs, about the current and future directions for machine and neural programming. We take a look at the state of the art in automatic code optimization and project how neural programming could change the role of the developer. He argues that it will not put programmers out of a job, but will enable them to be far more productive. It is a wide-ranging conversation about what can be automated and how, and where we still have work to do.
The BittWare division of Molex has cooked up a new hybrid memory subsystem accelerator card in conjunction with IBM that marries Everspin MRAM, Samsung DRAM, and Samsung zNAND flash memory all on the same card and interfaces with IBM systems based on its Power9 processors through the OpenCAPI 3.0 interface. That OpenCAPI interface peaks out at 25 GB/sec of bandwidth, and BittWare has shown the card driving around 20 GB/sec of bandwidth on real-world applications with an average read latency of around 1 microsecond. This HMS-250 card uses an FPGA to create the OpenCAPI interface, comes in 1.5 TB and 3 TB capacities, has load/store memory semantics, and looks to be a core component in computational storage setups. Craig Petrie, vice president of marketing at BittWare, walked us through the feeds and speeds and prospects of this hybrid memory card.
Also on today’s program, we talk about the status of the MLPerf HPC benchmark: what it measures and how it diverges from the mainstream MLPerf for AI applications, with insight from NERSC machine learning engineer and MLPerf HPC contributor Steve Farrell.
Tune in every Friday for a new program featuring in-depth interviews on special topics in hyperscale, HPC, AI, and large enterprise systems and software.