On today’s edition, we dig into the network and system architecture of Nvidia’s “Selene” supercomputer and talk about the challenges of scale facing weather giant ECMWF. We also walk through the Flux framework, which national labs use for HPC scheduling and workflow management, and discuss real versus “magic” compilers for AI chips. We conclude with our rapid insights segment on the future of data science.
We kick off today’s program with Mike Houston, chief architect of AI systems at Nvidia, who discusses the design of the company’s “Selene” AI-HPC supercomputer. Selene is the largest such machine installed at a commercial institution and sits at the heart of Nvidia’s research in HPC and AI. We walk through the system and network architecture and discuss the design path that led to this machine.
Also on the program, we speak with Stephen Herbein, a scientist at Lawrence Livermore National Laboratory, about Flux, the resource and job management software the lab created. Unlike centralized controllers such as SLURM in HPC or Google’s Borg (which inspired Kubernetes), Flux is a fully hierarchical scheduler. In important respects that design gives it an edge over those schedulers, and Flux can also be used in conjunction with them.
Later in the program, we talk with European weather giant ECMWF about its extreme-scale data management and I/O challenges and how it is looking ahead to NVMe, low-latency cloud platforms, and other ways to cope with high data volumes and mission-critical workload demands.
Also on the show, we talk with Andrew Richards of Codeplay about “magic” versus real compilers, specifically for AI chips. We discuss the current maturity of software platforms and compilers for AI devices, how steep the climb to adoption will be, and at what performance/price point centers will be willing to make the leap given the development overhead.
We end the program with a rapid insights segment featuring Mark Daniel Ward of Purdue University’s Data Mine. We talk about how the next generation of students entering the job market views and uses data science, which tools they gravitate toward, and why.
Thanks as always for tuning in!