Rajeeb Hazra, VP of Intel’s Datacenter Group, is a car buff. Why is that important to HPC? Because autonomous cars are the future, and it will take a phenomenal amount of compute to support them.
Hazra recently shared that, by some estimates, accurately supporting 20,000 autonomous cars would require an exaflop of sustained compute. That level of supercomputing is needed to interpret data from the network of millions of sensors inside and outside the cars, to run the deep learning that keeps the cars constantly aware of the world around them and the drivers inside them, and to repeatedly pass new models to the cars. And that’s just 20,000 cars. Nothing hyperscales like traffic in the largest cities in the world, he pointed out. He also noted that, with autonomous cars coming in the near future, the current generation of kids could be the last that needs to learn to drive. That should give any parent of a teenager pause. Heady stuff for a supercomputer conference, where black holes and gravitational waves are the usual centers of conversation. High performance computing has hit at the heart of the home.
But Mr. Hazra’s example was foundational to Intel’s recent announcements and to their importance for how cognitive computing and HPC are going to work together. In his ISC keynote, “AI—and more—on IA”, he identified the already overwhelming difficulty of designing efficient, performant systems to solve existing challenges for workloads like simulation and modeling. And along has come cognitive computing, with its hugely complex problems. Can the industry make it work in a commercial context? He posed the question, “is this notion of machine learning, and intelligence driven by computation or cognitive computing going to break the camel’s back…in terms of managing the complexity of technology R&D and associated business model that keeps it manageable and commercially profitable?”
He also explained Intel’s approach and the technologies that are enabling the convergence of multiple workloads, including cognitive computing, on a common platform, the Intel Scalable System Framework (Intel SSF). He introduced the role that the new Intel Xeon Phi processor, Intel’s bootable host processor based on its many-core architecture, will play in machine learning, machine learning training, and cognitive computing. And he explained why Intel Orchestrator is core to simplifying deployments and empowering more users to engage HPC.
For several years, the industry promoted the need for separate architectures and infrastructures for each of simulation and modeling, big data analytics, machine learning, and visualization workloads. Today, the industry largely agrees that these different types of workloads can, and should, run on the same HPC platform. Intel Scalable System Framework is a “design surface” (not a design point) that can support all these workloads. Mr. Hazra explained that you can be at any point on that surface, depending on how you want to provision: small or large scale clusters, data-intensive or compute-intensive workloads, dedicated infrastructure or the cloud. The key, he added, is that, as a common design surface, it provides “standards-based programmability” in ways that extend developers’ abilities to reap performance through parallelism or other techniques, but it does not “break their back” by necessitating a complete programming model change from generation to generation for software to keep up with hardware.
For its HPC framework, the company introduced the first of a future library of architecture specifications based on Intel SSF, along with Reference Designs built on the specifications. Each specification will focus on a particular architecture, and each Reference Design will provide a recipe for deploying with particular ingredients to meet the needs of a set of software applications and tools. These architecture specifications and Reference Designs will enable Intel SSF ecosystem partners and customers to more easily and quickly construct known, tested configurations for particular types of applications. The first of the specifications and Reference Designs can be accessed here.
Five live booth demos at ISC showcased the capabilities of Intel SSF through interactive visualization demonstrations running on an Intel SSF cluster built with the new Intel Xeon Phi processor 7210. The demos included:
• École Polytechnique Fédérale de Lausanne (EPFL) highlighted how multi-core-based visualization is now a viable, performant, and preferred path compared to GPU-based visualization.
• Kyoto University demonstrated how a dual-socket Intel Xeon E5-2699 v3 (Haswell architecture) system delivers better performance than an NVIDIA K40 GPU using 16-bit arithmetic when training deep learning neural networks for computational drug discovery using the Theano framework.
• Researchers from the Stephen Hawking Center for Theoretical Cosmology COSMOS provided a simulation that visualized two black holes colliding and generating gravitational waves in super real time.
Central to enabling the benefits of Intel SSF for large-scale parallelism is the Intel Xeon Phi processor, the new bootable host processor built on a many-core architecture, offering three “firsts for Intel, and in many cases all of commercially viable and successful architectures,” said Mr. Hazra in his keynote. First, Intel Xeon Phi is a processor designed for highly parallel workloads; it is not an accelerator. Thus, it spares developers the challenge of adopting an offload programming model, which requires code partitioning and movement to another device plugged into the system. The Intel Xeon Phi processor supports the same programming models developers are used to with x86, so it introduces greater performance and scalability without the disruption to programming methods that accelerators have required for years now. Second, its on-package high bandwidth memory breaks the memory wall and provides 3X over DDR architecture, and it allows “us to re-engineer the way caches are used both in the system as well as by the programmer”. Third, Intel Omni-Path Architecture (Intel OPA) fabric is integrated in the processor, gaining the performance and efficiencies of integration and Intel OPA technology. In about the size of a thick credit card, the Intel Xeon Phi processor offers “tremendous performance and density advantages.”
Mr. Hazra went on to show how the Intel Xeon Phi processor has proven its performance advantages to “accelerate faster than an accelerator without requiring such a tremendous disruption to the programming model” as measured against current, competitive GPGPUs.
• Running the LAMMPS life science application against an NVIDIA K80, the Intel Xeon Phi processor achieved 5X performance over the GPGPU.
• In finance Monte Carlo double-precision simulations, it performed 2.7X faster than the same competitor.
• In visualization (graphics with compute), it outperformed an NVIDIA Titan X by 5.2X.
But that’s just performance. Mr. Hazra emphasized that Intel Xeon Phi processor has the opportunity to change how machine learning and deep learning training is done. Today, this is done on a single node, but he points out that “we now have the ability of using HPC computing techniques of using the well established paradigms of MPI and distributed programming models to actually provide a tremendous boost to the ability to train.” That means shorter times to train, or higher quality of those models, “making deep learning, deeper, and therefore serving many more inference capabilities that don’t exist today.” The vision is to bring the scalability, and thus the benefits, that the industry learned in traditional HPC—from workstation to supercomputers—to deep learning training.
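The idea of applying distributed HPC techniques to training can be illustrated with a minimal data-parallel sketch. It is not Intel's implementation and uses no real MPI library: the workers are simulated in a single process, the toy model (y = w * x), the synthetic data, and the helper names `local_gradient`, `allreduce_mean`, and `train` are all assumptions made for illustration. In a real cluster, the averaging step would be a collective operation such as MPI_Allreduce across nodes.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair; hypothetical toy data


def local_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values: List[float]) -> float:
    """Stand-in for an MPI allreduce: every worker receives the average."""
    return sum(values) / len(values)


def train(shards: List[List[Sample]], steps: int = 200, lr: float = 0.01) -> float:
    """Data-parallel gradient descent: each worker computes a gradient on
    its own shard, the gradients are averaged, and all workers apply the
    same update so their copies of the weight stay identical."""
    w = 0.0
    for _ in range(steps):
        grads = [local_gradient(w, shard) for shard in shards]
        w -= lr * allreduce_mean(grads)
    return w


# Synthetic data drawn from y = 3x, split evenly across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
w = train(shards)
print(f"learned weight: {w:.3f}")
```

Because the shards are the same size, the averaged gradient equals the gradient over the full data set, so adding workers divides the per-step compute without changing the result of each update.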
Mr. Hazra illustrated this opportunity with more data derived from testing with the Intel Xeon Phi processor. He goes on to show how scalable the solution can be with 128 Intel Xeon Phi processors providing 50X faster training than a single Intel Xeon Phi processor. “We are now in a position to actually drive deep learning training into HPC. That is a huge game changer for the quality of models, for the accuracy of training, as well as a gateway to exploding amounts of scoring and inference applications that enrich our lives.”
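To put the quoted scaling figure in perspective, the arithmetic can be sketched in a few lines. The 128 processors and 50X speedup come from the keynote; the function name `parallel_efficiency` and the 100-hour single-processor baseline are hypothetical, chosen only to make the numbers concrete.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figures quoted in the keynote: 128 processors, 50X faster than one.
efficiency = parallel_efficiency(50.0, 128)
print(f"Parallel efficiency: {efficiency:.1%}")

# Hypothetical baseline (an assumption, not from the keynote): a job that
# takes 100 hours on one processor.
baseline_hours = 100.0
scaled_hours = baseline_hours / 50.0
print(f"Training time on 128 processors: {scaled_hours:.1f} hours")
```

A 50X speedup on 128 processors is roughly 39% of linear scaling, yet it would still turn a multi-day training run into a matter of hours.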
Augmenting the announcement of the Intel Xeon Phi processor, James Reinders, former director and chief evangelist for Intel Software, together with Intel colleagues Jim Jeffers and Avinash Sodani, released a new book at ISC called Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition. The book was written to help developers get off to an early start with parallel programming on the new processor.
Support for Intel Xeon Phi processor in the ecosystem and engagement by hardware vendors and software vendors is growing rapidly.
Mr. Hazra then introduced Intel Orchestrator, an Intel-supported software stack based on the OpenHPC community stack (www.openhpc.community). System software is a critical part of the entire solution. OpenHPC is an effort by the industry (vendors supporting both Intel Architecture and other architectures) to contribute to a stack that supplies the basic components and tools for HPC deployment, while enabling differentiation and innovation on top of the stack by individual vendors. “The reason this is important,” Mr. Hazra emphasized, “is that we as technologists, vendors that are constantly driving the hardware technology, make it incumbent upon us not to make the features of that technology as much a burden for its users.”
Intel Orchestrator is being offered in three different products, from turn-key to highly configurable, “because at different levels of scale, different kinds of software infrastructures are necessary with different levels of capabilities and different support complexities.” Intel Orchestrator’s first product is planned to be available in Q4 2016 through Intel channel partners.
Intel continues to invest in its roadmap for the future of HPC. Mr. Hazra announced two of the latest products and presented convincing evidence of why they are critical to moving the next level of HPC and cognitive computing forward.