Like any emerging technology, artificial intelligence and its subfields, such as machine learning and deep learning, are getting a lot of hype, with a continuous flow of analyst reports and news stories detailing how they will change the way business is done, research is conducted and operations are run. In all likelihood, much of that will pan out in the coming years. But right now, many research and academic institutions, enterprises, and startups are still trying to figure out their way forward with the technologies: how much to take on initially and how best to get the compute power they need to move ahead with their plans.
Supercomputer maker Cray, like many other systems OEMs, has been eyeing the AI space as a growth area for its business, and sees making compute power easier for businesses and researchers to get hold of as a key factor in becoming a significant player in the market.
“In artificial intelligence and deep learning, we are leveraging our supercomputing technologies to deliver solutions that allow unprecedented breakthroughs while also making deep learning more accessible,” Peter Ungaro, Cray’s chief executive officer, said during a conference call in April to talk about the company’s latest quarterly financial numbers. “While this market is still in the early stages, we have seen a few patterns begin to emerge with some of our initial customers with deep learning applications optimized for large datasets, including images and full motion video. From the system side, we are adding new options to our line of CS Storm GPU-accelerated clusters as well as improved fast-start AI configuration. These options make it easier for customers to get started on their AI journey with proof-of-concept projects and pilot to production use.”
The hardware is the starting point, of course, but there is more to it than that. In November 2017, Cray launched its Accel AI products and programs, aimed at giving businesses the tools and support they need to learn about artificial intelligence, get their projects underway and scale their AI efforts as they grow. Those offerings included configurations of the vendor’s CS-Storm supercomputer bundled with software specifically for AI, machine learning and deep learning workloads. The systems have eight Nvidia Tesla V100 GPU accelerators and a deep learning software platform from Bright Computing that includes AI tools such as TensorFlow, Caffe2, Chainer, MXNet, and Microsoft Cognitive Toolkit.
The solutions also included Cray’s Accel AI Lab to help develop deep learning technologies and workflows, an enhanced Urika-XC analytics software suite with TensorFlow and support for the open source Jupyter Notebook document sharing web application, and a collaboration agreement with Intel around AI. Earlier this year, Cray added a four-GPU version of the CS-Storm 500 that also includes two Intel “Skylake” Xeon SP CPUs and is designed for such uses as neural network training and inference for various HPC applications. Adding a smaller system to the AI mix was a response to what Cray was seeing in the market, according to Paul Hahn, marketing manager for analytics and AI.
Cray’s AI hardware lineup also includes an XC system with the Urika-XC analytics and AI software suite, the Urika-GX analytics platform and the ClusterStor storage appliance with the Lustre high-performance parallel filesystem.
“Part of the challenge with AI — deep learning in particular — is matching the system to the models being used for a particular use case,” Hahn wrote in a blog post in April when announcing the CS-Storm 500. “On the surface, a system that features eight GPUs is a processing beast able to handle any use case thrown at it. In reality, developing a deep learning model is a bit of an art with the artist — the data scientist — mixing and matching data to models in an effort to achieve accurate results in a timely fashion. And sometimes, the balance between I/O and compute power isn’t quite right (i.e., too little I/O for so much compute). Adding a smaller configuration option isn’t groundbreaking. We know that. Some deep learning use cases require a sledge hammer (a cluster using high-density GPU nodes with eight or ten GPUs per node) while others require a club hammer (a cluster using low-density GPU nodes with four GPUs per node). The key here is matching tools with tasks.”
With the same aim of accelerating the development and use of AI by giving businesses the tools they need, Cray is partnering with the UK Innovation Center and Digital Catapult to offer easier access to the vendor’s supercomputing power and programs. Digital Catapult, which is working to grow the UK economy through advanced technology, runs a program it calls the Machine Intelligence Garage. Through the program, Digital Catapult helps guide startups in the United Kingdom, in large part by giving them access to compute power and expertise. Every six to twelve weeks, the organization selects 30 startups to support through the Machine Intelligence Garage. Under the new partnership, those startups will get access not only to Cray’s compute resources, but also to its Accel AI Lab and performance analysis expertise.