Riding the AI Cycle Instead of Building It
April 10, 2018 James Cuff
We all remember learning to ride a bike: those early, wobbly moments with “experts” holding on to your seat while you furiously pedaled and tugged at the handlebars, trying to find your own balance.
Training wheels were the obvious hardware choice for those unattended and slightly dangerous practice sessions. They were often installed by your then “expert” in an attempt to avoid your almost inevitable trip to the ER. Eventually, one day, often without planning it, you no longer needed the support and could make it all happen on your own.
Today, AI and ML need this same level of support from experienced experts. Many AI implementations require the selection of the right hardware and the appropriate stabilizers, or training wheels. Just like learning to ride a bike, getting started down the AI road is a journey of cumulative experiences that build up to a level of balance and forward motion.
The current hardware, software and market all reflect this issue of AI balance, in a world currently made up of all-too-similar, rather wobbly beginnings.
For some industry perspective, The Next Platform recently spoke with Per Nyberg, who was appointed in March 2018 as supercomputer maker Cray’s VP of Market Development for Artificial Intelligence and Cloud. Previously, Nyberg was Cray’s Senior Director of Artificial Intelligence and Analytics. His expanded responsibilities are a clear sign of the growing importance of AI momentum within the market, and particularly within this HPC-focused company.
Cray is a traditional supercomputer provider with a long track record of designing and delivering large-scale computing for sophisticated customers. Oil and gas, weather, advanced modeling and simulation, large defense and government agencies, plus other heavy hitters make up its bread and butter. It has a portfolio of AI offerings available, covering the bases of “small”, “medium” and “large” requirements.
Our conversation with Nyberg initially focused on Cray’s newest product offering, the CS-Storm 500NX, a four-GPU machine. A four-GPU box, albeit one packed with next-generation Volta cards, is not particularly innovative, new or unique in the current market, especially given that Cray already has existing eight- and ten-GPU configurations. Cray does, however, bundle a pair of these boxes with Bright Computing software and call the result its “AI Cluster Starter Kit”: AI in a box. What was considerably more interesting was trying to understand why a traditional HPC market leader with Cray’s provenance would even bother entering the commodity low-count GPU box space, given how crowded the market already is for what is essentially the same hardware.
When The Next Platform asked Nyberg about Cray’s strategy for the emerging set of deep learning users, he recognized that there is space at both the high and low ends of the spectrum, and that the decision factors are less about sheer hardware than about getting it delivered with those all-important training wheels. Understanding often undocumented modern AI algorithms, while navigating the complexity of mapping them to the appropriate hardware, is non-trivial. “Easy AI” doesn’t quite exist, but everyone wants it. Many are on a journey toward ever-increasing sophistication in their AI and analytics.
At first glance, it’s hard to unpack the importance of the growing number of small-specification, commodity x86-based GPU systems on the market, from top-end supercomputer makers like Cray all the way down to the ultra-economical, GPU-dense systems from Supermicro and others, not to mention the expensive but highly specialized DGX-2 boxes from Nvidia with the new NVSwitch we reported on recently. What is clear, however, is that the market is segmenting quickly. We are seeing a rapidly forming split between the AI “haves” and the AI “have nots”. This is true both of the providers of solutions and of those carrying out active installations and deployments.
However, one thing that is crystal clear is that any successful organization now needs AI and analytics to be fully baked into its business and processes. As an example, The Next Platform recently sat in on a meeting about industry perspectives with Bill Ruh, GE Digital’s CEO and Chief Data Officer. During a free-flowing discussion about AI in industry and how GE Digital was specifically mapping out the landscape, Ruh told the gathering, “I’m not saying that every single GE product will contain AI specifically, but what I am saying is that every single GE product will have to contain some form of analytics”. Interesting. What Ruh was describing is clearly focused on his customers’ demands and their need to be proactive about preventive maintenance, specifically for their turbines, jet engines and advanced “edge analytics” for kit located way out in the field. The potential for significant cost avoidance, and for circumventing unplanned downtime through AI, is immediately and extremely compelling to customers and providers like Ruh alike.
Industry giants like GE are fortunate to have the vast resources and expertise to be a long way into their own journey with AI. The underlying issue for them and others is that Industry 4.0 is driving manufacturing and product delivery way beyond traditional computing and automation methods, into what are now being called “Cyber Physical Systems”. Essentially, you now have to think of AI and analytics as the new serving of “fries with everything”.
The market is demanding every level of AI inside its stacks: from exploration, to understanding how analytics will even fit into an existing business process, to proof of concept, and then on to potentially full-blown, production-ready analytics and integrated AI systems. It is clearly a journey. Many organizations are either starting out, midway through, or already running their analytics strategies in production, and right now there are many more players in the early stages of discovery than in the later, fully implemented stages.
But here’s the problem. The majority (both providers and consumers) are still building “AI pipelines” from little bits and pieces of random stuff, exactly the way we used to build COTS clusters in the early 2000s. It took a long time for the market to evolve and stabilise to the point where you could select a SKU with the words “HPC” or “cluster” stuck on the side of it. Just like then, there is still no simple “add to basket” for AI systems.
So, back to the Cray example and the question of why the smaller (four-GPU) boxes matter when the market is already rife with such offerings. It turns out to be a little more subtle than it first appears. These classes of systems are effectively becoming the new “bicycle training wheels” for AI and HPC scaling.
None of us start out, age 5, clad head to toe in Lycra, barreling downhill at 45 mph atop a ridiculously expensive carbon fibre racing cycle. No. We start out small. We build our confidence on little bikes and slowly move up the “cycling stack”. To stretch this cycling analogy a little further, many of us will never achieve the HPC, scale-out computational equivalent of full MAMIL (middle-aged man in Lycra) status, and many of us are content never to have to. On the other hand, many use bikes as a tool for the daily commute. The same is now becoming true of AI: the “fries with everything” need for analytics is becoming ubiquitous, because we all need to understand more of our evolving and complicated environments and processes.
Here’s another challenge. Customers hate to be referred to as “junior”, “immature” or “learning”. While speaking with us, Nyberg consistently referred to an AI “journey” rather than using emotive words such as “maturity”. We agree. Just because an organization isn’t far along in its AI and deep learning journey certainly doesn’t mean that it is in any way “immature”. It does make for an interesting comparison, though, especially when looking at these “junior-sized” AI machines versus more traditional “super-sized big iron supercomputers”. The market will tell us how many of these smaller machines end up being purchased, versus the more tightly integrated, purpose-built, end-to-end systems.
One final interesting aspect of Cray’s approach is their reuse of the software stacks from their “grandpappy” XC machines in their smaller product lines. Their cutely named Urika-GX can also be found on their starter hardware as well as their Urika-XC big box hardware. This should in theory provide for a more seamless upgrade path as folks move on to larger systems and need to scale up and out further.
Here at The Next Platform, we strongly suggest that the vendors who provide continuous, seamless end-to-end support for their AI offerings, from the low to the mid to the high end, will eventually be the ones that win out. Those “all spare parts” providers, with constantly changing software and hardware made from bits and pieces, will clearly struggle in this dynamic and rapidly evolving market. Just like when you were small and learned to ride that bicycle, the core interfaces haven’t changed much: handlebars, pedals, seat and frame have all remained pretty much constant.
Speed, performance, agility and subtlety of style come along much, much later in your journey…
Distinguished Technical Author, The Next Platform
James Cuff brings insight from the world of advanced computing following a twenty-year career in what he calls “practical supercomputing”. James initially supported the amazing teams who annotated multiple genomes at the Wellcome Trust Sanger Institute and the Broad Institute of Harvard and MIT.
Over the last decade, James built a research computing organization from scratch at Harvard. During his tenure, he designed and built a green datacenter, petascale parallel storage, low-latency networks and sophisticated, integrated computing platforms. However, more importantly he built and worked with phenomenal teams of people who supported our world’s most complex and advanced scientific research.
James was most recently the Assistant Dean and Distinguished Engineer for Research Computing at Harvard, and holds a degree in Chemistry from Manchester University and a doctorate in Molecular Biophysics with a focus on neural networks and protein structure prediction from Oxford University.