Businesses will need to adopt AI technologies not just because they can, but because they must – AI is the technology that will help businesses to be agile, innovate, and scale. So says the tech analyst firm IDC, which forecasts global spending on AI systems will double over the next four years, from $50.1 billion this year to more than $110 billion in 2024.
Drivers for AI adoption include “delivering a better customer experience and helping employees to get better at their jobs,” says IDC. “This is reflected in the leading use cases for AI, which include automated customer service agents, sales process recommendation and automation, automated threat intelligence and prevention, and IT automation.” Some of the fastest growing use cases include automated human resources and pharmaceutical research and discovery, the research firm adds.
However, the benefits of this technological revolution are spread very unevenly, according to Kunle Olukotun, co-founder and chief technologist of SambaNova Systems, the software-defined AI hardware startup. “If you look at the people who are able to develop these sorts of systems, they’re in the hands of the few, the large companies that have the data, the computation and the talent to develop these sorts of algorithms, and of course they’ve used these systems to become the most valuable companies in the world – Google, Apple, Amazon, Facebook and the like,” he says.
The fundamental challenge lies with the sheer amount of compute power needed to build and train many of the more advanced models that are being developed. The models are getting larger and larger, and for some applications the volumes of data required to train them are also ballooning. This is exacerbated by the slowing of performance gains for successive generations of processor chips, a trend that some have labelled the end of Moore’s Law, according to SambaNova’s vice president of product, Marshall Choy.
“Multi-core has run its course, and single cores are inefficient, so obviously, just putting many of these together on a chip just increases the inefficiency,” he says. “So, we need a much more efficient architecture, as the platform for future AI and machine learning innovations to enable that whole new class of AI applications.”
Today, AI workloads are typically processed by racks of systems using a combination of CPUs and GPUs. The latter have an architecture designed for much greater parallelism, with thousands of relatively simple cores optimized for floating-point throughput, which has proved much better suited than CPUs to tasks such as training machine learning models.
The AI Goldilocks Zone
However, this success up until now masks the fact that GPUs were not originally designed for machine learning, and may not be suitable for every AI workload, according to Choy.
“If you look at where the GPU performs very well, it’s actually a narrow band of the overall research field in machine learning algorithms and applications. The GPU fits into this AI Goldilocks zone, in that it basically runs models really well that fit into the size of the GPU memory, and within the constraints of the architecture,” he claims.
Researchers are now pushing out of the ‘Goldilocks zone’ in two directions. On the one hand, they are moving towards smaller, more highly detailed models that use transformers and are aimed at efficiency. On the other, they are moving towards bigger models with bigger datasets and higher parameter counts, such as BERT and GPT in natural language processing, which can result in out-of-memory errors or might require thousands of GPUs to deliver.
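A rough back-of-the-envelope calculation shows why the larger models fall outside that zone. The sketch below is illustrative only. It assumes 32-bit training with an Adam-style optimizer (a weight, a gradient and two optimizer states per parameter, or about 16 bytes), ignores activation memory entirely, and uses approximate publicly reported parameter counts. The 16 GB device size and the function names are assumptions chosen for illustration, not figures from Choy or SambaNova.

```python
# Assumption: fp32 training with an Adam-style optimizer keeps four values per
# parameter (weight, gradient, two optimizer moments) at 4 bytes each.
BYTES_PER_PARAM = 4 * 4
GB = 1e9  # decimal gigabytes, for simplicity


def training_footprint_gb(num_params: float) -> float:
    """Lower-bound estimate of training memory; activations are ignored."""
    return num_params * BYTES_PER_PARAM / GB


def fits_on_device(num_params: float, device_gb: float) -> bool:
    """True if the estimated footprint fits in a single device's memory."""
    return training_footprint_gb(num_params) <= device_gb


# Approximate, publicly reported parameter counts (illustrative only).
MODELS = {
    "BERT-base": 110e6,
    "BERT-large": 340e6,
    "GPT-2": 1.5e9,
    "GPT-3": 175e9,
}

DEVICE_GB = 16.0  # hypothetical accelerator memory size

if __name__ == "__main__":
    for name, params in MODELS.items():
        need = training_footprint_gb(params)
        verdict = "fits" if fits_on_device(params, DEVICE_GB) else "does not fit"
        print(f"{name}: ~{need:,.1f} GB, {verdict} in {DEVICE_GB:.0f} GB")
```

Even under these generous assumptions, the estimate climbs from a couple of gigabytes for the smaller BERT model to thousands of gigabytes for the largest GPT model, which is where out-of-memory errors and multi-GPU partitioning come in.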
These limitations matter because many organizations today are investing heavily in infrastructure that may be too inflexible to allow them to adapt to a rapidly changing economic and business environment.
Apart from the potential lack of widely available processing power to drive newer machine learning models, there are other trends that highlight the need for new approaches and new architectures in AI. The first, according to Choy, is that the processes of training and inference have traditionally been kept separate. Typically, the training of a model is performed using the brute-force power of GPUs, while inference, which uses the trained model to make a prediction, is more often performed using an ASIC or CPU.
“What we see now in terms of real world applications is the need to converge both training and inference. Because you want to be able to do things like model fine tuning to specific use cases and enable continuous learning on small models, to enable things like transfer learning and incremental retraining on the inference node,” he explains.
Doing this with different systems, moving the results back and forth from one to the other, can be expensive and incur high latency in datacenters, so moving to an architecture that can do both tasks on the same system makes more sense.
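As a toy illustration of what converging the two tasks means in software terms, the sketch below keeps one model object that both serves predictions and folds in new labelled examples as they arrive. It is a deliberately simple nearest-centroid classifier in plain Python. The class name, labels and data are hypothetical, and it says nothing about how SambaNova's hardware implements training or inference.

```python
from collections import defaultdict


class OnlineCentroidClassifier:
    """Toy 1-D nearest-centroid model that predicts and learns in one place."""

    def __init__(self):
        self._sums = defaultdict(float)  # running sum of inputs per label
        self._counts = defaultdict(int)  # number of examples seen per label

    def learn(self, x: float, label: str) -> None:
        """Training step: fold one labelled example into the running centroid."""
        self._sums[label] += x
        self._counts[label] += 1

    def centroid(self, label: str) -> float:
        """Mean of all inputs seen so far for this label."""
        return self._sums[label] / self._counts[label]

    def predict(self, x: float) -> str:
        """Inference step: return the label whose centroid is closest to x."""
        if not self._counts:
            raise ValueError("model has not seen any examples yet")
        return min(self._counts, key=lambda label: abs(x - self.centroid(label)))


# Initial training on hypothetical sensor readings.
model = OnlineCentroidClassifier()
for x, label in [(1.0, "normal"), (2.0, "normal"), (3.0, "normal"),
                 (9.0, "alert"), (10.0, "alert"), (11.0, "alert")]:
    model.learn(x, label)
before = model.predict(5.0)

# Incremental retraining on the same object, with no hand-off to another system.
for x in (4.0, 5.0, 6.0):
    model.learn(x, "alert")
after = model.predict(5.0)
```

In this sketch the prediction for the same input changes once the new examples are absorbed, without the model ever leaving the process that serves it.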
Another requirement, according to Choy, is that any next-generation compute architecture needs to be capable of expanding into areas of broader general-purpose applicability, beyond just accelerating machine learning tasks.
“If you look at the data science pipeline, for example, you have other processes bookending the machine learning processes, you have things like SQL doing data prep and post processing, and data analytics. And so, it’s much more cost effective to have the same accelerator for machine learning and SQL in graph analytics for scientific computing applications, because they all have very strong similarities to machine learning applications in that they’re all dataflow oriented applications.”
This concept of dataflow is central to SambaNova’s notion of how next-generation computer architectures will operate, and it is such a step change that the company believes it will usher in a new era of computing.
Dataflow computing is the end result of approaching applications the “software 2.0” way, a term coined by Andrej Karpathy that refers to the way machine learning algorithms are developed.
“Before machine learning, we had what we’re now calling software 1.0, and here code is written in C++ or some other high-level language, and it requires domain expertise to decompose the problem and design algorithms for the different components and then compose them back together,” says Olukotun.
“Contrast that with software 2.0, where the idea is that you train neural networks using training data, and the program is written in the weights of the neural network. This has a number of advantages, and the key one is that you have a reduced number of lines of code that have to be explicitly developed by the programmer,” he explains.
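The contrast can be made concrete with a deliberately tiny example. In the sketch below, the "software 1.0" version of a Celsius-to-Fahrenheit converter is a rule written by hand, while the "software 2.0" version learns a weight and a bias from example pairs by gradient descent. The function names, learning rate and step count are assumptions chosen so that this toy problem converges; a real neural network would have many more weights, but the idea that the program lives in the learned values is the same.

```python
def c_to_f_v1(celsius: float) -> float:
    """Software 1.0: the rule is written out explicitly by a programmer."""
    return celsius * 9.0 / 5.0 + 32.0


def train_linear(examples, lr=0.001, steps=20000):
    """Software 2.0: fit a weight and bias to (input, target) pairs.

    Uses plain gradient descent on mean squared error.
    """
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Training data: here generated from the hand-written rule for convenience;
# in practice it would come from observations.
EXAMPLES = [(c, c_to_f_v1(c)) for c in (-10.0, 0.0, 10.0, 20.0, 30.0, 40.0)]
W, B = train_linear(EXAMPLES)


def c_to_f_v2(celsius: float) -> float:
    """The 'program' is now just the learned weight and bias."""
    return W * celsius + B
```

After training, the learned weight and bias sit very close to 1.8 and 32, so the second function behaves like the first even though nobody typed the formula into it.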
As an example, Olukotun cites the Google Translate service, which Google reduced from 500,000 lines of C code to just 500 lines of dataflow code in TensorFlow, a domain-specific framework for machine learning developed by Google but widely used elsewhere.
“What we see is that if we look at the development of machine learning applications, they are done using these high-level frameworks like TensorFlow and PyTorch. And these frameworks generate a dataflow graph of machine learning operators like convolution, matrix multiply, Batch Norm and the like,” he says.
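To give a feel for what such a graph looks like, here is a minimal sketch in plain Python rather than TensorFlow or PyTorch. Each node names an operator and the nodes whose outputs it consumes, and the graph is evaluated in dependency order. The node names, the `OPS` table and the `run_graph` helper are hypothetical, and real frameworks add far more (automatic differentiation, device placement, operator fusion).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def matmul(a, b):
    """Matrix multiply for lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]


def relu(m):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in m]


OPS = {"matmul": matmul, "add_bias": add_bias, "relu": relu}

# node name -> (operator name, names of the nodes it reads from)
GRAPH = {
    "xw": ("matmul", ["x", "w"]),
    "z": ("add_bias", ["xw", "b"]),
    "y": ("relu", ["z"]),
}


def run_graph(graph, feeds):
    """Evaluate every node once its inputs are available."""
    values = dict(feeds)
    deps = {name: set(inputs) for name, (_, inputs) in graph.items()}
    for name in TopologicalSorter(deps).static_order():
        if name in values:  # an input supplied by the caller
            continue
        op, inputs = graph[name]
        values[name] = OPS[op](*(values[i] for i in inputs))
    return values


result = run_graph(GRAPH, {
    "x": [[1.0, 2.0]],
    "w": [[1.0, -1.0], [0.5, 2.0]],
    "b": [0.0, -5.0],
})
```

The point of the representation is that the order of execution is dictated by where the data flows, not by the order in which statements were written.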
These domain-specific machine learning operators can then be converted into ‘parallel patterns’, which express the parallelism and locality in the application, and can be optimized for higher performance.
“And what we see is that not only can these parallel patterns represent machine learning applications, they can also be used to represent the operators in SQL that are used for data processing. And these can be represented efficiently using parallel patterns,” Olukotun adds.
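The overlap Olukotun describes can be sketched with a handful of generic patterns. In the illustrative code below, the same map, reduce, filter and group-by building blocks express both a dot product (the core of a matrix multiply) and a simple SQL aggregation. The helper names, the `orders` table and the query are hypothetical, and this is not SambaNova's actual intermediate representation. The helpers also run sequentially here, whereas a real implementation would exploit the parallelism each pattern exposes.

```python
from functools import reduce
from operator import add, mul


def pmap(fn, *cols):
    """Map pattern: apply fn independently to each position of the inputs."""
    return [fn(*args) for args in zip(*cols)]


def preduce(fn, xs, init):
    """Reduce pattern: combine all elements with an associative operator."""
    return reduce(fn, xs, init)


def pfilter(pred, rows):
    """Filter pattern: keep only the elements that satisfy pred."""
    return [r for r in rows if pred(r)]


def pgroup_reduce(key, value, fn, rows, init):
    """Group-by-reduce pattern: one reduction per distinct key."""
    out = {}
    for r in rows:
        k = key(r)
        out[k] = fn(out.get(k, init), value(r))
    return out


def dot(a, b):
    """Machine learning operator: dot product as map followed by reduce."""
    return preduce(add, pmap(mul, a, b), 0.0)


# Hypothetical table for the SQL side of the comparison.
orders = [
    {"region": "east", "amount": 20.0},
    {"region": "west", "amount": 5.0},
    {"region": "east", "amount": 15.0},
    {"region": "west", "amount": 30.0},
]

# SELECT region, SUM(amount) FROM orders WHERE amount > 10 GROUP BY region
totals = pgroup_reduce(
    key=lambda r: r["region"],
    value=lambda r: r["amount"],
    fn=add,
    rows=pfilter(lambda r: r["amount"] > 10, orders),
    init=0.0,
)

similarity = dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Because each pattern states which elements can be processed independently and which must be combined, a compiler has the information it needs to spread the work across parallel hardware.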
This, then, is SambaNova’s prescription for the new era of computing: support for hierarchical parallel pattern dataflow as the natural machine learning execution model; support for very large, terabyte-sized models that will provide much higher accuracy; support for flexible mapping of those machine learning graphs onto the underlying hardware; and support for data processing, specifically SQL operations, as these form a key part of machine learning training.
Put all these together and you have next-generation infrastructure for complex machine learning workloads that can equally well turn its hand to other data-intensive applications, such as those seen in HPC.
Sponsored by SambaNova