At this point in the history of the IT business, it is a foregone conclusion that accelerated computing, perhaps in a more diverse manner than many of us anticipated, is the future both in the datacenter and at the edge.
The wonder is not so much that there are so many different kinds of chips being created to replace some of the functions that might otherwise run on a general purpose CPU, but rather that none of the big chip designers has read the writing on the wall, given up, and just created a screaming fast serial processor that focuses only on single thread work and relegates all other tasks – particularly vector and dot product math – to accelerators attached to this stripped down CPU through high speed, low latency interconnects. Imagine how fast a Xeon or Power or Epyc or ThunderX processor might run if the focus shifted toward maximizing single thread performance.
We think, in fact, that such a day will come, and that those companies that have focused relentlessly on accelerated compute will benefit greatly. In fact, it might turn out that only a company that makes accelerators – hint, hint Nvidia – and that has no vested interest in carrying on adding functions to already beefy CPUs to extend their lives can create such a minimalist CPU as we are pondering. An Arm architecture, such as the eight-core “Carmel” Armv8.2 chip used in the “Xavier” AGX platform for autonomous vehicles, might be a good place for Nvidia to start.
In the meantime, there is plenty of money for Nvidia to chase in the datacenter and at the edge with the GPU chippery that it has cooked up to augment the beefy CPUs that are the standard fare today. Nvidia intends to chase these opportunities, and the company talked about this at length at its GPU Technical Conference and its Investor Day events in San Jose this week. The focus of the conversations at both events was accelerated computing in the datacenter, and that is where we are going to focus now.
The total addressable market that Nvidia is chasing in the datacenter keeps growing, which is always a beautiful thing, and the acquisition of Mellanox Technologies, which was announced last week, is going to nudge that market upwards even more.
The main reason why Nvidia’s datacenter business is growing so fast is that, over more than a decade, it has shifted from being a maker of GPUs aimed mostly at client devices to being a maker of platforms. The advent of CUDA and the acceleration that started on HPC systems have been extended to AI workloads and now to the nascent market of GPU accelerated databases, which has great potential for expansion. Nvidia is now creating platforms for streaming videos and online games as well as for running various data science workloads, including more traditional, statistical machine learning algorithms.
This was the hot topic at GTC 2019 this week, with Nvidia rolling out specifications for its data science server based on its Tesla T4 accelerators. These machines – why not call them the DSX-2 to have a consistent name? – can be used for machine learning training, since the T4’s “Turing” TU104 GPU has a variant of the Tensor Core math units used in the Volta GV100 GPUs. (We detailed the Turing GPU and the T4 accelerator last September.) Here is an updated chart that shows the relative performance advantage that accelerated machines have over plain vanilla systems based on Intel’s “Skylake” Xeon SP processors:
It is hard to read the fine print there, so here it is:
Note(s): CPU baselined to 5,000 servers for each workload. Capex costs: CPU node with 2x Skylake CPUs ~$9K; GPU node with 4x V100 GPUs ~$45K; DGX-1 ~$120K; T4 node with 4x T4 ~$20K. Opex costs: power and cooling is $180/kW/month. Power: CPU server + network = 0.6 kW; GPU server + network = 1.6 kW; DGX-1V/HGX-1 server = 3.2 kW; 4x T4 server = 0.9 kW. HPC: GPU node with 4x V100 compared to 2x CPU server. AI training: DGX-1V compared to a 2x CPU server. AI inference: T4 server (4x T4) compared to 2x CPU server. Machine learning: T4 server (4x T4) compared to 2x CPU server. Numbers rounded to nearest $0.5M.
The baseline capital and operational expenses in this table did not change for the cluster of 5,000 Skylake server nodes since GTC 2018, but the extra memory in the 32 GB versions of the Tesla V100 accelerator plus price changes in systems were reflected in the HPC column; these units were already in the AI training column. Similarly, Nvidia shifted from the Tesla V100 to the Tesla T4 for the inference cluster, and the speed up is a little lower (50X versus 60X in the table last year), but the price for a Tesla T4 cluster was around $500,000 for 100 servers compared to $2 million for a Tesla V100 cluster with the same number of nodes. The machine learning column is new, and this is for that statistical machine learning as distinct from deep learning with neural networks that are used for AI training and inference. In this case, it takes one-tenth the number of data science servers to provide a 10X speedup, according to Nvidia.
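The cost gap in that comparison is easy to sanity check from the capex and power figures in the chart notes. Here is a minimal sketch; the three-year horizon and the node counts (5,000 CPU servers versus 100 T4 servers, per the inference comparison above) are our assumptions for illustration:

```python
# Rough TCO comparison using the figures from Nvidia's chart notes.
# Assumptions (ours, for illustration): a three-year horizon, and
# 100 T4 servers standing in for 5,000 CPU servers on inference,
# as the text describes.

POWER_COST = 180  # $/kW/month for power and cooling, per the notes

def tco(node_price, node_kw, nodes, months=36):
    """Capex plus power/cooling opex over the given horizon."""
    capex = node_price * nodes
    opex = node_kw * POWER_COST * months * nodes
    return capex + opex

cpu_cluster = tco(node_price=9_000, node_kw=0.6, nodes=5_000)
t4_cluster = tco(node_price=20_000, node_kw=0.9, nodes=100)

print(f"CPU cluster (5,000 nodes): ${cpu_cluster / 1e6:.1f}M")
print(f"T4 cluster (100 nodes):    ${t4_cluster / 1e6:.1f}M")
```

Even on these rough numbers, the CPU cluster costs well over twenty times as much as the T4 cluster over three years, which is the kind of arithmetic Nvidia wants buyers to do.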
Nvidia is not just selling GPU and interconnect components for datacenter compute. In some cases, as with the DGX-1 and DGX-2 servers, Nvidia is literally a manufacturer of systems and sells them directly, and at the investor conference this week, Nvidia said that it had more than 1,300 customers for its DGX family of machines, up from 300 back in August 2017, the last time Nvidia gave out a customer count. The sales of either complete systems or the components that will go into systems created by OEMs and ODMs will all end up driving Nvidia’s datacenter business. That baseline GPU business for gamers and mainstream client devices provides the foundation for the more exotic stuff that is being peddled by Nvidia in the datacenter. One could argue that the HPC and AI architectures now trickle down to the gamers and other client users of Nvidia GPUs, as was certainly the case with the “Volta” and “Turing” generations of GPUs; similarly, “Pascal” GPUs got their start in the datacenter and eventually moved into GPU graphics cards. The datacenter drives the architecture, and the clients inherit a modified form of it and drive the chip volumes.
Jay Puri, the executive vice president in charge of worldwide sales at Nvidia, talked to Wall Street about the datacenter opportunity, and this year, the company talked about it a little bit differently than it has in the past. This time around, Nvidia is talking about “the new HPC market,” one that encompasses traditional simulation and modeling at academic, government, and enterprise customers in places like the energy sector, the media and e-commerce operations of hyperscalers and public cloud builders, and enterprises in general with a particular emphasis on the financial services, retail, telecommunications, and automotive sectors.
Based on estimates from Hyperion Research, IDC, and internal market researchers, Nvidia reckons that this new HPC server market had a $37 billion total addressable market in calendar 2018. To one extent or another, these companies are doing some form of data analytics and artificial intelligence that can be accelerated by GPUs, and some are doing simulation and modeling closely associated with science research in one form or another. Here is what the TAM for GPU compute looked like last year, and Nvidia’s forecast for five years from now, in 2023:
Let’s cut apart these pie charts and compare the size of the slices over time. The good news is that the pie is going to get bigger, but the bad news is that the HPC slice is going to get smaller by 2023, at least according to these forecasts. We will have just installed a fairly large amount of exascale-class machinery in 2021 and 2022, so a downdraft is perhaps reasonable. Also, you have to remember that just because a TAM drops doesn’t mean Nvidia’s revenue within that TAM drops. It could be eating share, as it has been doing in the HPC and AI sectors for the past several years.
In any event, if you look at the $37 billion TAM that Nvidia is chasing for the datacenter in 2018, the company reckons that just under $13 billion in server infrastructure was sold into the scientific HPC market, comprising 35 percent of the pie. The hyperscalers accounted for another 25 percent of the TAM, with just a tad over $9 billion in sales, and mostly for machine learning training iron that drives their various translation, identification, and recommendation systems. The remaining 40 percent of the 2018 TAM pie came to just under $15 billion and was for stuff sold into enterprise accounts. (Remember, the energy sector is put into traditional scientific computing in this analysis, which is where it belongs given the nature of the workloads. Product design at manufacturers also belongs here.) Looking ahead to 2023, the TAM for the hyperscalers is expected to more than double to $20 billion, about 40 percent of the $50 billion TAM for servers, and enterprises are going to hold steady at around 40 percent of the market as well, which equates to about $20 billion for each slice. That leaves traditional HPC systems for scientific computing with a $10 billion TAM, which is 23 percent smaller.
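The slice arithmetic above can be checked with a quick back-of-the-envelope calculation. This sketch uses the rounded dollar figures cited in the text (in billions), so the computed shares land within a point or two of the percentages Nvidia quotes:

```python
# TAM slices cited in the text, in billions of dollars, rounded.
tam_2018 = {"HPC": 13, "hyperscale": 9, "enterprise": 15}
tam_2023 = {"HPC": 10, "hyperscale": 20, "enterprise": 20}

for year, tam in (("2018", tam_2018), ("2023", tam_2023)):
    total = sum(tam.values())
    shares = ", ".join(f"{k} {v / total:.0%}" for k, v in tam.items())
    print(f"{year}: ${total}B total ({shares})")

# The HPC slice shrinks from roughly $13B to $10B, or about 23 percent.
hpc_decline = 1 - tam_2023["HPC"] / tam_2018["HPC"]
print(f"HPC decline: {hpc_decline:.0%}")
```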
The subtext in these charts is that at the hyperscalers, deep learning using trained neural networks is going to be augmented by those statistical machine learning techniques commonly deployed by data scientists, typically using Python and other similar languages. Enterprises will be adding machine learning to deep learning as well, plus expanding out to other kinds of data analytics – including but not limited to GPU databases – to grow that market. The traditional HPC sector will merge AI techniques with traditional simulation and modeling, but this TAM is still going to drop by 2023, according to Nvidia’s numbers.
These figures are interesting in that they change over time as the products that Nvidia and its partners bring to market are changing. We did our first look at the expanding TAM for GPU compute back in November 2015, which was pegged at a mere $5 billion at the time. We did another analysis of the market in August 2017, when the TAM for GPU compute stood at around $30 billion, and again last April at GTC 2018, when different aspects of this 2023 forecast were presented. Looking at the presentations for this year and last year gives the fullest picture.