The Embiggening Bite That GPUs Take Out Of Datacenter Compute
May 16, 2017 Timothy Prickett Morgan
We are still chewing through all of the announcements and talk at the GPU Technology Conference that Nvidia hosted in its San Jose stomping grounds last week, and as such we are thinking about the much bigger role that graphics processors are playing in datacenter compute – a realm that has seen five decades of dominance by central processors of one form or another.
That is how CPUs got their name, after all. And perhaps this is a good time to remind everyone that systems used to be a collection of different kinds of compute, and that is why the central compute complex in an IBM System/360 was called the main frame. There were other frames, and in the case of RISC processors at IBM, the original 801 RISC chip that underlies half of its systems business today was commercialized as the controller inside mainframe disk subsystems. The Xeon processor that now dominates the datacenters of the world made a three-decade-long jump off our desks.
Technologies leap around and trickle down, and it is fun to watch it happen. But it is easy to forget the long timescales and huge investments companies have to make to change the nature of compute. This takes decades, even in an IT sector that seems to be accelerating its rate of change.
Perhaps no one is having more fun in compute right now than Nvidia, whose GPUs have become the compute engines of choice for accelerating a wide variety of scientific simulation and modeling workloads and that have utterly taken over the heavy lifting for machine learning training and a certain amount of machine learning inference workloads. The new “Volta” GV100 GPUs, which were designed explicitly to boost the performance on these high-end workloads, are like their predecessors in the Fermi, Kepler, Maxwell, and Pascal families in that they relegate the CPU to the role of air traffic controller while they do almost all of the calculating work – and do so faster than the CPU could on its own. So it is not only more compute, but faster compute, and the combination is driving Nvidia’s Tesla and related GRID GPU accelerator businesses at triple-digit growth. The GPU compute business was growing nicely in its first five years, when HPC applications were being ported to GPU accelerators, but once machine learning acceleration moved from research to production at the hyperscalers, this GPU compute business exploded.
Let’s put this into some perspective with some numbers, and we will start from the relatively small Tesla compute business from a few years back. The Tesla business draws its inspiration from the “Roadrunner” hybrid Opteron-Cell cluster built by IBM for Los Alamos National Laboratory, which was commissioned in 2006, at the same time that Nvidia’s researchers were experimenting with offloading parallel segments of Fortran and C programs from CPUs to GPUs. The GPU compute business learned to walk in 2008 with the formal launch of the Tesla line and the CUDA platform, and started running in 2010 as the HPC community got on board with hybrid CPU-GPU machines akin to the petaflops-busting Roadrunner. That now ancient machine, which was decommissioned in April 2013, had 13,896 X86 cores and 101,520 vector processors. IBM had a lock on the game console business back then and a very good Power processor, but it did not create a platform. IBM killed off Cell in 2012, because by then it was pretty obvious that CUDA put the Roadrunner programming model to shame, and that was also the year when the hyperscalers discovered GPUs for massively expanding the training models for their machine learning algorithms. The Tesla GPU engines put CPUs to shame for massively parallel routines, much as IBM’s Cell chips did. If IBM were not so busy with its strategic initiatives, it would have long since bought Nvidia and Mellanox Technologies and built a real OpenPower alternative to Intel in the datacenter.
No matter. Nvidia is partnering to create that alternative to the Xeon platform anyway, and it is perfectly agnostic about the processor because the instruction set matters less than the NVLink, PCI-Express, InfiniBand, and Ethernet interconnects that stitch together the various aspects of the system.
Nvidia does not often break out financial or shipment data for the Tesla compute and related GRID virtual visualization engines, but we catch snippets here and there.
Back in 2008, when the Tesla line was launched, the company shipped 100 million GPUs that were capable of running the Compute Unified Device Architecture that had launched in 2006; it had over 150,000 downloads of CUDA by the spring of 2008 and it sold 6,000 units of its Tesla GPU accelerators. Fast forward to the spring of 2015 and a total of 576 million CUDA-capable GPUs had been sold and of these, some 450,000 were Tesla branded accelerators that were being used by HPC centers, hyperscalers, clouds, and selected enterprises; CUDA had over 3 million downloads by the GPU Technology Conference in the spring of 2015, and the rate was running at something like 1 million in the prior 18 months. At this year’s event, Nvidia cofounder and CEO Jen-Hsun Huang did not brag about cumulative shipments of Tesla accelerators, but did say that people downloaded over 1 million instances of CUDA in 2016. Over 20,000 people attended one of several global GTC events in 2016, and Huang said Nvidia estimated that there were over 511,000 GPU developers in the world today, more than 11X the number back in 2012.
We will venture a guess about what the Tesla installed base looks like in a second. Let’s talk about money first. Back in the first quarter of fiscal 2015, there was not really a GRID business to speak of and the Tesla compute business generated $57 million, or about 5 percent of revenues. By the fourth quarter of fiscal 2016 (which ends in January for Nvidia), the Tesla business grew modestly to $60 million in that quarter, and the GRID business was pushing $37 million. In the fourth quarter of fiscal 2017 ended in January of this year, we estimate that the Tesla business generated $222 million in sales (about 75 percent of its datacenter revenues, including sales of its DGX-1 systems), and that means Tesla alone had grown to be over 10 percent of the company’s revenues in that period.
In the first quarter of fiscal 2018, which ended in April just ahead of GTC 2017, the datacenter business continued to grow and shows no signs of slowing down. The datacenter business, in fact, grew by 186 percent – that is nearly triple – to $409 million in the period, and in a conference call with Wall Street analysts Colette Kress, the company’s chief financial officer, said that the core HPC sector of the Datacenter business – meaning simulation and modeling in its many forms – had doubled in the period. We assume that the GRID business is doubling, as it did in the prior sequential quarter, and that implies that the Tesla business actually grew by 3.2X year-on-year in the first quarter of fiscal 2018.
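That implied 3.2X Tesla growth rate can be sanity-checked with a bit of arithmetic. The GRID split a year ago is not a disclosed figure, so the $40 million slice below is our assumption; the datacenter totals and the GRID-doubling premise come from the numbers above.

```python
# Back-of-the-envelope check on the implied Tesla growth rate.
# The year-ago GRID figure is an assumption, not a disclosed number.

dc_q1_fy18 = 409.0               # datacenter revenue, $M, Q1 fiscal 2018
dc_q1_fy17 = dc_q1_fy18 / 2.86   # implied by 186 percent year-on-year growth

grid_q1_fy17 = 40.0              # assumed GRID slice a year ago, $M
grid_q1_fy18 = 2 * grid_q1_fy17  # assume GRID doubled, as in the prior quarter

tesla_q1_fy17 = dc_q1_fy17 - grid_q1_fy17
tesla_q1_fy18 = dc_q1_fy18 - grid_q1_fy18

print(round(tesla_q1_fy18 / tesla_q1_fy17, 1))  # roughly 3.2X
```

Nudge the assumed GRID slice up or down by a few million dollars and the Tesla multiple wobbles between about 3X and 3.3X, so the conclusion does not hinge on the exact split.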
That, friends, is an explosion.
Nvidia’s overall revenues rose by 48 percent, to $1.94 billion, in the quarter, and net income rose by 159 percent to $507 million. If you take the Datacenter group out of the mix, revenues would only have grown by 31 percent and we think Nvidia would be a far less profitable company. Yes, gaming and e-sports are booming, and so are the autonomous car revolution and the virtual reality matrix that we at The Next Platform are allergic to, and thus Nvidia would be a very attractive company for Wall Street even without the Tesla and GRID businesses. But we think HPC and now machine learning have pushed Nvidia’s engineers harder than competing with AMD for GPU sales ever did, and there are trickle down effects on the rest of the business. Besides, HPC centers and hyperscalers are helping to pay for all of that research and development that Nvidia might not otherwise do.
We have two questions as we ponder everything we heard at GTC 2017 last week. The first is: Can this datacenter business at Nvidia keep growing like this? And second: Just how many CPUs has Nvidia taken out of global datacenters so far, and how many will it prevent from being sold in the coming years?
Growth is easier to reckon, mainly because it does not go on forever; competitors enter the field and markets have their own natural sizes. Nvidia made HPC offload and then machine learning offload popular and profitable, and Intel launched its HPC alternative with Xeon Phi and AMD is getting back into the field with Naples Opterons married to Radeon Instinct GPU accelerators that can speak CUDA through its ROCm platform. Deep learning chip makers are coming out of the woodwork, and as Huang pointed out in his GTC 2017 keynote, we are witnessing the big bang of modern artificial intelligence: over $5 billion was raised by AI startups in 2016, a 9X increase since 2012.
Back in May 2015, Nvidia’s top brass talked about the total addressable markets that it was chasing beyond PC graphics, still its biggest product line and still growing like crazy. This bears repeating as we plot out possible growth curves for Nvidia. The company reckoned back then, two years ago, that there was a $5 billion opportunity for GPU sales within the overall $100 billion gaming industry; GPUs comprise about a third of a console’s cost, but the industry is mostly software, not hardware. There is another $5 billion in processing and GPUs for the auto industry, a subset of the $35 billion overall computing bill for the car makers. Enterprise visualization, which includes Quadro graphics cards and GRID remote visualization motors, had about $1.5 billion in potential sales to professional designers and another $5 billion for centralized datacenter services. Then the core compute business, embodied by Tesla (and of course other GPU cards) and driven by HPC centers and hyperscalers but quickly moving into the enterprise, was a $5 billion opportunity. Add it all up, and Nvidia said it was chasing markets worth $21.5 billion. In the trailing twelve months, Nvidia booked $7.54 billion in sales, so it already has captured about a third of its addressable markets, which have probably expanded a bit in two years, particularly as machine learning goes mainstream and as the new Volta GPUs do an excellent job on machine learning inference, not just training. So let’s be generous and say that the market is actually approaching $30 billion and Nvidia has captured about a quarter of its addressable market here in 2017.
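The market math above can be tallied in a few lines; the category labels are our shorthand for Nvidia’s 2015 buckets, and the dollar figures are the ones stated in the text.

```python
# Nvidia's 2015 view of its addressable markets, in billions of dollars.
tam = {
    "gaming GPUs": 5.0,
    "automotive": 5.0,
    "pro visualization": 1.5,
    "datacenter visualization": 5.0,
    "Tesla compute": 5.0,
}

total = sum(tam.values())
print(total)                                 # 21.5

trailing_revenue = 7.54                      # trailing twelve months, $B
print(round(trailing_revenue / total, 2))    # about a third captured
print(round(trailing_revenue / 30.0, 2))     # about a quarter if TAM is $30B
```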
The point is, there is room to grow, and Nvidia can double its share and double its revenues and still not have the majority of the markets even if it is dominant. Indeed, with all of the competition, it seems unlikely that Nvidia can capture more than 50 percent share. That would still make Nvidia a $15 billion business. And based on our rudimentary models, that is precisely what we think could happen as Nvidia closes out its fiscal 2020 year in January 2020.
Here is how we draw the lines projecting out, based on the assumption that no company can triple or quadruple sales forever and that customer adoption curves in new markets have a natural shape to them:
Nvidia is growing on every front except for OEM and intellectual property sales, and we think this will continue. And we also think that sometime in fiscal 2020, there is a good chance that the Datacenter group business will be on par with the company’s Gaming group sales – an astounding feat that will have taken more than a decade and many billions of dollars of investment to make happen.
We are aware of the limits of extrapolation, but we don’t think the Datacenter or Gaming groups will, over the long haul, decline even if they do eventually slow down their growth.
Now, the second issue. How many CPUs did the GPU compute engines kill? This is a fun little thought experiment, and to do this, you have to ignore for the moment that certain kinds of computing are not thermally or economically possible with CPUs any more. You just can’t build a system with 100,000 CPUs at a price that makes sense unless you are making many tens of billions of dollars a year selling ads as Google, Facebook, Alibaba, Tencent, and Baidu do.
Just how many GPUs are we talking about that are doing datacenter compute? Back in May of 2015, the base stood at 450,000, and it took seven years to get that many GPUs into the field. In many markets that are fast growing – like X86 server sales into the datacenter over the past 25 years – the time it takes to sell a certain number keeps halving until it reaches a steady state. So let’s say it takes half as long to sell the next 500,000 GPUs, and call it three years to make the numbers work out. So there will be something like 1 million Teslas sold by this time next year, and it is hard to say how much churn there is, but maybe because of compute density, economics, and other issues only half of them are in the field.
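That halving sell-through curve is easy to model as a toy. The tranche sizes and durations are the ones we just walked through; the 50 percent churn factor is our guess, as stated above.

```python
# Toy model of the halving adoption curve: the first 450,000 Teslas
# took seven years to sell, and we assume the next comparable tranche
# takes roughly half as long (call it three years).

years = 0.0
cumulative_sold = 0
for tranche, duration in [(450_000, 7.0), (500_000, 3.0)]:
    years += duration
    cumulative_sold += tranche
    print(years, cumulative_sold)

# With churn from density upgrades and economics, assume only half
# of the units ever sold are still in the field.
in_field = cumulative_sold // 2
print(in_field)   # roughly 475,000 installed
```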
So what is the gap in performance like? Huang thinks it looks like this:
With the Maxwell and Pascal generations of the Tesla accelerators, it was not uncommon to get a 5X to 10X speedup on HPC simulation or machine learning algorithms versus the two-socket Xeon server of the time. The GPUs need CPUs to host them, so they don’t completely eliminate the need for CPUs. But it is not uncommon for 90 percent of the peak number crunching to be done by the GPUs. So let’s be conservative and say that every GPU sold takes out four Xeon CPUs, because not every application scales well on GPUs. That is still something like four million CPUs that were prevented from entering the field, and it could be as high as twice that number. That is a big chunk of Intel’s business over the past ten years. And with the performance gap widening between CPUs and GPUs for certain kinds of work, it is no wonder that Intel has spent a fortune on buying FPGA maker Altera, invested heavily in its “Knights” Xeon Phi family of massively multicore processors, and acquired neural network chip upstart Nervana Systems. As time goes by, and the gap between CPU and GPU performance grows, the CPU takeout effect will be even more dramatic.
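The takeout arithmetic above reduces to a pair of multiplications; the four-CPUs-per-GPU displacement ratio is the deliberately conservative assumption from the text, well below the 5X to 10X speedups observed.

```python
# Conservative CPU takeout estimate: if each of the roughly one million
# Teslas sold displaces four Xeon CPUs, four million CPUs never get bought.

gpus_sold = 1_000_000
cpus_per_gpu = 4          # conservative; observed speedups run 5X to 10X

low_estimate = gpus_sold * cpus_per_gpu
high_estimate = low_estimate * 2      # if the displacement ratio is double

print(low_estimate)       # 4,000,000 CPUs taken out
print(high_estimate)      # 8,000,000 at the high end
```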