If you are trying to figure out what impact the new “Pascal” family of GPUs is going to have on the business at Nvidia, just take a gander at the recent financial results for the datacenter division of the company. If Nvidia had not spent the better part of a decade building its Tesla compute business, it would be a little smaller and quite a bit less profitable.
In the company’s first quarter of fiscal 2017, which ended on May 1, Nvidia posted sales of $1.31 billion, up 13 percent from the year ago period, and net income hit $196 million, up 46 percent over the same term. These are the kinds of growth numbers that all IT vendors like to show to Wall Street, especially with profit growth significantly outpacing revenue growth.
The datacenter portion of Nvidia, which the company only started breaking out separately last year and for which it has provided two years of historical results now that it has become material, is growing much faster than the overall business. In the most recent quarter, this growth is without a doubt due to the uptake of “Pascal” Tesla P100 accelerators by the hyperscalers that have been able to get their hands on them ahead of the high performance computing market, which won’t see them until late this year in a few systems and early next year when they become generally available. With the GeForce GTX 1080 and 1070 cards, also based on a variant of the Pascal GPU, now coming to market, gaming is about to get a Pascal boost, and professional visualization will presumably not be far behind with the Quadro line of GPU cards.
The Nvidia datacenter business had a bit of a lull in Nvidia’s second and third quarters of fiscal 2016, which correspond roughly to the point in the “Maxwell” GPU product cycle last year when Tesla customers had perhaps been expecting a Maxwell kicker to the Tesla K40 and K80 accelerators with double precision math and didn’t get one. The Tesla M40 and M4 accelerators, which launched last November, had plenty of single precision oomph but no double precision to speak of, and are also no doubt a part of the rising revenues for Nvidia’s datacenter products. Add these Tesla accelerator sales to revenues derived from GRID virtual visualization platforms, and Nvidia’s datacenter sales in its fiscal Q1 rose by 63 percent to $143 million, a jump that we think was driven substantially by Tesla P100 accelerator sales to the biggest hyperscalers in the world, who had first dibs.
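To put the growth rates in this and the earlier paragraph into dollars, the year-ago figures can be backed out from the current values and their percentage gains. The sketch below does just that arithmetic using only the rounded numbers reported above, so the results are approximations rather than Nvidia’s actual prior-year figures; the helper name `prior_period` is ours.

```python
def prior_period(current: float, growth_pct: float) -> float:
    """Back out a year-ago value from a current value and its year-over-year growth."""
    return current / (1 + growth_pct / 100)

# Q1 fiscal 2017 figures in millions of dollars, paired with year-over-year growth
# in percent. The inputs are already rounded, so the outputs are approximate.
q1_fy2017 = {
    "total revenue": (1310.0, 13.0),
    "net income": (196.0, 46.0),
    "datacenter revenue": (143.0, 63.0),
}

implied_year_ago = {
    name: prior_period(now, pct) for name, (now, pct) in q1_fy2017.items()
}

for name, (now, pct) in q1_fy2017.items():
    print(f"{name}: ${now:,.0f}M now, roughly ${implied_year_ago[name]:,.0f}M a year earlier")

# Datacenter share of total revenue in the quarter.
dc_share = q1_fy2017["datacenter revenue"][0] / q1_fy2017["total revenue"][0]
print(f"datacenter share of revenue: {dc_share:.1%}")
```

On these inputs, datacenter revenue works out to roughly $88 million a year earlier and about 11 percent of total revenue in the quarter.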
The GPU maker has clearly put a lot of wood behind the deep learning arrow, and thinks it can hold its ground against alternatives such as Intel’s impending “Knights Landing” parallel X86 processor and hybrid Xeon-Altera FPGA compute complexes. Nvidia co-founder and CEO Jen-Hsun Huang explained the situation as Nvidia sees it:
“In terms of how big that is going to be, my sense is that almost no transaction with the Internet will be without deep learning or some machine learning inference in the future,” Huang explained on the conference call with Wall Street analysts. “I just can’t imagine that. There’s no recommendation of a movie, no recommendation of a purchase, no search, no image search, no text that won’t somehow have passed through some smart chatbot or some machine learning algorithm so that they could make the transaction more useful to you. People are now starting to understand this deep learning, that it really puts machine learning and artificial intelligence in the hands of engineers. It’s understandable, and that’s one of the reasons why it’s growing so fast. And so I don’t know exactly how big it’s going to be, but here’s my proposition. This is going to be the next big computing model. The way that people compute is that in the past, software programmers wrote programs, compiled it, and in the future, we’re going to have algorithms write the software for us. And so that’s a very different way of computing, and I think it’s a very big deal.”
We do, too.
Huang said in the call with Wall Street that he expected all of Nvidia’s product lines to grow in the next quarter and beyond, but he did not elaborate on sales for the datacenter unit specifically. The hyperscale business is roughly the same size as the HPC business at this point, but both will be choppy since they are driven by very large deals among a relative handful of customers. As GPU-accelerated deep learning and HPC go more mainstream, the revenue stream will likely smooth out somewhat, though it will still be bumpy.
What seems clear is that deep learning at the hyperscalers has emerged as another piston in the datacenter business, which until a few years ago was anchored by the supercomputing sector and the massive parallel simulation and modeling workloads that users have ported from CPUs to hybrid CPU-GPU systems to see as much as a 10X speedup in their applications and even greater overall energy efficiency. With the GRID virtual GPU platforms and the high-end Quadro graphics tools, Nvidia has a sizeable enterprise business now. How big is hard to say, but what can be said is that Nvidia’s engineering is driven by the high performance needs of gamers and leveraged by these adjacent visualization, HPC, and deep learning workloads.
The good thing about the Nvidia model is that it can take technologies created for HPC centers and hyperscalers, which it can deliver today and extract higher profits from today, and either scale them down for use in other end user products that have volume even if their margins are lower, or adapt them for adjacent enterprise markets that also have reasonably high margins (but still lower than for the Teslas).
We think that more than a few HPC and hyperscale shops will be taking a hard look at the new Pascal-based GTX 1080 and 1070 graphics cards, which are much less expensive than a Tesla P100 card and deliver a heck of a single precision wallop. The high-end GP100 Pascal GPU has a total of 60 SM compute blocks on it, with 56 of them active on the Tesla P100 for a total of 3,584 single-precision cores that deliver 10.6 teraflops of single precision performance. The GP100 is implemented on the Tesla P100 card with 16 GB of HBM2 memory delivering 720 GB/sec of memory bandwidth, and it sports NVLink high speed interconnects; these additional technologies are among the reasons why we think that companies are not just willing, but eager to pay a premium for the P100 accelerators. Our guess is they cost somewhere on the order of $10,500 at list price, and the fact that they support 16-bit half precision math effectively doubles their throughput on machine learning training algorithms to 21.2 teraflops. But for single-precision math, you are talking about roughly $991 per teraflops. That price is high in part because the GP100 GPU has 5.3 teraflops of double-precision performance.
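The P100 figures above follow from a simple peak-throughput formula: cores times two floating point operations per clock (one fused multiply-add) times clock speed. The sketch below reproduces them under a few assumptions that are ours, not the article’s: 64 single-precision cores per GP100 SM, a 1.48 GHz boost clock for the NVLink version of the Tesla P100, and the usual GP100 ratios of half-rate double precision and double-rate half precision. The $10,500 price is only our estimate, and `peak_tflops` is a hypothetical helper.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak in teraflops: cores x FLOPs per core per clock x clock (GHz).

    The default of 2 FLOPs per clock assumes one fused multiply-add per core per cycle.
    Dividing by 1,000 converts gigaflops to teraflops.
    """
    return cores * flops_per_clock * clock_ghz / 1000.0

ACTIVE_SMS = 56          # of the 60 SMs on the GP100 die
FP32_CORES_PER_SM = 64   # assumed GP100 SM configuration
BOOST_GHZ = 1.48         # assumed Tesla P100 (NVLink) boost clock; not stated in the text

cores = ACTIVE_SMS * FP32_CORES_PER_SM
fp32 = peak_tflops(cores, BOOST_GHZ)
fp64 = fp32 / 2  # GP100 runs double precision at half the single-precision rate
fp16 = fp32 * 2  # and packed half precision at twice the single-precision rate

EST_LIST_PRICE_USD = 10_500  # our guess at list price, not an Nvidia figure
RATED_FP32_TFLOPS = 10.6     # Nvidia's rated single-precision peak
usd_per_fp32_tflops = EST_LIST_PRICE_USD / RATED_FP32_TFLOPS

print(f"{cores} cores: FP32 {fp32:.1f}, FP64 {fp64:.1f}, FP16 {fp16:.1f} teraflops")
print(f"~${usd_per_fp32_tflops:,.0f} per single-precision teraflops")
```

These are theoretical peaks; sustained throughput on real deep learning or HPC codes will be lower and depends heavily on memory bandwidth and software.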
The new GeForce GTX 1080 graphics card, which will be available on May 27, is based on the Pascal GP104 GPU chip, which has 20 SM compute blocks with 2,560 cores in total running at a base clock of 1.61 GHz with GPU Boost to 1.73 GHz, yielding 9 teraflops of single-precision performance for a suggested retail price of $599. The GP104 GPU has 7.2 billion transistors and fits into a 180 watt thermal envelope and is equipped with 8 GB of GDDR5X frame buffer memory, a spiffier version of the GDDR5 memory used in the prior generations of graphics cards. If you do the math, that works out to $66.56 per teraflops for single precision workloads, but as we have pointed out before, this card lacks many of the enterprise features that a Tesla card has, even if it can run CUDA hybrid workloads. The GeForce cards are certainly appropriate for prototyping on workstations or small clusters, and they also set the stage for a follow-on to the Tesla M4 and M40 accelerators based on the Pascal architecture and possibly using GDDR5X memory instead of the much more expensive and higher bandwidth HBM2 found on the Tesla P100 card.
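The same peak-throughput arithmetic applies to the GTX 1080, and it shows where the price gap with the Tesla P100 comes from. This sketch assumes Nvidia’s published GTX 1080 configuration of 20 SMs with 128 cores each (the 2,560 cores cited) and unrounded clocks of 1.607 GHz base and 1.733 GHz boost; the P100 comparison uses our estimated $10,500 price, which is a guess rather than a list price.

```python
GP104_SMS = 20            # SMs enabled on the GTX 1080
FP32_CORES_PER_SM = 128   # assumed GP104 SM configuration
BASE_GHZ = 1.607          # rounded to 1.61 GHz in the text
BOOST_GHZ = 1.733         # rounded to 1.73 GHz in the text
MSRP_USD = 599
RATED_FP32_TFLOPS = 9.0   # Nvidia's rounded single-precision figure

cores = GP104_SMS * FP32_CORES_PER_SM
# Two FLOPs per core per clock (one fused multiply-add); divide by 1,000 for teraflops.
peak_at_base = cores * 2 * BASE_GHZ / 1000
peak_at_boost = cores * 2 * BOOST_GHZ / 1000

usd_per_tflops = MSRP_USD / RATED_FP32_TFLOPS

# Compare against our estimated $10,500 Tesla P100 at its rated 10.6 teraflops.
p100_usd_per_tflops = 10_500 / 10.6
ratio = p100_usd_per_tflops / usd_per_tflops

print(f"{cores} cores: {peak_at_base:.2f} TF at base, {peak_at_boost:.2f} TF at boost")
print(f"GTX 1080: ${usd_per_tflops:.2f} per teraflops")
print(f"Tesla P100 costs about {ratio:.1f}x more per single-precision teraflops")
```

The boost-clock peak comes out a bit under 9 teraflops, so the marketed figure is rounded up. The per-teraflops ratio ignores everything that makes the Tesla a Tesla, including double precision, HBM2, NVLink, and enterprise features.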
The question we have is how much of the total addressable market Nvidia can capture with a much wider Tesla product line, and how much it can take from these other alternatives as they, too, are tweaked to go after HPC and hyperscale workloads. With Intel clearly and confidently behind the FPGA, and committed to making FPGAs easier to program, as Nvidia committed to doing for GPUs and has largely accomplished over the past eight years, there will be plenty of competition. And whatever FPGAs can’t do, we can bet that Intel will show that regular Xeons or parallel Xeon Phi chips can do.
What is clear from Nvidia’s numbers is that its core GPU business continues to grow and throw off enough profits to continue to invest in more advanced technologies, and we think that Nvidia will be able to cash in big-time on the investments it has made in the Pascal generation in particular, growing its datacenter business at a faster rate than we have seen in the past year. There is no doubt in our minds that there is pent-up demand for Pascal products. We also expect a staged rollout for the next generation “Volta” GV100 GPUs, which will sport more performance, probably more HBM memory, and a next generation of NVLink interconnect when it is delivered in late 2017 for the “Summit” and “Sierra” supercomputers being built for the US Department of Energy. The top HPC and hyperscalers will get these GPUs first and it will take some time for them to cascade down to other users in other markets, perhaps even a bit more slowly than the Pascals are doing now.