The public cloud is precisely as conservative and innovative as the enterprise customers that make use of it. Clouds are a great way for many organizations to share a new hardware technology and see how it pans out running real applications, whether designed by third parties or created in-house. While GPU accelerators have found their place in the upper reaches of the HPC and hyperscale communities – doing modeling and simulation with the former and training deep learning algorithms at the latter – it is fair to say that GPU availability and adoption on public clouds are still in a nascent stage for a number of reasons.
That said, there are a number of public cloud vendors who offer GPU acceleration for a portion of their compute infrastructure, and the availability of this hybrid CPU-GPU capability is an important resource for customers who want to move from dabbling to production for certain kinds of workloads. And, we think that the adoption of GPUs on clouds could rise as software makers figure out their cloud pricing strategies (as must be done in the traditional supercomputing market) and as companies adopt an open source stack that doesn’t require licensing at all (as can be done with many deep learning stacks).
It is just a matter of time, and timing, and oddly enough, the ability to run virtual desktop infrastructure (VDI) from public clouds with virtual CPU and GPU slices could get enough capacity out there on the cloud to get customers chasing it. Moreover, IBM’s SoftLayer cloud continues to forge ahead with GPU-accelerated bare metal servers, which it first started delivering back in 2012, and has recently announced it is offering customers the ability to fire up Nvidia’s latest dual-GPU Tesla K80 coprocessors on its bare metal instances.
It has taken the better part of a decade for even a respectable portion of the aggregate compute in the world to be put out onto the public cloud, so we don’t expect the GPU portion of compute to be particularly large right now. Nvidia first commercialized GPU computing on its video cards back in 2008, and over the ensuing seven years has rolled out a succession of specially tuned GPU compute engines with features designed specifically for accelerating simulation and modeling workloads, in both the traditional HPC and financial services industries. The resulting HPC and Cloud business had $279 million in revenues for Nvidia’s fiscal 2015 year, which ended in January, as we reported in our detailed financial analysis of Nvidia back in May, and the business was growing at 53 percent year-on-year. Through fiscal 2015, Nvidia has shipped 576 million CUDA-capable GPUs and 450,000 Tesla GPU accelerator cards for servers.
You might presume that for workloads that only require single precision floating point math – or even FP16 mixed precision or 16-bit math as many deep learning algorithms can get by with – sales of GeForce cards like the Titan X cards aimed at deep learning are in these HPC and Cloud revenue figures. But they are not. Those HPC and Cloud revenues are for Tesla-only co-processors.
“CEOs and more importantly IT directors want datacenter, enterprise-class solutions,” Ian Buck, vice president of accelerated computing at Nvidia, tells The Next Platform. “While some researchers might deploy Titan X at small scale, once you scale up with a sizable investment, you need features like ECC, system management, maintenance and support, an accelerator that has been fully qualified for 24×7 datacenter usage, and a system provider that will stand by the solution. The Tesla K40 is the most popular GPU for deep learning in the datacenter, as it offers the largest single precision individual GPU performance which gives the quickest training time for a network.”
Our point is that GPU acceleration is a growing and profitable business for Nvidia, and in fact, it is the fastest growing and most profitable segment of Nvidia. And there is no question that Nvidia wants to accelerate this business, particularly by seeing more widespread adoption by hyperscalers and cloud builders. To be sure, Nvidia would no doubt prefer to sell boatloads of Teslas and Titans to individual customers, who would build up their own beefy servers and hefty clusters. The trouble is, an enterprise might have trouble keeping such machines busy enough to justify the investment. This is why parallel processing and data analytics is a growing workload on generic CPUs on public clouds for certain classes of work. We think customers will look to the public cloud as a platform to run intermittent scale-out work that is goosed by GPUs and FPGAs, too, by the way.
Finding GPUs In The Clouds
At the moment, Nvidia identifies six cloud providers that provide cloud-based GPU capacity or hosted GPU capacity. (The former is available on demand at hourly rates, while the latter is for longer-term hosting engagements.) Amazon Web Services was the first to offer GPUs on demand among the big public clouds, back in November 2010, when it put Tesla M2050 cards, based on the “Fermi” GPUs from Nvidia, on its CG1 compute instances, which sported “Nehalem” Xeon X5570 processors from Intel and 10 Gb/sec networking to link nodes together. Those Tesla M2050s provided 515 gigaflops of double precision floating point oomph across their 448 CUDA cores, and each node had two of them; AWS allowed customers to glue up to eight nodes into a baby cluster with 8.24 teraflops of aggregate double precision performance, and if they needed more than that, they had to call.
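The 8.24 teraflops figure follows directly from the CG1 numbers: 515 gigaflops per card, two cards per node, eight nodes. As a minimal sketch of that arithmetic (the function name is ours and purely illustrative, not anything AWS or Nvidia publishes):

```python
def cluster_peak_teraflops(gflops_per_gpu, gpus_per_node, nodes):
    """Aggregate peak throughput, in teraflops, for a cluster of identical nodes."""
    return gflops_per_gpu * gpus_per_node * nodes / 1000.0


# CG1 figures from the article: 515 gigaflops double precision per
# Tesla M2050, two cards per node, and up to eight nodes per cluster.
cg1_peak = cluster_peak_teraflops(515, 2, 8)
print(f"{cg1_peak:.2f} teraflops")  # 8.24 teraflops
```

This is peak theoretical throughput only; it says nothing about what a real application would sustain across a 10 Gb/sec network.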
Three years later, AWS launched its G2 instance types, which use Nvidia’s GRID K520 GPUs, which are useful for both compute and visualization work. These server nodes, as it turned out, had four K520 GPUs and used Intel’s “Sandy Bridge” Xeon E5 processors. In April of this year, AWS expanded the G2 instance so the whole server and all four GPUs could be deployed as a single instance. The K520s are really aimed at single precision workloads and do not support error correction on their GDDR5 memory. So they are suitable for seismic analysis, genomics, signal processing, video encoding, and deep learning algorithms, but not the heavy duty HPC simulations that model physical, chemical, and cosmological processes and generally use double precision math. In any event, the latest G2 instances have up to four GPUs across two cards in the server, each GPU with 1,536 CUDA cores and 4 GB of frame buffer memory to run applications; Nvidia does not provide floating point ratings on the GRID devices, but the cores run at 800 MHz, a little faster than on the Tesla K10 that it most resembles, and that means the four GPUs should weigh in at around 9.8 teraflops at single precision.
The other vendors that Nvidia cites as having GPUs either on demand or hosted include Nimbix, Peer 1 Hosting, Penguin Computing, Rapid Switch, and SoftLayer. Amazon offers the G2 instances in North America, Europe, and Asia and so does SoftLayer. The other players are in North America or Europe and Peer 1 does both, at least as far as the survey done by Nvidia says. We also happen to know that Rackspace Hosting offers GPUs to gaming and other customers who use its hosted servers (rather than on-demand clouds), and it is not on Nvidia’s list. Neither Google Cloud Platform nor Microsoft Azure has GPU accelerators available to cloud customers, but both companies certainly use GPUs to accelerate their own workloads. (Google is rumored to have over 8,000 GPUs in its stack, and it probably has well north of 1 million servers overall.)
AWS has lots of HPC customers – the exact number is not known – but has not talked specifically about how many there are or to what extent they are using its virtual clusters to run simulations, models, and other applications. The company did say when it announced the G2 instances a year and a half ago that the CG1 instances were still popular.
Marc Jones, the CTO at SoftLayer, spoke to The Next Platform as that cloud was rolling out support for Nvidia’s latest Tesla K80 accelerators onto its bare metal cloud to give us a better sense of what is happening with GPUs on the cloud for HPC-style workloads. SoftLayer has been providing GPU-accelerated systems since April 2012, when it launched Xeon server nodes with one or two Tesla M2090 cards from the Fermi generation as an on-demand service. The company has rolled out different generations of Tesla co-processors since that time, and currently offers Tesla K10 cards (aimed at single precision) and GRID K2 cards. Neither of these cards supports advanced features in the Tesla line, such as dynamic parallelism and Hyper-Q, which significantly boost the performance of certain workloads, and neither has as high double precision floating point performance as the Tesla K20, K40, and K80 units.
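Since Nvidia does not publish a rating for the GRID parts, the roughly 9.8 teraflops estimate has to be derived from core count and clock speed. The sketch below shows one way to reproduce it; it assumes each CUDA core retires one fused multiply-add (counted as two floating point operations) per clock, which is our assumption about how the peak is reckoned rather than a figure from Nvidia or AWS, and the function name is hypothetical.

```python
def peak_sp_teraflops(cuda_cores, clock_mhz, gpus, flops_per_core_per_cycle=2):
    """Theoretical peak single precision teraflops.

    Assumption: each CUDA core completes one fused multiply-add
    (two floating point operations) per clock cycle.
    """
    flops = cuda_cores * clock_mhz * 1e6 * flops_per_core_per_cycle * gpus
    return flops / 1e12


# G2 figures from the article: 1,536 CUDA cores per GPU at 800 MHz, four GPUs.
g2_peak = peak_sp_teraflops(1536, 800, 4)
print(f"{g2_peak:.2f} teraflops")  # 9.83 teraflops
```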
The Tesla K80 card that SoftLayer is now supporting with its bare metal service has two of the Kepler GK210 GPUs on it, each with 12 GB of GDDR5 memory and each with 2,496 cores. Across both GPUs on the K80 card, Nvidia delivers 1.87 teraflops double precision and 5.6 teraflops single precision, but with GPU Boost overclocking it can go as high as 2.91 teraflops double precision and 8.74 teraflops single precision.
The bare metal servers supporting Tesla K80s will initially be available in SoftLayer’s Dallas, Texas datacenter and will eventually be available throughout the 27 datacenters that the IBM cloud unit will be operating. Jones can’t be specific about how many servers SoftLayer has as part of its infrastructure any more, but just as IBM was acquiring it the total number was 120,000 servers (all made by Supermicro) and the expectation was to have doubled that by the end of 2014. This did not happen, but Jones did say that by the end of this year SoftLayer will be running 46 datacenters (including those being moved over from IBM’s hosting operations) and will nearly double its datacenter floor space. Our back of the envelope math says that SoftLayer would have close to 200,000 servers by the end of the year and lots more space to expand. We also suspect that only a small percentage of them will have GPU accelerators.
While most cloud providers talk about having infrastructure on demand, for add-ons like GPUs, or for anything exotic such as firing up more than a few hundred nodes at the same time, you actually have to call first. SoftLayer has some nodes with Tesla K80s installed, and will roll them out worldwide, but it really wants to engage with customers to figure out the demand and build this capacity to order.
To date, SoftLayer has had hundreds of customers running applications on its GPU-accelerated systems, says Jones, and it already has dozens using the Tesla K80s even as they are just being announced this week. One is the machine learning program at New York University and another is MapD, which is in the process of launching a GPU-accelerated database.
For most cloudy GPU customers, a typical virtual cluster is somewhere between three and ten nodes, with one or two GPU cards per machine. This size of cluster has been pretty consistent over the three years that SoftLayer has been peddling virtual ceepie-geepie iron. Given SoftLayer’s bare metal cloud and its popularity in the oil and gas industry, it is no surprise that oil and gas companies were the early adopters of running GPU workloads on demand. The gaming industry and video rendering and transcoding applications are also steady users of the hybrid systems on the SoftLayer cloud. In the Dallas datacenter, the GPU nodes can be linked with InfiniBand networking and have shiny new “Haswell” Xeon E5 processors from Intel, which have a certain amount of their own integer and floating point kick. The nodes can run Windows or Linux (just like those offered by AWS) and they come with up to 512 GB of memory and flash SSDs in 800 GB, 960 GB, and 1.2 TB capacities.
“You can really trick out one of these servers,” says Jones with a certain amount of pride.
The servers that the Tesla K80 GPUs can plug into can be equipped with a pair of six-core Xeon E5-2620, ten-core E5-2650, or twelve-core E5-2690 processors. Each Tesla K80 card costs $500 per month to rent, and depending on the model, a configured system ranges in price from $1,359 to $1,529 per month with a single K80 card. Loaded up with 256 GB of memory, the top-end processors, and two K80s, you are looking at $2,409 per month. The Tesla K80 card costs $5,000 list price, so SoftLayer can get its bait back at list price in ten months on the card alone, and we are pretty sure it can get a volume discount from Nvidia, so the return on investment is better than that. And, as AWS clearly demonstrates, servers and GPUs can stay in the field for many years and still make money for a public cloud provider.
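The ten-month payback is simply list price divided by monthly rent. A minimal sketch of that calculation follows; the 20 percent volume discount is a number we made up purely to illustrate the effect, not one SoftLayer or Nvidia has disclosed, and the math ignores power, cooling, hosting costs, and idle time.

```python
def payback_months(card_price, monthly_rent):
    """Months of rental income needed to recover the purchase price of a card."""
    return card_price / monthly_rent


K80_LIST_PRICE = 5000    # dollars, from the article
K80_MONTHLY_RENT = 500   # dollars per month, from the article

list_payback = payback_months(K80_LIST_PRICE, K80_MONTHLY_RENT)

# HYPOTHETICAL: a 20 percent volume discount, chosen only for illustration.
assumed_discount = 0.20
discounted_payback = payback_months(
    K80_LIST_PRICE * (1 - assumed_discount), K80_MONTHLY_RENT
)

print(f"At list price: {list_payback:.1f} months")
print(f"With assumed discount: {discounted_payback:.1f} months")
```

Any real discount shortens the payback period proportionally, which is the point being made about the return on investment.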
Often, this still comes down to experimentation. “In a lot of cases, customers want to try GPUs for HPC, but they don’t want to make the initial capital investment,” says Jones, echoing the main reason the public cloud is popular in general. “Sometimes they will try it virtualized with GPUs, and then they try it with bare metal, which gives them better and more consistent performance. In many cases, such as in the oil and gas industry, it may not be a 24x7x365 workload, but they may have a job running for six or eight months and they want to max that out. That said, we do have customers with a certain degree of steady state who keep their GPU workloads going. These are usually data analytics workloads, not simulations.”
There is not, by the way, a lot of bursting of workloads from HPC centers to the SoftLayer cloud, something that people talked about doing for a lot of years in the HPC community.
One of the tough things for any public cloud provider to try to figure out as they look to add accelerators to their compute farms is precisely what co-processors to add. Nvidia has several flavors of GPU cards, and then there are also FPGAs from Altera and Xilinx to consider, too. They have to make their bets and do the best they can. AWS is leveraging the infrastructure it needs for a VDI service as GPU accelerated compute, but some of the top features of the Tesla family are not in these cards. SoftLayer is going full-bore with the Tesla K80s, but it should probably also consider using the Nvidia GeForce Titan X card, which is based on the “Maxwell” GPU, which (at $1,000) costs one-fifth as much as a Tesla K80, and which offers 7 teraflops of single precision floating point performance. For certain workloads, this is the right card for the job, even though it does not have a Tesla brand on it. But that means splitting the base of GPUs in two.
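To put the Titan X argument in numbers, here is a rough sketch of dollars per single precision teraflop, using only the prices and peak ratings quoted in this article. It is a crude yardstick: it assumes list pricing and peak throughput, and it deliberately ignores ECC, double precision performance, memory capacity, and support, which are exactly the things the Tesla premium pays for. The function and dictionary names are ours.

```python
def dollars_per_teraflop(price, teraflops):
    """Purchase price divided by peak single precision teraflops."""
    return price / teraflops


# (price in dollars, peak single precision teraflops), as quoted in the article
cards = {
    "GeForce Titan X": (1000, 7.0),
    "Tesla K80 (base clocks)": (5000, 5.6),
    "Tesla K80 (overclocked)": (5000, 8.74),
}

cost = {name: dollars_per_teraflop(p, tf) for name, (p, tf) in cards.items()}
for name, value in cost.items():
    print(f"{name}: ${value:,.0f} per teraflop")
```

On these figures the Titan X comes in at roughly $143 per teraflop against roughly $572 to $893 for the K80, depending on which K80 rating is used.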
Decisions, decisions. Some things are just tougher for clouds than for enterprises, which control their own workloads.