Rapid GPU Evolution at Chinese Web Giant Tencent
March 24, 2017 Nicole Hemsoth
Like other major hyperscale web companies, China’s Tencent, which operates a massive network of ad, social, business, and media platforms, is increasingly reliant on two trends to keep pace.
The first is not surprising—efficient, scalable cloud computing to serve internal and user demand. The second is more recent and includes a wide breadth of deep learning applications, including the company’s own internally developed Mariana platform, which powers many user-facing services.
When the company introduced its deep learning platform back in 2014 (at a time when companies like Baidu, Google, and others were expanding their GPU counts for speech and image recognition applications) they noted their main challenges were in providing adequate compute power and parallelism for fast model training. “For example,” Mariana’s creators explain, “the acoustic model of automatic speech recognition for Chinese and English in Tencent WeChat adopts a deep neural network with more than 50 million parameters, more than 15,000 senones (tied triphone model represented by one output node in a DNN output layer) and tens of billions of samples, so it would take years to train this model by a single CPU server or off-the-shelf GPU.”
In 2014, when the company first rolled out details about Mariana, they said they were using an undisclosed number of CPU-hosted servers, each packed with 4-6 GPUs. They were able to get a speedup of 4.6X by using 6 GPUs per server compared to one GPU, and put in extensive work to push multi-GPU scaling to new heights, something that is still a pressing issue in research and at established deep learning shops like Baidu, Yahoo/Flickr, and others. The 2014 paper makes reference to the Nvidia Tesla series K20 GPU (versus the K40, which was also available then) but as Tencent spokesfolks tell The Next Platform, they were also among the first to use the machine learning-focused M4 GPUs.
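To put the reported multi-GPU number in context, here is a minimal sketch of the usual scaling-efficiency arithmetic: measured speedup divided by the number of GPUs. The 4.6X and 6-GPU figures come from the 2014 Mariana details cited above; the function name and the choice of this particular metric are illustrative assumptions, not something Tencent published.

```python
def scaling_efficiency(speedup: float, num_gpus: int) -> float:
    """Return the fraction of ideal (linear) speedup actually achieved.

    Hypothetical helper for illustration; ideal scaling would give
    a speedup equal to num_gpus, i.e. an efficiency of 1.0.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return speedup / num_gpus


# Figures reported for Mariana in 2014: 4.6X on 6 GPUs versus 1 GPU.
reported_speedup = 4.6
gpus_per_server = 6

efficiency = scaling_efficiency(reported_speedup, gpus_per_server)
print(f"Scaling efficiency: {efficiency:.1%}")  # prints "Scaling efficiency: 76.7%"
```

By this measure, roughly a quarter of the ideal linear speedup is lost at 6 GPUs, which is the kind of gap the multi-GPU scaling work aims to close.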
This morning the company provided new details about expanding GPU use across the two elements that power many of their services. Tencent will be integrating the latest Pascal GPUs into its diverse workflows on both the deep learning and cloud fronts—using the former to power richer deep learning training and the latter to offer a broader set of capabilities to its cloud customers (more insight on China’s cloud market can be found here for a diversion).
China will now have a cloud outfitted with the most robust GPUs available, including the P100 and P40, which target the data movement bottlenecks via NVLink and come with CUDA-driven deep learning tools and libraries, something that Nvidia has invested in significantly over the last couple of years, with reach across nearly all the main deep learning frameworks. Since GPUs are still king of the hill when it comes to deep learning training, the addition of Tencent to Nvidia’s public reference list of large-scale deep learning customers keeps pushing the idea that GPUs and deep learning are set to go hand in hand for the foreseeable future, despite the tsunami of new chip architectures aimed at the deep learning processor market.
On the cloudy front, this is an interesting development because not even the world’s cloud giant, AWS, has moved to offer Pascal-generation GPUs via its cloud. AWS has announced availability of the top-of-the-line compute workhorse K80, a favorite among the supercomputing set, but Tencent is blazing some serious trails by being out in front of the Pascal cloud wave.
As part of the companies’ collaboration, Tencent Cloud intends to offer customers a wide range of cloud products based on NVIDIA’s AI computing platforms. This will include GPU cloud servers incorporating NVIDIA Tesla P100, P40 and M40 GPU accelerators and NVIDIA deep learning software. Tencent Cloud launched GPU servers based on NVIDIA Tesla M40 GPUs and NVIDIA deep learning software in December.
The machine learning-focused M40 and M4 GPUs will also be made available to Tencent AI cloud customers, which gives Chinese deep learning developers a lower cost way to enter the larger world market and deliver new frameworks, software, and tools that can be trained and executed in faster, more efficient ways than with CPU alone or older generation Tesla hardware.
IBM announced last year that Tencent was integrating OpenPower servers featuring GPUs and IBM’s own Power architecture, but Tencent did not disclose the server maker for its cloud build out. One can imagine that Chinese hardware makers, including Inspur and others, would be more likely choices for large-scale web builds, but the trick of scaling 8 GPUs per box with the proprietary NVLink interconnect is the domain of IBM’s OpenPower initiative.
Tencent will put its multi-GPU scaling and NVLink to the test for cloud users with servers holding 8 GPUs. As Tencent VP of cloud, Sam Xie, says, “Tencent Cloud GPU offerings with Nvidia’s deep learning platform will help companies in China rapidly integrate AI capabilities into their products and services. Our customers will gain greater computing flexibility and power, giving them a powerful competitive edge.”
Although we tend to hear quite a bit about Google, Facebook, Baidu, and others when it comes to GPU-driven deep learning at scale, Tencent is certainly a rising powerhouse in China. The company, like Baidu with its Silicon Valley AI lab, has its own artificial intelligence laboratory, which focuses on “the extensive application of AI technology and basic research, combined with product data and user behavior learning for multiple products.” Here they are studying new algorithms for machine learning at scale, with many new outreach programs to capture a new generation of Chinese AI developers.