For a long time, datacenter compute has been the very picture of stability – Intel-based servers running enterprise workloads in central facilities. But the workloads are changing fast and the datacenter is dissolving, and this is having a ripple effect throughout the infrastructure, from the servers and storage appliances down to the components, most notably the silicon that powers the systems.
Once primarily the domain of Intel and its x86-based chips, the datacenter is seeing new architectures come to the fore, from the GPUs that Nvidia and AMD have pushed over the past decade to Arm, which is now making its long-talked-about inroads into the datacenter with its low-power architecture. As we at The Next Platform have talked about over the past couple of weeks, Nvidia is now arming itself with data processing units (DPUs) that are aimed at AI and analytics workloads.
Make no mistake, though. Intel is still at the top of the ladder and it’s not standing still. The company in 2015 saw the need to be able to address emerging workloads and spent $16.7 billion to buy Altera and its field-programmable gate array (FPGA) technology and, more recently, is expanding its GPU capabilities. Over the past several months, Intel has begun to talk more about its Xe discrete and integrated GPU strategy.
But other chip makers are on the move as well in a market that continues to consolidate. Nvidia is making a $40 billion bid to buy Arm from giant Japanese conglomerate SoftBank Group, bringing its CPU architecture into the fold (Arm’s designs already are used with Nvidia’s GPUs) to give it control of more of the processing functions in datacenters – whether on premises or in the cloud – and in everything from Internet of Things (IoT) sensors to supercomputers. The deal also will bolster its efforts around AI and machine learning, which are foundational to its growth strategy and make it a stronger competitor to Intel.
Now, the same week that AMD announced the third generation of the Zen microarchitecture that has helped fuel the company’s resurgence in the datacenter with its Epyc server chips, the company – as reported by The Wall Street Journal – is in advanced talks to acquire FPGA maker Xilinx, adding to a portfolio that already includes x86 CPUs and Radeon GPUs.
Against this backdrop, representatives from Arm, Nvidia, Oracle Cloud, VMware and web infrastructure and security company Cloudflare got together over Zoom this week during Arm’s virtual DevSummit 2020 to talk about the future of the datacenter in light of these changes: the focus on workloads and data, the reach into the cloud and out to the edge, and the continued introduction of silicon beyond the CPU. They looked at the situation from different perspectives and with different demands on their technologies, but they essentially agreed on three points: that the challenges call for a broad array of technologies whose use will be dictated by the workloads, that enterprises are still looking for performance and efficiency, and that the datacenter, cloud and edge environments need common tools that enable organizations to address these domains in a holistic way, particularly as more workloads find their way into the cloud.
“We’re really interested in, how do we enable more of the sort of public cloud capabilities in the datacenter and a lot of that dynamism, that agility,” said Kit Colbert, vice president and CTO of VMware’s Cloud Platform business unit. “Essentially, how do we enable like an API in the hardware? It’s a really, really interesting question to allow more composability in terms of clusters and built to deliver the right hardware resources where they’re needed. One of the things that we see is that the central-service CPU is really being disaggregated and that there’s a lot of accelerators coming into place – GPUs, FPGAs, smartNICs, etc. – to help drive performance of modern workloads. As all those accelerators come in, it becomes much more difficult to manage them, to ensure that the right apps are on the right servers. So how do you get more of that agility? How do you get more of that composability? That’s a really, really key problem that we’ve got to think through.”
Oracle, which is all in on the cloud, is taking an approach that looks at the cloud from the position of a traditional enterprise software maker.
“Our customer base is traditional enterprise,” said Karan Batta, vice president of product for Oracle Cloud. “Most of their critical systems or products are still running on-prem. They want the best of both breeds. They want all the specialization and the performance characteristics that they get from on-prem, but they also want none of the downsides. They want all the benefits of cloud at the same time.”
For several in the group, much of the focus is on how the datacenter can be rearchitected to support modern workloads in a way that ensures performance, power efficiency, scalability and security. The massive public cloud providers, which are now the driving force behind datacenter infrastructure development and spending, will play a key role in this. Kushagra Vaid, vice president and distinguished engineer for Azure infrastructure at Microsoft, noted that Azure and others, such as Amazon Web Services (AWS) and Google Cloud, will spend $6 billion or more every year on infrastructure, and with so much money going into these facilities, a premium will be placed on system utilization and cost efficiencies. It’s going to call for a rethinking of the datacenter, including the chips that are powering the systems, Vaid said.
Chris Bergey, senior vice president and general manager of Arm’s infrastructure line of business, said hyperscale datacenters are built differently than traditional facilities, from fast network fabrics and huge numbers of racks of servers and storage to highly dense hard drives and emerging technologies like smartNICs. Datacenters are an evolving environment, but “you can make the argument that the last thing that kind of has not been customized is really the CPU complex. It’s the one thing that’s been preserved in the datacenter thus far as general purpose. That’s really the driving factor of, yeah, PPA [power, performance and area] is great and, yeah, Arm power performance is compelling, but it’s what you can do,” Bergey said, noting that the Arm architecture can help enterprises customize and optimize the CPU in their environments.
“The days of general-purpose is really a struggle, especially with the Moore’s Law slowing down,” he said. “Customization or optimization is really the theme in the biggest clouds. You’re going to see that relative to shrinkage. This idea of chiplets and how you almost make the motherboard come down to a package size. We’ve done a lot of research in 3D ICs and there’s a lot of things going on. You’re going to see this shrinkage. That’s key.”
Enterprises are continuing to migrate applications and data to the cloud, but many estimates have 70% of workloads still sitting on premises, whether it’s because of security concerns, the cost of migration or compliance issues.
“The big boy workloads are still on prem and the big issue is a lot of these applications in the ecosystem, they’ve been built 10, 20, 30 years ago,” Oracle’s Batta said. “The people that actually built them aren’t around, so it’s not a matter of lift-and-shift. It’s more like you have to move and improve, which is you move the things that you have and then as it makes sense, you may modernize based on whether you want to do containerization to make that portability happen. Based on the customers that we speak to in today’s world, for us, it’s been very, very hard to find a segment of customers that have that kind of portability across different sets of environments. … We want to give customers a diversity. A lot of our customers are doing some interesting use cases that they just didn’t think were possible in a cloud before and now it gives them an ability to at least experiment. But we’re still a ways away from on prem being just one of the things that you deploy because there’s so much differences between how you deploy applications and their characteristics. Moving architectures could move your physics results, as an example, for particular customers, so we have to be really careful about that.”
For Arm’s Bergey, much of it comes down to “the efficiency of compute cycles, the ability to really optimize and make sure our compute cycles are being fully utilized. That’s where a lot of the gains have come in and how cloud computing’s been able to keep up with the demand without having skyrocketing power. Now they’ve been able to continue to push those envelopes going forward. A lot of the low-hanging fruit through virtualization and all that kind of stuff has already been harvested, so we’ve got a big, big road ahead.”