Oracle Makes Its Prognostications For HPC In The Cloud

For Karan Batta, the much-talked-about wide adoption of cloud computing in the HPC space is really like a game of dominos. You just need that one company – or maybe a few – to broadly embrace the cloud and migrate many of their mission-critical workloads to a public cloud environment, and other organizations that have been on the sidelines toying with the idea or even trying out a few applications in the cloud will follow. And he believes the time is near.

“HPC has been ready for a long, long time. The problem is, it’s like nobody wanted to play,” Batta, vice president of product for Oracle Cloud, tells The Next Platform. “It was, who will blink first? You need just the first, second, third domino to fall for the entire industry to move forward. What I can tell you is we are having conversations with dozens of large factory customers. We see the engagement happening a lot more in manufacturing than we are in other [areas]. The reason for that is it requires a very specific focus. In HPC, every industry has their ecosystem of partners and applications. As we’ve seen, in financial services it may be different HPC than you see in manufacturing. Our focus has been opening up the manufacturing sector and oil-and-gas sector. Manufacturing has been our core focus to prove out the model that at least what we’re doing is working. From a manufacturing standpoint, we’re very, very strong.”

HPC and the cloud is an area that Batta has been pursuing for years. Before coming to Oracle in 2017 to help grow Oracle Cloud, he spent more than three years at Microsoft, working to build out its Azure public cloud services business, which over the years has grown to be number two in the market, behind the dominant Amazon Web Services (AWS). Before that, he spent four years with startup GreenButton Limited, which was bought by Microsoft in 2014.

As The Next Platform readers know, the promise of organizations running their HPC workloads in the cloud has been discussed for the past several years, and public cloud giants like AWS, Azure and Google have been pushing, with varying levels of success, to be the provider of choice. Microsoft for several years has run Cray XC supercomputers and CS Storm clusters in its Azure public cloud. AWS says a broad array of HPC workloads – from traditional applications like computer-aided engineering, weather prediction, genomics and computational chemistry to emerging workloads like machine learning and autonomous driving – are running on its infrastructure, and Google Cloud also promotes itself to HPC organizations. Among them, these providers count organizations such as BP, HSBC, Northeastern University and Bristol-Myers Squibb as customers.

Market forecasts look promising. Hyperion Research late last year predicted that cloud spending for HPC would grow at an average of 24.6 percent a year over the next five years, hitting $4 billion this year and more than $7 billion by 2023. A few years ago, the growth rate sat at about 10 percent. A survey conducted for ANSYS predicted that cloud usage for engineering simulation would more than double over the next year, with agility and cost driving the increase in spending.
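As a rough consistency check on the Hyperion Research numbers, the sketch below compounds the $4 billion figure forward at 24.6 percent a year. It assumes that "this year" means 2020 and that growth compounds annually at a constant rate; Hyperion's own model may be built differently.

```python
# Sketch: compound the Hyperion Research forecast quoted above.
# Assumptions: "this year" is 2020, and spending grows at a constant
# 24.6 percent compound annual rate (Hyperion's actual model may differ).
BASE_YEAR = 2020
BASE_SPEND_BILLIONS = 4.0  # "hitting $4 billion this year"
ANNUAL_GROWTH = 0.246      # "24.6 percent a year"


def projected_spend(year: int) -> float:
    """Return projected cloud HPC spending, in billions of dollars, for a year."""
    return BASE_SPEND_BILLIONS * (1 + ANNUAL_GROWTH) ** (year - BASE_YEAR)


for year in range(BASE_YEAR, 2024):
    print(f"{year}: ${projected_spend(year):.2f} billion")
```

Under those assumptions, 2023 lands at roughly $7.74 billion, in line with the "more than $7 billion" forecast.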

Hurdles for HPC on Cloud

The challenge is that many companies running HPC workloads are looking for a cloud environment that doesn’t force a tradeoff between what they get in their own datacenters and what they can get from the cloud, Batta says. At Oracle, he and his team are looking to offer the technologies and services that deliver just that.

“The reason why we’re doing this is because a lot of our customers that are enterprise customers are still on prem,” Batta says. “They’re sort of dabbling in the cloud. They’re building some chatbots or doing something interesting, but really, their systems and their fundamental products are still on prem. The reason for that is because they want the benefits of on prem. They want the performance or the specialization that gives them, but they also want the benefits of the cloud. They want flexibility, they want pay-per-use, they want all these other things. They want the benefits of both but they want downsides of none, essentially.”

Oracle Cloud Roadmap

That was the theme that ran through the announcements Batta and other Oracle Cloud officials made this week around the company’s product roadmap. Next year, Oracle Cloud will begin offering HPC compute instances powered by Intel’s 10-nanometer “Ice Lake” Xeon processors, which Oracle says will boost the performance of workloads such as crash simulations, computational fluid dynamics (CFD) and electronic design automation (EDA) by more than 30 percent over the existing X7 HPC instances. Organizations will be able to run these instances on bare metal, use NVMe storage, build clusters via Oracle’s RDMA-enabled cluster network and get a balanced core-to-memory ratio.

Oracle, which earlier this year announced it was working with Nvidia on building the next generation of GPU instances, said this week that those instances will be generally available starting Sept. 30 in the United States, Japan, EMEA and Asia-Pacific regions for $3.05 per GPU hour. Each bare-metal node houses eight A100 Tensor Core GPUs interconnected via NVLink and gets up to 1.6 Tb/sec of network bandwidth. Enterprises will be able to scale up to 512 GPUs in a single cluster, aimed at HPC workloads and AI training applications. Oracle also supports such emerging technologies as GPUDirect RDMA, which enables network devices to bypass the CPU and directly access GPU memory, as well as Nvidia’s GPU Cloud. The new instances have been in customer preview for the past few months.
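To put the $3.05 per GPU hour price in perspective, here is a back-of-the-envelope calculation. It assumes a flat per-GPU-hour rate with no discounts and no separate charges for storage, networking or data transfer; these are simplifications for illustration, not Oracle's published billing model.

```python
# Sketch: list-price cost of Oracle's A100 instances at $3.05 per GPU hour.
# Assumptions: flat per-GPU-hour rate, eight GPUs per bare-metal node,
# no discounts and no separate storage, network or egress charges.
PRICE_PER_GPU_HOUR = 3.05  # US dollars, as announced
GPUS_PER_NODE = 8
MAX_CLUSTER_GPUS = 512


def hourly_cost(gpus: int) -> float:
    """Return the hourly list-price cost in dollars for a given GPU count."""
    return round(gpus * PRICE_PER_GPU_HOUR, 2)


nodes = MAX_CLUSTER_GPUS // GPUS_PER_NODE  # 64 nodes at eight GPUs each
print(f"One node ({GPUS_PER_NODE} GPUs): ${hourly_cost(GPUS_PER_NODE):,.2f}/hour")
print(f"Full cluster ({MAX_CLUSTER_GPUS} GPUs, {nodes} nodes): "
      f"${hourly_cost(MAX_CLUSTER_GPUS):,.2f}/hour")
```

At that rate, one eight-GPU node works out to $24.40 an hour, and a full 512-GPU cluster of 64 nodes to $1,561.60 an hour.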

Oracle will use Ampere Altra processors for its first Arm-based instances, which Batta says will deliver better price-performance than x86 processors like those from Intel. The instances will launch early next year on bare metal or virtual machines with up to 160 cores and support for a range of Linux distributions, including Oracle Linux and Ubuntu.

“A lot of our enterprise customers, because they’re running on our cloud, they do a lot of IoT [Internet of Things] and agile mobile development,” Batta says. “For that, we need Arm processors in the cloud. Let’s say, for example, you want to theoretically test an application that’s being deployed on Android. You want to test it on a million devices. You can’t do that today. But now you can launch a million cores of Arm on our cloud and you’d be good to go. The price-performance is going to be orders of magnitude much better than some of the other general-purpose platforms out there.”

The Arm instances also will be part of Oracle’s program that lets organizations choose the number of cores and amount of memory they want based on their workloads. This flexible infrastructure feature was first introduced in April with Oracle’s E3 instances, which use AMD’s second-generation Epyc datacenter chips. Also coming early next year will be Oracle’s E4 instances, using AMD’s next-generation “Milan” Epyc processors.

Oracle also is making Rescale’s cloud simulation platform available on its cloud infrastructure to make it easier for enterprises to onboard their workloads, which can be up and running within a day. Rescale has more than 450 applications already installed on Oracle HPC instances.

Enterprises Join Up

Batta says the new offerings are the latest efforts to give enterprises a cloud infrastructure that delivers the benefits of both their on-premises datacenters and the cloud, and that can get those dominos falling. In August, the company announced what it hopes will be one of those dominos: Nissan Motor Co. The car maker is moving its engineering simulation workloads – including CFD and structural simulation applications – to Oracle Cloud, using Oracle’s Intel- and GPU-based instances.

“I’m not talking like a few thousand cores here or there,” he says. “These are like 20, 30, 40, 50-plus thousand cores running 24/7, 365 a year. They’re essentially running their car design and car development platform on top of those. What that means is all the aerodynamics. From an experience point of view, the car designer and the engineer on the desk doesn’t see anything different. They submit a job, it just ends up in Oracle Cloud, it just looks like they’re clustered. This is what I mean by benefits of on prem and the cloud as well. This was a competitive win against the other cloud providers, which is a big deal.”
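Batta's figures translate into a large volume of compute time. The sketch below converts the core counts he cites into core-hours per year, assuming every core runs around the clock for all 365 days with no downtime, which is an idealization rather than a reported usage figure.

```python
# Sketch: annual core-hours implied by the Nissan core counts Batta cites.
# Assumption (idealized): every core runs 24 hours a day, 365 days a year.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_core_hours(cores: int) -> int:
    """Return total core-hours per year for a fixed number of always-on cores."""
    return cores * HOURS_PER_YEAR


for cores in (20_000, 30_000, 40_000, 50_000):
    print(f"{cores:>6,} cores -> {annual_core_hours(cores):,} core-hours/year")
```

At the top of that range, 50,000 cores add up to 438 million core-hours a year.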

It’s also part of a larger trend, he says, adding that there are other large enterprises – including U.S. and European automotive companies – that are moving to Oracle Cloud Infrastructure but that he can’t name. But Nissan is a good proof-point of what the cloud can do for major companies, with the auto maker “actually keeping the data in the cloud. They’re actually running the simulation and they’re moving that data to GPU instances and doing visualization. As a provider, this is somebody doing the entire end-to-end HPC use case in the cloud. They’re doing it on a large scale.”

Altair, an Oracle partner that provides solutions in such areas as data analytics, product development and HPC, this week said it will run many of its internal workloads and commercial software-as-a-service (SaaS) solutions – including engineering simulation and analytics products – on Oracle Cloud. This means that its 11,000 customers – in such sectors as manufacturing, automotive, life sciences and financial services – can run Altair products on Oracle Cloud.

Oracle Cloud In The Market

What all this does for Oracle Cloud’s standing in a crowded and competitive public cloud market remains to be seen. The COVID-19 pandemic has fueled rapid growth in cloud revenue as enterprises adapt to new business models that include widely distributed workforces. In the global datacenter hardware and software space, it is the public cloud providers that continue to prop up the market. In the second quarter, the entire space saw a 7 percent year-over-year increase; for public cloud infrastructure, the increase was 25 percent, while traditional datacenter infrastructure revenue fell 3 percent. Cloud infrastructure services revenue passed $30 billion in a single quarter for the first time in Q2, led by AWS, Azure and Google. Oracle is grouped with a host of other cloud companies such as Salesforce and Tencent.

In its fiscal first-quarter results announced earlier this year, Oracle reported that cloud services and support revenues rose 2 percent to $6.9 billion.

Batta says the company doesn’t always see itself as a direct competitor of the big public cloud players, with which Oracle also partners at times. For example, Oracle and Microsoft announced a high-profile cloud interoperability partnership last year that allows enterprises to move and run workloads across Azure and Oracle Cloud. What Oracle is trying to do is offer services that the others may not offer in sectors like HPC, which he says is underserved.

“Our focus has been moving hard-to-move enterprise work. What that means is that we don’t have to go directly competing against our competitors. We compete against them by doing things they don’t,” he says. “HPC for one. We do things differently because we listen to our on-prem customer base. Case in point number two is a lot of those customers have a multicloud strategy, so we build our platform with open standards. Case in point number three, we are collaborating with the cloud providers. We announced a partnership with Azure. It sometimes makes sense to run things in Azure and it makes sense to run things in Oracle. Your choice.”

That’s what makes the embrace of open standards important, particularly given the trend toward hybrid cloud and multicloud strategies, Batta says. Oracle’s strategy is “collaborating when it makes sense for the customer, but it’s also competing. It’s not competing directly against Amazon or Azure or Google. It’s competing for the workloads that aren’t necessarily moving to the cloud today. If you’re a born-in-the-cloud startup today, you can come to us, but the likelihood is you’ll probably go to another provider. … There’s a real differentiator there among all cloud providers. We’re focusing on enterprise.”

Oracle got a late start in the cloud after founder and then-CEO Larry Ellison – who now is the company’s chief technology officer – dismissed the cloud as a passing fad. However, the company is now all in on the cloud and spending a lot of money on its efforts. That included putting a team in place in Seattle about four years ago to focus on HPC and the cloud. The group has quickly put in place a strategy that is paying off with customers like Nissan. Now Oracle has to hope that its HPC push can help it compete with the likes of AWS and Azure in what looks like a high-growth part of the public cloud space.