Nvidia Adds Cluster Management To Its Enterprise Stack

Chip maker Nvidia might be best known for its graphics and datacenter compute engines, but the company has made no secret of its aspirations to be a bigger player across the datacenter. And not just in hardware, but in software, too.

To that end, Nvidia has acquired Bright Computing for an undisclosed sum. Bright is a maker of software that controls the configuration of clustered systems such as Nvidia’s own DGX servers, their HGX clones made by OEMs and ODMs, as well as clusters based on servers from other manufacturers. Nvidia will be adding Bright Cluster Manager to its Enterprise Products Group. The acquisition gives Bright Computing, which is a relatively small but well-known software company predominantly in the HPC simulation and modeling space, a much wider and deeper partner channel through which to sell Bright Cluster Manager. And it gives the Nvidia enterprise hardware and software stack some much-needed tooling to help customers better manage the Nvidia iron they acquire.

The exact size of Bright Computing in terms of employee count, revenue, and profits is not known because the company is privately held. But we do know that since its founding in 2009 in Amsterdam, Bright Computing has sold its tools to more than 700 organizations worldwide. Most of them are HPC centers, but in more recent years, enterprises that need help configuring and managing their distributed systems, as well as organizations that are setting up clusters for AI training, have acquired Bright Cluster Manager to deal with the low-level details. In 2019 alone, Bright Computing added more than 100 new customers, to give you a sense of the rate of change of its installed base. We can only imagine that the business has accelerated along with HPC and AI adoption in the wake of the coronavirus pandemic. Bright Computing got $2.5 million in funding from ING Corporate Investments in 2010, and took down another $14.5 million in funding from Draper Fisher Jurvetson, Prime Ventures, and ING Corporate Investments in 2014, which helped propel the company to increase its installed base to over 400 companies that year.

Bright Computing moved its headquarters to San Jose many years ago to be closer to the Silicon Valley software community. Bright Cluster Manager has not only hooked into HPC stacks to configure the servers and networks comprising distributed systems, but has also been extended to configure clusters running the Hadoop data analytics platform, the Spark in-memory processing platform, the OpenStack cloud controller, and – in the Bright Cluster Manager 9.1 release that came out a little more than a year ago – the entire VMware stack, including the Tanzu Kubernetes container platform. The cluster controller can be used on premises or on public cloud infrastructure from Amazon Web Services or Microsoft Azure – importantly mirroring the way Nvidia is bringing its AI and HPC stacks to market. Nvidia’s recent but tight partnership with VMware, which is seeing the two work on running VMware’s ESXi hypervisor on Nvidia’s DPUs as well as tuning up the VMware virtual server substrate to run Nvidia’s AI and HPC stacks, is particularly relevant given how Bright Computing has recently also thrown its weight behind the VMware stack.

Bill Wagner, who replaced company co-founder Matthijs van Leeuwen as CEO at Bright Computing in 2016, has been looking for a way to make Bright Computing as well-known to the AI set as it is to HPC customers, and this acquisition by Nvidia will do this with the sweep of two pens.

“We needed to be much bigger, and we have got a great product that should be in the hands of far more customers,” Wagner, who is staying on to run the Bright Computing business within Nvidia, tells The Next Platform. “Most HPC shops know who we are, but very, very few companies outside of HPC know anything about us, but for enterprises to be able to deploy AI and edge systems, they are not going to have the expertise and they are going to need a tool like Bright Cluster Manager. So the timing is great for us to join Nvidia because they can deliver the market reach and market awareness and we will be able to max this out.”

Martijn De Vries, co-founder and chief technology officer, and the entire Bright Computing team are moving over to Nvidia, and the software unit will be under the charge of Charlie Boyle, general manager of the DGX systems business at Nvidia. The developers, operations, sales, and other employees at Bright Computing will join their respective teams inside of Nvidia. And, as we discussed recently with Nvidia co-founder and chief executive officer Jensen Huang, 75 percent of the employees at Nvidia work on software, not hardware, so the Bright Computing team will “snap right in,” as Boyle put it to us. He added that Bright Cluster Manager would stay with a subscription model in terms of pricing, complementing the AI Enterprise stack from Nvidia and the support subscription model for the DGX systems. Moreover, Boyle promises that Nvidia will support other CPUs and GPUs – based on customer demand, of course.

Boyle expects that Nvidia will have updates on the Bright Cluster Manager product roadmap at the upcoming GPU Technology Conference in March. We also expect to see some ideas about how the cluster manager will integrate with the Base Command data preparation and machine learning training run management software underneath AI Enterprise, which is a workflow for doing machine learning training and then creating inference models, as well as with the Fleet Command orchestration and system management tool that Nvidia created to run AI Enterprise out at the edge. (We covered both of these last year here.)

Here is the way to think about this. Nvidia needed a tool to help configure the clusters and set them up to run its various stacks, which have many workflows that include not only Nvidia software, but also other software that is either homegrown or from third parties. The one thing that Nvidia still needs – and that its GPU-accelerated system customers are going to demand – is a disaggregation and composability layer that can dynamically configure GPUs to CPUs. Something like, for instance, the FabreX PCI-Express composable interconnect from GigaIO, which we have written plenty about here at The Next Platform. And as Wagner tells us, isn’t it convenient that GigaIO is already an Nvidia partner and has already worked out points of integration between FabreX and the Bright Cluster Manager 9.2 release that was previewed last November and that will be available in early 2022?

Another acquisition by Nvidia could be in the works.
