
Intel Does The Math On Broadwell Server Upgrades

The “Broadwell” generation of Xeon processors debuted a month ago, and now that the basic feeds and speeds are out there, customers are trying to figure out what to buy as they upgrade their systems and when to do it. This being a “tick” in the Intel chip cadence – meaning a shrink to smaller transistors instead of a “tock” rearchitecting of the Xeon core and the surrounding electronics – the Broadwell Xeons snap into existing “Haswell” systems and an upgrade is fairly straightforward for both system makers and their customers.

It all comes down to math about what to buy and when to buy it in the case of a tick upgrade, and with the “Skylake” Xeon E5 v5 processors not expected until late next year, there is no need to hesitate on plans for buying new systems or upgrading existing ones. But with such a broad and deep product line, as we have previously detailed, and a wide variation in price/performance across that line, as we have calculated across the past six Xeon generations, IT managers have to get out their Excel spreadsheets to do a little calculating to figure out what configurations will give them the best value in their shiny new systems.
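
For those who want to do that spreadsheet math themselves, here is a minimal sketch of the dollars-per-unit-of-performance calculation in Python. The two list prices are the Intel chip prices cited later in this story; the 3.5X relative performance figure is the rough node-level gain discussed below, not a measured benchmark result.

```python
# Back-of-the-envelope price/performance math for two top-bin parts.
# Prices are Intel list prices cited in this story; the 3.5X figure is a
# rough node-level gain used for illustration, not a benchmark result.
skus = {
    "Xeon E5-2690 (8 cores, 2012)":     {"price": 2057, "rel_perf": 1.0},
    "Xeon E5-2699 v4 (22 cores, 2016)": {"price": 4115, "rel_perf": 3.5},
}

for name, s in skus.items():
    dollars_per_perf = s["price"] / s["rel_perf"]
    print(f"{name:34s} ${s['price']:>5,} -> ${dollars_per_perf:,.0f} per unit of performance")
```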

The two-socket Xeon server is the workhorse of the datacenter, and with every successive generation of Xeon processors that comes out of Intel, the chip maker crams more cores onto the die and also jacks up the performance of the cores. The single-threaded performance on integer work tends to go up between 5 percent and 10 percent with an architecture change, and the core counts tend to rise enough to get somewhere between 20 percent and 30 percent more work through the chips with a reasonable amount of tuning. On some workloads, such as floating point math, changes in the vector units over the years have allowed performance to make much larger jumps, to the point where a pair of Xeon chips can be reasonably used as the exclusive compute engine for workloads that are not embarrassingly parallel or are more sensitive to clock speeds.
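
To put that cadence in perspective, here is a quick compounding sketch, assuming the 20 percent to 30 percent per-generation throughput gains cited above and the three generational steps that separate Sandy Bridge from Broadwell.

```python
# Compound the rough per-generation throughput gains over the three steps
# from Sandy Bridge to Ivy Bridge to Haswell to Broadwell.
steps = 3
for per_gen_gain in (0.20, 0.25, 0.30):
    cumulative = (1 + per_gen_gain) ** steps
    print(f"{per_gen_gain:.0%} per generation -> {cumulative:.2f}X after {steps} generations")
```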

That extra computing oomph that comes with a new Xeon family can be used in a number of different ways. “In the past, enterprises would swap out 25 percent of their machines every year and sometimes waterfall the older boxes down to do something else in the datacenter,” Frank Jensen, performance marketing manager at Intel, tells The Next Platform. “But now, we are starting to see faster refresh cycles.”

In some cases, customers just take the extra performance and build out their clusters by keeping the node count more or less the same or perhaps increasing it slightly. In other cases, where the customers are constrained when it comes to power or space (or both), they will use the extra performance in a single node to cut back on the number of nodes in the cluster while at the same time boosting the performance of the overall cluster to suit their needs. We happen to think that some of that spare compute capacity will be used for network function virtualization and software defined storage, workloads that until now have resided on specialized hardware appliances. (This is certainly happening in the telco space and has long since happened at the hyperscalers and the biggest cloud builders.)
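
The node-count side of that tradeoff is simple arithmetic. Here is a minimal sketch, with purely illustrative node counts and speedups rather than anything from Intel's data, of how many new nodes it takes to match or beat an existing cluster's throughput.

```python
import math

# How many new-generation nodes are needed to deliver at least a given
# multiple of the old cluster's throughput? All numbers here are
# illustrative, not Intel's.
def nodes_needed(old_nodes: int, per_node_speedup: float, headroom: float = 1.0) -> int:
    """Nodes required so the new cluster delivers headroom * old throughput."""
    return math.ceil(old_nodes * headroom / per_node_speedup)

# Example: 60 old nodes, each new node does 2.5X the work, and the cluster
# needs 20 percent more total throughput than before.
print(nodes_needed(old_nodes=60, per_node_speedup=2.5, headroom=1.2))   # -> 29
```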

The point is, with every new Xeon generation, the options get better, and so do the means to consolidate workloads onto fewer boxes or to radically expand the performance of a fixed cluster footprint. Jensen has done a lot of math to help customers assess the options, and shared his data with The Next Platform so we could share it with you.

As a baseline, you should review the Broadwell product line and the very rough price/performance analysis we did in the wake of the announcements. There are hundreds of Xeon variants that have been created in the past seven years, and the performance gains and relative value of the raw compute across those SKUs and generations do follow some patterns. For Xeon chips with a high core count, performance has risen steadily and the cost per unit of performance initially came down fast, but the rate of change tapered off as the cost of adding cores rose and the hyperscalers' appetite for lots of cores increased. For the more standard SKUs, performance has increased at half the rate of the high core count chips, but price/performance has improved steadily. For Xeons with high clock speeds, improvements in instructions per clock and modest increases in clock speed have resulted in faster chips, and the price of a unit of performance came down very fast with the “Sandy Bridge” generation in 2012 but has only come down a smidgen over the past two generations. This attests to the fact that it is difficult to crank up single-threaded performance, and that those customers in the HPC and financial services arenas that need fast threads will pay a premium for that particular flavor of performance.

Here is how Jensen thinks a private cloud based on server virtualization might play out for a customer with three racks of servers using the Sandy Bridge Xeon E5-2690 processors, which have eight cores running at 2.9 GHz, and moving up to the new 22-core Broadwell Xeon E5-2699 v4 chip, which runs at 2.2 GHz:

Based on SYSMark benchmark tests, three racks of two-socket machines could host about 111 virtual machines, according to Jensen, and a simple one-to-one replacement of those servers with that top-bin Xeon would scale the cluster by a factor of 3.5X to around 396 virtual machines. By doing such an upgrade, customers would be giving Intel (rather than the server maker) a much larger portion of the budget, since these new Broadwell E5-2699 v4 processors cost more than twice as much a pop as the Sandy Bridge E5-2690s did four years ago. (That's $4,115 for the most capacious Broadwell compared to $2,057 for the top-bin Sandy Bridge.) But it also means spending less on memory, disks, enclosures, power supplies, and so on to get a certain capacity of compute. Customers will have to give up some clock speed per core (about 24 percent) by moving to the heavily cored Broadwell.

While Jensen did not provide the economics for the comparison above, it will obviously be a lot cheaper to buy less than a full rack of machines to support 132 virtual machines in the lower scenario and get a modest 20 percent or so performance boost in the cluster.
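
Here is a rough cut of the arithmetic behind those two upgrade paths, using only the VM counts, the roughly 3.5X per-node gain, and the chip prices cited above; the rest of Intel's chart is not reproduced here.

```python
# Arithmetic behind the two upgrade paths described above, using the figures
# cited in the text.
baseline_vms  = 111    # three racks of Xeon E5-2690 machines
per_node_gain = 3.5    # Broadwell E5-2699 v4 node vs Sandy Bridge E5-2690 node

# Path 1: a one-for-one swap keeps the footprint and scales up capacity.
print(round(baseline_vms * per_node_gain))   # ~389, in the ballpark of the ~396 cited

# Path 2: consolidation targets ~132 VMs (a roughly 20 percent bump) on far fewer nodes.
target_vms = 132
fraction_of_old_nodes = target_vms / (per_node_gain * baseline_vms)
print(f"{fraction_of_old_nodes:.0%} of the old node count")   # ~34%

# Either way, the CPU bill per socket roughly doubles.
print(f"{4115 / 2057:.2f}X the per-chip price")   # E5-2699 v4 vs E5-2690
```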

Jensen did, however, put some numbers on a much smaller configuration pitting Sandy Bridge iron against Broadwell iron. This one is also for a virtualized workload, based on VMware's ESXi hypervisor:

And here is the underlying data for that comparison, which was based on pricing for hardware and software effective in February ahead of the Broadwell launch:

In this example, ten of the two-socket Sandy Bridge servers are being replaced by four of the two-socket Broadwell servers, both using top-bin parts. The Sandy Bridge hardware is already paid for, but annual operating system and virtualization software license fees still apply to those machines, and server maintenance and various space, power, and cooling costs also accrue to the old iron. While it costs $80,400 to buy four of these top-bin Broadwell machines, because the server node count is reduced by a factor of 2.5X (and the performance of each node should be about 3.6X higher), the software license and maintenance costs are cut by more than half, and so are the other operational costs. So the net-net is that shelling out money for new hardware can save $185,079 in licensing and operational costs while also boosting the throughput on the cluster by about 43 percent.
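
For those who want to plug in their own numbers, here is a stripped-down version of that refresh calculation. Only the $80,400 hardware outlay, the ten-to-four node consolidation, and the roughly 3.6X per-node gain come from the comparison above; the recurring cost line and the four-year horizon are placeholders, since Intel's underlying table is not reproduced here.

```python
# A stripped-down version of the server refresh math. The hardware outlay,
# node counts, and per-node gain come from the text; the recurring cost and
# time horizon below are placeholders for illustration only.
old_nodes, new_nodes = 10, 4
per_node_gain        = 3.6
new_hardware_cost    = 80_400

recurring_per_node_year = 9_000   # placeholder: OS + virtualization licenses,
                                  # maintenance, space, power, cooling
years = 4

old_recurring = old_nodes * recurring_per_node_year * years
new_recurring = new_nodes * recurring_per_node_year * years
net_savings   = (old_recurring - new_recurring) - new_hardware_cost

print(f"Cluster throughput change: {new_nodes * per_node_gain / old_nodes:.2f}X")  # ~1.44X
print(f"Net position over {years} years: ${net_savings:,}")  # positive means the refresh pays off
```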

This is the kind of math that the hyperscalers are doing all the time, and that is why Intel is making 22-core Broadwell chips and will be making 28-core Skylake chips.

Not everybody wants to pay top dollar for compute, so Jensen worked up another total cost of ownership example pitting two different Broadwells that are lower down in the SKU stack against each other. Take a look at the comparison chart, which compares ten servers using 14-core chips to eight servers using 18-core chips, which have roughly equivalent performance:

Here is the underlying table of numbers for that chart:

Again, each server is a little more expensive, but the software licensing and maintenance costs go down because those are based on server units (in one fashion or another) and not on the performance (where core count is a rough proxy) of the machines. In the case above, the cores on the faster machines run 15 percent faster and there are 144 of them compared to 140 in the ten servers on the left, so this buying decision is a no-brainer. It makes far more sense to buy the machines based on the Broadwell E5-2697 v4 processors, which over the course of four years will save $73,910 in total costs, or about 13 percent.
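
The tradeoff between server count and core count is easy to model, too. In the sketch below, the core counts and the 15 percent clock delta come from the comparison above, while the per-server license figure is a placeholder.

```python
# Fewer, beefier servers versus more, cheaper ones: licensing tracks server
# count while throughput roughly tracks cores times clock. Core counts and
# the 15 percent clock delta are from the text; the license figure is a
# placeholder.
left_servers,  left_cores,  left_clock  = 10, 14, 1.00   # ten 14-core servers
right_servers, right_cores, right_clock = 8, 18, 1.15    # eight 18-core servers

left_throughput  = left_servers  * left_cores  * left_clock    # 140 core-units
right_throughput = right_servers * right_cores * right_clock   # ~166 core-units

license_per_server_year = 3_000   # placeholder
years = 4

print(f"Throughput ratio: {right_throughput / left_throughput:.2f}X")
print(f"License savings over {years} years: "
      f"${(left_servers - right_servers) * license_per_server_year * years:,}")
```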

The other thing that Intel wants to encourage companies to do – and we have a hard time believing that they do not do this already – is to fully populate the sockets in the Xeon servers. The incremental cost of that extra processor is negligible, and once you burden it with an operating system and compilers the price/performance of the machine with two sockets populated is far better. Take a look:

As you can see, it only costs $930 to add the second ten-core Broadwell E5-2630 v4 processor to the server above. Intel did not double up the main memory, which it probably should have; that might raise the price another $2,000 using 32 GB memory sticks. Even so, that is nearly twice the performance for about 50 percent more money.
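
The two-socket argument boils down to simple division. In the sketch below, the $930 processor adder and the roughly $2,000 memory adder come from the example above, while the single-socket system price is an assumption chosen to be consistent with the “about 50 percent more money” figure.

```python
# Why populating the second socket is nearly always worth it. The CPU and
# memory adders come from the text; the base system price is an assumption.
single_socket_cost = 6_000        # assumed base system price
cpu_adder          = 930          # second ten-core E5-2630 v4
memory_adder       = 2_000        # doubling memory with 32 GB sticks

dual_socket_cost = single_socket_cost + cpu_adder + memory_adder
perf_single, perf_dual = 1.0, 1.9   # "nearly twice the performance"

print(f"Extra cost for the second socket: {dual_socket_cost / single_socket_cost - 1:.0%}")
print(f"$ per unit of performance, one socket:  {single_socket_cost / perf_single:,.0f}")
print(f"$ per unit of performance, two sockets: {dual_socket_cost / perf_dual:,.0f}")
```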

In general, in case you have not gotten the message, Intel wants to sell you two of its fastest processors for your new servers. Period. But it will be happy to sell you more chips with fewer cores apiece, too, if you have your reasons for going that way.

Here is another handy general guide to compare across the Xeon generations by workload that Jensen put together:

These numbers jibe, more or less, with the more generic relative performance ratings that we cooked up in our Nehalem to Broadwell analysis from a few weeks ago. Performance gains will vary by workload, as you can see. But clearly customers with Sandy Bridge machines that are four years old will see a substantial benefit from an upgrade even if they stay in the same rough SKU band, which is what this table is showing. There is less of a technical incentive to upgrade “Ivy Bridge” machines from three years ago, and even less of one to upgrade from “Haswell” systems that might only be a year to a year and a half old at most customer sites. The delta in performance for Broadwell compared to these machines is not enough to compel customers to move, and even the hyperscalers do not and cannot replace all of their machines with every Xeon product cycle.

No one is rich enough to do that. Not even Google or Microsoft or Amazon Web Services or Facebook.
