Mellanox Technologies has become a leading provider of networking hardware, particularly at the high-performance end of the market, where the company accounts for more than 70 percent of Ethernet ports shipped at speeds above 10 Gb/sec, according to Crehan Research.
It’s a testament to the success of Mellanox, and to the growth of high-end networking, that Intel is now reported to be interested in buying the company to bolster its data-center portfolio. But why?
At the performance level of 10 Gb/sec and above with Ethernet, the processing overhead required to serve the network interface starts to become an issue – a fact exacerbated by the increasing complexity of modern-day data center networks, which may now include support for performance acceleration techniques, virtualization, and overlay networks. One answer is to offload some of the processing to the network interface controller (NIC), an area to which Mellanox has devoted a great deal of development effort in recent years.
Offloading functions is not a totally new idea, but it has taken on a new dimension as data-center operators, led by the big cloud service providers, have embraced software-defined networking (SDN) to make the job of changing and managing their networks easier and less expensive.
SDN, in particular, has demanded the NIC become smarter so that it can be reconfigured to support new functions or protocols – as and when necessary. This has led to the emergence of an array of devices termed SmartNICs.
It’s clear that a SmartNIC must offer more network processing features and intelligence than a standard, basic NIC. But the whole arena of NICs with offload hardware has yet to be precisely defined, meaning there is no commonly accepted definition of what capabilities must be supported in order for a device to be marketed as a “SmartNIC.”
If we look at how Mellanox categorizes this field, we can see that features such as 25 Gb/sec and 50 Gb/sec Ethernet support, stateless TCP/IP acceleration, and SR-IOV are now regarded as basic capabilities that any NIC in a modern data center ought to support as standard. Mellanox regards such NICs as foundational devices – ones that lack any real ability to offload packet steering or traffic flows from the host processor.
The step up from here is devices that employ a programmable data plane, thereby allowing the packet switching rules and data-processing protocols they accelerate in hardware to be updated or changed as required rather than be set in stone – as in legacy networking hardware. They should also be able to accelerate a range of functions common in cloud data centers – such as ensuring quality of service levels, and flow reporting and monitoring. Mellanox calls these Intelligent NICs.
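The essence of a programmable data plane can be illustrated with a toy match-action table of the sort such devices implement in hardware. The sketch below is purely illustrative – the structures and function names are invented for this example, not taken from any vendor’s API – but it captures the shape: rules can be installed or replaced at runtime, and packets that miss the table are punted to the host CPU, just as real NICs punt table misses:

```c
#include <stdint.h>

/* Illustrative match-action flow table. Each entry matches a
 * (destination IP, destination port) tuple and maps it to an action. */
typedef enum { ACT_DROP, ACT_FORWARD, ACT_SEND_TO_HOST } action_t;

struct flow_entry {
    uint32_t dst_ip;     /* match field */
    uint16_t dst_port;   /* match field */
    action_t action;     /* what to do on a hit */
    int      in_use;
};

#define TABLE_SIZE 64
static struct flow_entry table[TABLE_SIZE];

/* Install or update a rule at runtime -- the "programmable" part.
 * Returns 0 on success, -1 if the table is full. */
int flow_install(uint32_t ip, uint16_t port, action_t act)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (!table[i].in_use ||
            (table[i].dst_ip == ip && table[i].dst_port == port)) {
            table[i] = (struct flow_entry){ ip, port, act, 1 };
            return 0;
        }
    }
    return -1;
}

/* Per-packet lookup -- the data-plane fast path. A miss is sent
 * to the host CPU for the control plane to decide. */
action_t flow_lookup(uint32_t ip, uint16_t port)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (table[i].in_use && table[i].dst_ip == ip &&
            table[i].dst_port == port)
            return table[i].action;
    return ACT_SEND_TO_HOST;
}
```

A hardware implementation would use TCAM or hashing rather than a linear scan, of course; the point is that the rules are data, not silicon, so they can change after the device ships.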
The most capable NICs are those used to offload a whole slew of functions from the host processor. These include control-plane functions, such as those found in a virtual switch, and network function virtualization (NFV) in addition to the data-plane functions and capabilities mentioned earlier. The key is to deliver both high performance and flexible data processing capabilities. These devices are universally regarded as SmartNICs.
Here, it’s worth looking at how acceleration of network functions has actually been implemented in the equipment defined as Intelligent NICs or SmartNICs. For the most part, they fall into one of three categories: those based on application-specific integrated circuits (ASICs), those that feature a field-programmable gate array (FPGA), and those built with a system-on-chip (SoC) that combines one or more CPUs with the standard NIC functions.
Which of these is superior for those looking to employ hardware offload? Each has pros and cons, meaning the act of choosing can be a trade-off between considerations like cost, ease of implementation and flexibility. In other words, there’s no simple “right” answer to which is “best” as it all depends on the specific requirements of the implementation.
ASICs are a tried and tested means of accelerating specific functions and typically deliver a high level of performance at relatively low additional cost. They have, however, relatively limited flexibility, being largely restricted to functions built into the silicon during manufacturing. Mellanox designs its adapter ASICs with a programmable data plane, so packet switching rules and routes can be accelerated in hardware and updated on demand, but the networking control plane must still run on the server’s CPU or on a separate network controller appliance.
Another approach has been to use FPGAs, which offer good performance, and whose logic blocks can be reconfigured to support new functions after deployment. The challenge here is that FPGAs are not the easiest things to program, with the process essentially coming down to defining the logic circuits and gates to be burned into the FPGA – which can be a relatively costly and time-consuming process every time a change is needed. In addition, FPGAs are expensive compared with dedicated ASIC logic that can perform the same tasks.
The SoC approach offers the greatest flexibility, since it integrates an already intelligent NIC with industry-standard programmable processor cores that run standard operating systems and application infrastructure software. The CPU cores can easily be reprogrammed with additional features as often as required and can be used to perform any combination of network processing tasks. This means SoCs can offload most of the network processing from the host server, allowing it to dedicate more of its CPU cycles to running applications and services – which is where the cloud providers make their money, after all.
Some of these SoCs – such as the Mellanox BlueField – include both ASIC-based packet switching for the fastest networking performance and CPU cores that are easily programmed to add new networking features and/or run the networking control plane. This also allows the SmartNIC to run the policy engine, which provides better control and isolation from potentially malicious applications running on the main CPU.
For this reason, it seems reasonable to draw a distinction and say that the only true SmartNICs are those that combine intelligent NIC offloads with standard onboard CPU cores to accelerate network functions. This is not to say that such SmartNICs are going to totally replace the other technologies – ASICs, for example, are highly efficient at handling network offloads; they are just not quite as flexible. The term SmartNIC, however, is increasingly being viewed as applying specifically to the most flexible and capable accelerators, which means those with integrated CPU cores.
Another distinction: a SmartNIC can be easily programmed using a standard language such as C, so that it can be quickly updated to take on new functions. Such functions include handling virtual network overlay protocols like VXLAN or NVGRE, virtualizing networked storage resources so they appear local to the host processor, or running security functions such as an intrusion prevention system (IPS) at the level of the NIC. They might even include storage-related features such as compression, deduplication, RAID, or storage virtualization.
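As a concrete taste of that C-level programmability, here is a minimal parser for the VXLAN header itself, whose layout is defined in RFC 7348: an 8-byte header that follows the outer UDP header (destination port 4789), consisting of a flags byte (bit 0x08 marks the VNI as valid), three reserved bytes, a 24-bit virtual network identifier (VNI), and a final reserved byte. The function name is invented for this sketch; a real SmartNIC datapath would run this as one step in a larger packet-processing pipeline:

```c
#include <stdint.h>
#include <stddef.h>

#define VXLAN_UDP_PORT 4789   /* IANA-assigned VXLAN port (RFC 7348) */
#define VXLAN_FLAG_VNI 0x08   /* "VNI valid" flag bit */

/* Extract the 24-bit VNI from an 8-byte VXLAN header.
 * Returns the VNI on success, or -1 if the header is too short
 * or the VNI-valid flag is not set. */
int32_t vxlan_parse_vni(const uint8_t *hdr, size_t len)
{
    if (len < 8 || !(hdr[0] & VXLAN_FLAG_VNI))
        return -1;
    /* VNI occupies bytes 4-6, big-endian on the wire. */
    return ((int32_t)hdr[4] << 16) | ((int32_t)hdr[5] << 8) | hdr[6];
}
```

Once the VNI is in hand, the NIC can map the inner frame to the right tenant network without the host CPU ever touching the encapsulation – the essence of overlay offload.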
SmartNICs grew out of the needs of the massive data centers run by hyperscale web companies and cloud operators as they shifted from delivering network functions in purpose-built hardware to delivering them as software running across multiple server nodes, which is easier to scale and manage. Where the hyperscalers lead, enterprises are likely to follow as they build out their own private, public, and hybrid-cloud data centers.
Definitions are still loose, but the key takeaway should be this: there exists a spectrum of network offload capabilities, spanning standard NICs, Intelligent NICs that can accelerate data-plane tasks, and programmable SmartNICs that can offload a wide range of both data-plane and control-plane functions.
Each has its own use cases – you just need to identify yours.