Securing The HPC Infrastructure

In the world of high performance computing (HPC), the most popular buzzwords include speed, performance, durability, and scalability. Security is one aspect of HPC that is not often discussed, or else seems to be relatively low on the list of priorities when organizations begin building out their infrastructures to support demanding new applications, whether for oil and gas exploration, machine learning, simulations, or visualization of complex datasets.

While IT security is paramount for businesses in the digital age, HPC systems typically do not face the same risks as public-facing infrastructure such as a web server cluster, and so the security risk is perhaps not perceived to be as great. Performance is another overriding concern when designing and deploying HPC systems, and running security services alongside the application workloads may have some impact on overall performance, which could lead to IT complications and operational inefficiencies.

However, there is a growing realization among these organizations that the actionable intelligence generated by these applications is potentially very valuable information. HPC-driven insights could be the intellectual property that the organization depends on to generate revenue, or a key insight that leads to new advances in medical research or materials science, accelerates innovation, bolsters national security, or facilitates economic growth.

Wherever there is valuable data, there are criminals or other agents who will expend a great deal of effort to acquire such information, as we have seen in innumerable cybersecurity attacks around the world. Even organizations that should have staunch security policies in place, such as US government defence contractor Lockheed Martin, have experienced devastating network breaches.

A particular area of concern that has emerged in recent years is the security of the hardware itself. Some attacks are now able to exploit weaknesses at the hardware or firmware level, and these can prove difficult to remediate as they execute below the level of the operating system or hypervisor.

In a recent report from consulting and research firm Moor Insights & Strategy, senior analyst John Fruehe explains that hardware and firmware exploits are capable of hiding for a long time before being discovered, and can be pre-programmed to access resources at a predetermined date in the future.

These attack vectors are particularly insidious because they enable hackers to target weak points in the supply chain in order to introduce malicious code before the machine even reaches the customer. According to Fruehe, Cisco has reportedly suffered intercept issues with shipments of equipment that were accessed and altered in transit, while Supermicro servers in the Siri development lab at Apple were found to have malware.

Actions are now being taken to address these issues. For example, Google has modified its approach by specifying its own custom security chips. These chips are fitted into the hardware, which enables users to identify and authenticate the machines as legitimate Google systems and ensures that they have not been tampered with. Google reportedly uses cryptographic signatures over low-level components like the BIOS, bootloader, kernel, and base operating system image, and these signatures can be validated during each boot or update.

Fortunately for those in the HPC community who do not have Google’s level of purchasing power to specify their own custom-built hardware, others in the industry are adopting similar approaches – such as Hewlett Packard Enterprise with the introduction of its new Gen10 x86 server systems.

HPE’s Gen10 portfolio includes the Apollo 6000 series, a line of servers expressly designed for HPC customers that uses a 12U rack-mount chassis holding up to 24 server trays fitted horizontally. The Gen10 server trays are based on the new Intel Xeon Scalable Processor family, launched this week, but also feature what HPE has christened its Silicon Root of Trust. The Silicon Root of Trust starts with the HPE iLO management controller, which is the company’s own custom-built baseboard management controller (BMC) chip.

When the system boots up, the iLO device is initialized first, and it scans the system firmware, comparing it line by line with a trusted copy of the code that is stored in a secure location. Once the code is verified, or found in an unmodified state, the iLO releases the processors on the motherboard from their reset state and allows the system boot to continue.

But HPE takes the pre-boot checks a step further. If the iLO discovers that the firmware has been altered in any way, it enters a recovery mode whereby it can restore the firmware to a previously authenticated state, or to the version in which it left the factory. In addition, HPE develops and maintains its own firmware rather than using off-the-shelf third party code, leveraging a strict process for signing off changes to ensure its integrity.
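The verify-then-recover flow described above can be sketched in a few lines of Python. This is only an illustration, not HPE's implementation: it assumes a SHA-256 digest comparison stands in for whatever check the iLO actually performs, and the function names (`digest`, `verify_firmware`, `boot_or_recover`) and in-memory firmware images are hypothetical.

```python
import hashlib
import hmac


def digest(image: bytes) -> bytes:
    """Return the SHA-256 digest of a firmware image."""
    return hashlib.sha256(image).digest()


def verify_firmware(image: bytes, trusted_digest: bytes) -> bool:
    """Check an image against a digest held in protected storage."""
    # compare_digest runs in constant time for equal-length inputs.
    return hmac.compare_digest(digest(image), trusted_digest)


def boot_or_recover(active: bytes, golden: bytes, trusted_digest: bytes) -> tuple[bytes, str]:
    """Return the image to boot and the action taken.

    Hypothetical flow: boot the active image if it verifies; otherwise
    fall back to the golden (known-good) copy, and refuse to boot if
    even that copy fails verification.
    """
    if verify_firmware(active, trusted_digest):
        return active, "boot"
    if verify_firmware(golden, trusted_digest):
        return golden, "recovered"
    raise RuntimeError("no trusted firmware image available")


if __name__ == "__main__":
    golden = b"firmware v1.0"
    trusted = digest(golden)
    tampered = b"firmware v1.0 + implant"
    print(boot_or_recover(golden, golden, trusted)[1])    # boot
    print(boot_or_recover(tampered, golden, trusted)[1])  # recovered
```

The point of the design is that the processors are only released once some image has passed the check, so a tampered image never gets to run.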

Once the firmware is authenticated, UEFI Secure Boot ensures that only verified firmware components and operating system bootloaders with the appropriate digital signatures can execute during the boot process. Each component executed at this stage must be digitally signed and the signature validated against a set of trusted certificates embedded in the firmware itself.
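As a rough illustration of that gating behaviour, the sketch below walks a boot chain and stops at the first component that is not trusted. It makes two simplifying assumptions that are not from the article: trust is modelled as a set of allowed SHA-256 digests rather than real certificate-based signature validation (Python's standard library has no public-key signature verification), and the stage names and the `validate_chain` function are hypothetical.

```python
import hashlib


def sha256_hex(blob: bytes) -> str:
    """Return the hex SHA-256 digest of a boot component."""
    return hashlib.sha256(blob).hexdigest()


def validate_chain(stages: list[tuple[str, bytes]], allowed: set[str]) -> list[str]:
    """Return the names of stages permitted to execute, in order.

    Execution halts at the first stage whose digest is not in the
    allowed set, so nothing after an untrusted component ever runs.
    """
    executed = []
    for name, blob in stages:
        if sha256_hex(blob) not in allowed:
            break
        executed.append(name)
    return executed


if __name__ == "__main__":
    good = [("bootloader", b"bl"), ("kernel", b"k"), ("os-image", b"os")]
    allowed = {sha256_hex(blob) for _, blob in good}
    print(validate_chain(good, allowed))  # all three stages run

    bad = [("bootloader", b"bl"), ("kernel", b"evil"), ("os-image", b"os")]
    print(validate_chain(bad, allowed))   # stops after the bootloader
```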

In addition to providing end-to-end security, HPE’s iLO chip establishes a foundation for further security features, including FIPS mode, which implements encryption algorithms validated to the U.S. government FIPS 140-2 requirements. This mode also blocks insecure communication channels (such as IPMI and SNMP v1) that do not meet standard safety protocols, in order to reduce the potential attack surface of the HPC systems.
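The effect of such a mode on the management interface can be pictured as a simple filter over enabled protocols. The sketch below is hypothetical and does not reflect the iLO's actual configuration interface: only IPMI and SNMP v1 are named in the text, and the other protocol names, the `BLOCKED_IN_FIPS_MODE` set, and the `effective_protocols` function are assumptions made for illustration.

```python
# Only IPMI and SNMP v1 come from the text above; the identifiers are assumed.
BLOCKED_IN_FIPS_MODE = frozenset({"ipmi", "snmpv1"})


def effective_protocols(enabled: set[str], fips_mode: bool) -> set[str]:
    """Return the management protocols left enabled under the given mode."""
    if not fips_mode:
        return set(enabled)
    return {p for p in enabled if p.lower() not in BLOCKED_IN_FIPS_MODE}


if __name__ == "__main__":
    enabled = {"https", "ssh", "ipmi", "snmpv1", "snmpv3"}
    print(sorted(effective_protocols(enabled, fips_mode=True)))
    # ['https', 'snmpv3', 'ssh']
```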

At the highest level of security, iLO implements Commercial National Security Algorithms (CNSA) mode, which uses only the strongest cryptographic algorithms and secure protocols. This mode is highly specialized and is only available with iLO Advanced Premium Security Edition licensing.

Beyond the server itself, other components of the IT infrastructure, including storage systems, also provide added layers of protection. Storage-level security can complicate HPC deployments, since the performance of the storage layer is critical for executing demanding applications quickly and efficiently. Any delays in data processing, access, or transmission will not win any friends in the HPC community.

In theory, data encryption can be performed with zero overhead through the use of Self-Encrypting Drives (SEDs), which handle the encryption and decryption in hardware at the drive level, or through the use of controller hardware that implements automatic encryption at the array level. Yet both of these methods introduce additional complexity into the storage layer, and may require additional capabilities, such as a key management system, to monitor and maintain them.

But all the encryption in the world is of little use if attackers can compromise the hardware itself, placing malicious code into a privileged position where it can lurk undetected and gain unrestricted access to all system resources.

HPE’s secure IT developments are essential to ensuring that the lowest level of hardware can be a secure and trustworthy foundation for the entire HPC infrastructure. HPE in particular has invested significant time and resources into its Secure Compute Lifecycle, and is confident that the new HPC Gen10 systems are the most secure industry standard servers on the market – from the supply chain through system operations and data flow, all the way to safe end-of-life disposal.


1 Comment

  1. Part of having secure hardware also means that the hardware itself, or critical components of it, must be sourced, manufactured and assembled in countries that do not mandate backdoors (China, UK, etc.) or have features which bypass their own hardware-level security, like the black box “CPU within a CPU” out-of-band management systems so common in x86 CPUs.
