
One Small Step Toward Supercomputers in Space

While it is not likely we will see large supercomputers on the International Space Station (ISS) anytime soon, HPE is getting a head start on providing more advanced on-board computing capabilities via a pair of its aptly named, water-cooled “Apollo” servers in orbit.

The two-socket machines, connected with InfiniBand, will put Broadwell computing capabilities on the ISS, mostly running benchmarks, including High Performance Linpack (HPL), the metric that determines the Top 500 supercomputer rankings. These tests, in addition to the more data movement-centric HPCG benchmark and NASA’s own NAS Parallel Benchmarks, will determine what performance changes, if any, are to be expected when bringing more compute to bear in space.
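For a concrete sense of what such a benchmark sweep involves, below is a minimal sketch of how the three suites might be launched on a two-node cluster with an Open MPI-style launcher. The hostfile, process counts, binary paths, and log layout are illustrative assumptions, not details of the HPE/NASA setup; the xhpl, xhpcg, and NAS Parallel Benchmark binaries are the standard ones built from their public distributions.

```python
#!/usr/bin/env python3
"""Sketch of a benchmark sweep across two nodes (hostfile, paths, and rank counts are assumed)."""
import datetime
import os
import subprocess

# Assumed launch commands; a real run would tune problem sizes and rank counts.
BENCHMARKS = {
    "hpl":  ["mpirun", "-np", "40", "--hostfile", "hosts", "./xhpl"],        # High Performance Linpack
    "hpcg": ["mpirun", "-np", "40", "--hostfile", "hosts", "./xhpcg"],       # data movement-centric HPCG
    "npb":  ["mpirun", "-np", "32", "--hostfile", "hosts", "./bin/ft.C.x"],  # one NAS Parallel Benchmarks kernel
}

def run_suite(logdir="logs"):
    """Run each benchmark once and keep its output for later comparison against a ground control run."""
    os.makedirs(logdir, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    for name, cmd in BENCHMARKS.items():
        with open(os.path.join(logdir, f"{name}-{stamp}.log"), "w") as log:
            subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=False)

if __name__ == "__main__":
    run_suite()
```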

As HPE’s Mark Fernandez tells The Next Platform, the duo of HPE Apollo 50 machines is direct from the factory; in other words, no hardware hardening against radiation and magnetic disturbances has been done. Most of his team’s work has focused on the many tunable parameters for the CPU, memory, and solid state drives that are aboard the ISS. What is different are the “lockers” that HPE built and tested for flight against the more than 140 safety certifications required for on-board ISS gear.

Below is a photo of the locker designed and built by HPE to house the servers, network, and storage gear. Once installed in the NASA ISS EXPRESS Rack, this is the face that will be visible and to which the astronauts have access. Astronauts will connect the electrical power, the Ethernet for networking, and the chilled water for cooling.

The locker is 36.6 L x 21.5 W x 10.0 H and weighs 124 lbs. on Earth. On the far right is a column of connections with cables attached. Those are the six 110 VAC electrical power cords: three primary and three redundant. On the top right, adjacent to the top power cord, is a standard Ethernet port for networking. The backup Ethernet port is in a similar location adjacent to the bottom power cord, though its view is somewhat obscured. The red item pictured is the chilled water outlet connection point; the chilled water inlet is the shiny right-angled piece of metal shown just above it.

The power and cooling situation provides some interesting “freebies” for operating a more powerful system. The ISS has extensive solar arrays to provide the power, but these deliver 48-volt DC, which inverters provided by NASA convert to 110-volt AC. Inside the locker pictured above is a heat exchanger connected to a standard chilled water loop on the ISS, which lets the systems reject 75 percent of their heat to water. The warm air that blows over the heat exchanger is pushed into space.
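To make the 75 percent figure concrete, here is a back-of-the-envelope heat budget. The locker’s actual power draw is not given in the article, so the 1,500-watt figure and the 5 C water temperature rise below are purely illustrative assumptions.

```python
# Back-of-the-envelope heat budget for one locker.
# The 75% water fraction comes from the article; the power draw and the
# water temperature rise are assumed values for illustration only.
locker_power_w = 1500.0      # assumed total electrical draw (not stated in the article)
water_fraction = 0.75        # share of heat rejected to the chilled water loop
delta_t_c = 5.0              # assumed temperature rise of the cooling water
c_p_water = 4186.0           # specific heat of water, J/(kg*K)

heat_to_water_w = locker_power_w * water_fraction
heat_to_air_w = locker_power_w - heat_to_water_w
water_flow_kg_s = heat_to_water_w / (c_p_water * delta_t_c)

print(f"Heat to chilled water: {heat_to_water_w:.0f} W")
print(f"Heat carried off by air: {heat_to_air_w:.0f} W")
print(f"Required water flow: {water_flow_kg_s * 60:.1f} L/min (approx., 1 kg of water ~ 1 L)")
```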

A different take on a rack: the structure of the server and its power and cooling units.

The system itself is running standard RHEL 6.8 across its benchmark suite and has features common to much larger supercomputers, including the InfiniBand connections. “We went with the 56Gb/s optical interconnect because we imagined with copper, we would get more of a reaction from the radiation and magnetic fields. We also eliminated the spinning rust—there is no traditional hard disk because it would be affected by the same conditions. On each node there are eight solid state disks; four of those are small but fast, the others are large but slow so we can see what effects there might be on one versus the other,” Fernandez explains.
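Fernandez does not say how the two SSD tiers will be compared, but one simple way to watch for divergence is to periodically write a known block to each drive, read it back, and check the checksum. The mount points below are placeholders rather than the actual on-board layout, and a production scrub would bypass the page cache (for example with O_DIRECT) so the data really comes off the device.

```python
#!/usr/bin/env python3
"""Write/read/verify sketch for comparing SSD tiers (mount points are placeholders)."""
import hashlib
import os

# Hypothetical mount points: four small-but-fast drives and four large-but-slow drives.
FAST_DRIVES = [f"/mnt/fast{i}" for i in range(4)]
SLOW_DRIVES = [f"/mnt/slow{i}" for i in range(4)]

BLOCK = os.urandom(1 << 20)                       # 1 MiB of random data reused for every drive
EXPECTED = hashlib.sha256(BLOCK).hexdigest()      # checksum the readback must match

def verify(mount_point):
    """Write the block to the drive, read it back, and report whether the checksum still matches."""
    path = os.path.join(mount_point, "scrub.bin")
    with open(path, "wb") as f:
        f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())                      # force the data out to the device
    with open(path, "rb") as f:                   # note: may be served from cache without O_DIRECT
        return hashlib.sha256(f.read()).hexdigest() == EXPECTED

if __name__ == "__main__":
    for drive in FAST_DRIVES + SLOW_DRIVES:
        print(drive, "OK" if verify(drive) else "MISMATCH")
```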

Overall, the miniature space supercomputer is capable of a teraflop of performance, an order of magnitude above anything that is aboard the ISS currently. While it is far from a Top 500-class system (after all, this is just two nodes), Fernandez says he can see a future where they scale this to a larger number of nodes for more ISS compute capability. The goal for now, however, is to determine what effects, if any, real-world applications will suffer, errors in particular.
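The teraflop claim squares with a rough peak calculation, though the exact Broadwell SKU is not stated. Assuming, purely for illustration, two dual-socket nodes with 10-core, 2.4 GHz parts retiring 16 double-precision FLOPs per core per cycle, theoretical peak lands around 1.5 teraflops, and HPL typically achieves some fraction of that.

```python
# Rough peak-FLOPS estimate for two dual-socket Broadwell nodes.
# Core count, clock, and HPL efficiency are assumptions, not published specs.
nodes = 2
sockets_per_node = 2
cores_per_socket = 10        # assumed
clock_ghz = 2.4              # assumed
flops_per_cycle = 16         # Broadwell AVX2: two FMA units x 4 doubles x 2 ops

peak_tflops = nodes * sockets_per_node * cores_per_socket * clock_ghz * flops_per_cycle / 1000
hpl_efficiency = 0.70        # assumed, typical for Linpack on this class of hardware
print(f"Theoretical peak: {peak_tflops:.2f} TF, estimated HPL: {peak_tflops * hpl_efficiency:.2f} TF")
```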

“We are taking a macro look at hardening a system for space conditions. Traditional hardening looks at a specific type of radiation or magnetic field, then analyzes the physics to see what components might be affected to guide the protection or build strategy,” Fernandez says. “There are many knobs to turn that we usually don’t touch in the BIOS: CPU speeds, memory, and turbo modes.” He adds that there are parallel systems at the company’s labs in Chippewa Falls, Wisconsin, that serve as the control group against which HPL and other results are compared. The systems are running tests constantly in 2.5-hour increments and will continue to do so for the next year.
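One way to picture the flight-versus-ground comparison is as a running check of each 2.5-hour result against the control system’s result for the same run. The sketch below simply flags runs whose HPL score drifts more than a threshold away from the ground result; the data, the 2 percent threshold, and the method are illustrative assumptions, not HPE’s actual analysis.

```python
# Flag flight benchmark runs whose HPL score deviates from the ground control run.
# The scores and the 2% threshold are made-up, illustrative values.
THRESHOLD = 0.02

def flag_anomalies(flight_gflops, ground_gflops):
    """Yield (run index, relative deviation) for runs outside the threshold."""
    for i, (flight, ground) in enumerate(zip(flight_gflops, ground_gflops)):
        deviation = abs(flight - ground) / ground
        if deviation > THRESHOLD:
            yield i, deviation

if __name__ == "__main__":
    flight = [995.1, 998.4, 962.0, 997.2]   # hypothetical Gflops per 2.5-hour flight run
    ground = [999.0, 998.8, 999.2, 998.5]   # hypothetical Gflops from the Chippewa Falls control
    for run, dev in flag_anomalies(flight, ground):
        print(f"run {run}: {dev:.1%} off the control result")
```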

Through the SGI acquisition, Hewlett Packard Enterprise has a longstanding, 30-year relationship with NASA. This relationship started with the co-development of the world’s first IRIX single-system image in 1998. Along the way, we’ve achieved great milestones, including the co-development of one of the largest and fastest supercomputers, Columbia, a 10,240-processor supercluster that was named the second fastest supercomputer in the world on the 2004 Top500 list. Today, the Spaceborne Computer contains compute nodes of the same class as NASA’s premier supercomputer, Pleiades, currently ranked #9 in the world.

In terms of performance, there is no overclocking for these systems. “Sometimes in HPC we want these machines to run as fast as possible and we may not give the hardware time to do the error detection and correction we are looking for here. Considering the systems on-board the ISS now, we think even if we slow down these Apollo machines they will still be able to outperform any other on-board systems.”
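Much of that error detection and correction is the machinery already built into server-class hardware, such as ECC memory, and on a stock RHEL system the corrected and uncorrected memory error counts are exposed through the kernel’s EDAC sysfs interface. The polling loop below is a sketch of how one might watch those counters move during a benchmark run, an assumption about method rather than HPE’s actual tooling.

```python
#!/usr/bin/env python3
"""Poll the kernel's EDAC memory-error counters (standard Linux sysfs paths)."""
import glob
import time

def read_counts():
    """Sum correctable (ce) and uncorrectable (ue) error counts across all memory controllers."""
    ce = sum(int(open(p).read()) for p in glob.glob("/sys/devices/system/edac/mc/mc*/ce_count"))
    ue = sum(int(open(p).read()) for p in glob.glob("/sys/devices/system/edac/mc/mc*/ue_count"))
    return ce, ue

if __name__ == "__main__":
    base_ce, base_ue = read_counts()
    while True:
        time.sleep(60)                            # check once a minute during a run
        ce, ue = read_counts()
        print(f"corrected since start: {ce - base_ce}  uncorrected since start: {ue - base_ue}")
```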

“This is a baby step toward general purpose supercomputing on board,” Fernandez concludes. “We want these systems to run and not fail and consistently give the right answers. We are interested in performance time, which is why we will be running these machines on ISS 24/7, 365 days to see how far they go.”
