Who Shoulders the Supercomputing Resiliency Burden?
While the related topics of fault tolerance and resiliency do not garner the same attention as performance and efficiency, being able to recover from and work around failures is more important than ever, especially as applications take over ever-larger and increasingly heterogeneous machines. …