• Adaptive Fault Tolerance Strategies for Large Scale Systems 

      George, Cijo (2018-03-07)
      Exascale systems of the future are predicted to have a mean time between node failures (MTBF) of less than one hour. At such a low MTBF, the number of processors available for execution of a long-running application can widely ...