Characterizing and Understanding HPC Job Failures over the 2K-Day Life of IBM BlueGene/Q System
Sheng Di
Hanqi Guo
Eric Pershey
Marc Snir
Franck Cappello
Published in: DSN (2019)
Keyphrases
fault tolerance
high performance computing
root cause
data sets
databases
daily life
future development
failure rate
batch processing