Optimizing Checkpoint-Restart Mechanisms for HPC with DMTCP in Containers at NERSC.
Madan Timalsina
Lisa Gerhardt
Nicholas Tyler
Johannes P. Blaschke
William Arndt
Published in:
CoRR (2024)
Keyphrases
fault tolerance
high performance computing
fault tolerant
response time
random walk
digital libraries
scientific computing
data sets
neural network
case study
image sequences
markov chain
massively parallel
mechanisms underlying
computational science