Examining Failures and Repairs on Supercomputers with Multi-GPU Compute Nodes
Amir Taherin
Tirthak Patel
Giorgis Georgakoudis
Ignacio Laguna
Devesh Tiwari
Published in:
DSN (2021)
Keyphrases
parallel computing
parallel programming
shortest path
network structure
graphics processing units
graphics hardware
parallel architectures
high performance computing
parallel algorithm
directed graph
graph structure
parallel implementation
massively parallel
distributed memory
failure detection
repair actions