User-level failure detection and auto-recovery of parallel programs in HPC systems.
Guozhen Zhang
Yi Liu
Hailong Yang
Jun Xu
Depei Qian
Published in:
Frontiers Comput. Sci. (2021)
Keyphrases
failure detection
massively parallel
computer systems
user interaction
complex systems
data warehouse
abstraction levels
intelligent systems
higher level
user interface
user experience
fault detection
user profiles
high performance computing
fault tolerance
distributed systems
fuzzy logic