MPI Runtime Error Detection with MUST: A Scalable and Crash-Safe Approach.
Joachim Protze, Tobias Hilbrich, Martin Schulz, Bronis R. de Supinski, Wolfgang E. Nagel, Matthias S. Müller
Published in: ICPP Workshops (2014)
Keyphrases
- error detection
- error correction
- error recovery
- error correcting
- data cleansing
- fault tolerance
- parallel algorithm
- message passing
- parallel implementation
- error control
- error resilient
- high performance computing
- general purpose
- fault isolation
- intelligent systems
- artificial intelligence
- parallelization strategy
- parallel computing
- parallel processing
- fault tolerant