MUST: A Scalable Approach to Runtime Error Detection in MPI Programs.
Tobias Hilbrich, Martin Schulz, Bronis R. de Supinski, Matthias S. Müller
Published in: Parallel Tools Workshop (2009)
Keyphrases
- error detection
- error correction
- error recovery
- fault tolerance
- data cleansing
- error correcting
- message passing
- error control
- parallel algorithm
- error resilient
- fault isolation
- parallel implementation
- high performance computing
- parallelization strategy
- massively parallel
- general purpose
- computer programs
- message passing interface
- parallel computing
- neural network