Assessing HPC Failure Detectors for MPI Jobs.
Kishor Kharbas, Donghoon Kim, Torsten Hoefler, Frank Mueller
Published in: PDP (2012)
Keyphrases
- high performance computing
- message passing interface
- scientific computing
- massively parallel
- parallel computing
- parallel machines
- object detection
- grid computing
- computing systems
- job scheduling
- computing resources
- fault tolerance
- computational grids
- processing times
- parallel implementation
- energy efficiency
- message passing
- computing infrastructure
- flowshop
- computing environments
- identical parallel machines
- parallelization strategy
- batch processing
- failure rate
- shared memory
- anomaly detection