Diagnosis of Performance Faults in Large-Scale MPI Applications via Probabilistic Progress-Dependence Inference.
Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Saurabh Bagchi, Todd Gamblin
Published in: IEEE Trans. Parallel Distributed Syst. (2015)
Keyphrases
- fault diagnosis
- model based diagnosis
- fault detection
- multiple faults
- bayesian networks
- belief networks
- inference process
- fault detection and diagnosis
- independence assumption
- message passing
- probabilistic reasoning
- probabilistic networks
- fault model
- factor graphs
- bayesian reasoning
- general purpose
- probabilistic inference
- logical inference
- probabilistic model
- bayesian inference
- fault detection and isolation
- expert systems
- bayes nets
- variable elimination
- medical diagnosis
- root cause
- fault identification
- probabilistic logic
- diagnostic tests
- fault isolation
- statistical relational learning
- neural network
- uncertain data
- parallel algorithm
- distributed systems
- message passing interface
- clinically relevant
- probabilistic modeling
- bayesian model
- dynamic systems
- test cases
- generative model
- repair actions