A New Approach to System-Level Fault-Tolerance in Message-Passing MultiComputers.
Guy W. Zimmerman, Abdol-Hossein Esfahanian
Published in: Great Lakes Computer Science Conference (1989)
Keyphrases
- message passing
- fault tolerance
- distributed systems
- fault tolerant
- belief propagation
- distributed computing
- load balancing
- shared memory
- probabilistic inference
- response time
- database replication
- approximate inference
- replicated databases
- peer to peer
- distributed shared memory
- factor graphs
- probabilistic model
- sum product
- inference in graphical models
- sum product algorithm
- fault management
- error detection
- distributed memory
- markov random field
- failure recovery
- component failures
- high performance computing
- graphical models