Scalable group-based checkpoint/restart for large-scale message-passing systems.
Justin C. Y. Ho
Cho-Li Wang
Francis C. M. Lau
Published in:
IPDPS (2008)
Keyphrases
message passing
distributed systems
distributed shared memory
belief propagation
shared memory
web scale
probabilistic inference
markov random field
factor graphs
linear programming
maximum likelihood
matching algorithm
fault tolerance
sum product algorithm