Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols.
Darius Buntinas, Camille Coti, Thomas Hérault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, Franck Cappello
Published in: Future Gener. Comput. Syst. (2008)
Keyphrases
- fault tolerant
- fault tolerance
- distributed systems
- key distribution
- load balancing
- record linkage
- high performance computing
- high availability
- state machine
- distributed databases
- data replication
- message passing
- safety critical
- multi agent
- mobile agents
- parallel implementation
- failure recovery
- low overhead
- response time