Coordinated Checkpoint versus Message Log for Fault Tolerant MPI.
Aurélien Bouteiller, Pierre Lemarinier, Géraud Krawezik, Franck Cappello
Published in: CLUSTER (2003)
Keyphrases
- fault tolerant
- fault tolerance
- high performance computing
- message exchange
- distributed systems
- message passing
- multi agent
- parallel implementation
- shared memory
- parallel algorithm
- message passing interface
- cooperative
- load balancing
- high availability
- safety critical
- interconnection networks
- state machine
- parallel computing
- error detection
- data replication