Coordinated checkpoint versus message log for fault tolerant MPI.
Pierre Lemarinier, Aurélien Bouteiller, Géraud Krawezik, Franck Cappello
Published in: Int. J. High Perform. Comput. Netw. (2004)
Keyphrases
- fault tolerant
- fault tolerance
- high performance computing
- message exchange
- distributed systems
- parallel algorithm
- high availability
- message passing
- cooperative
- message passing interface
- load balancing
- shared memory
- multi agent
- parallel implementation
- massively parallel
- state machine
- parallelization strategy
- interconnection networks
- safety critical
- parallel computing