MPI tools and performance studies - Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI.
Camille Coti, Thomas Hérault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, Franck Cappello. Published in: SC (2006)
Keyphrases
- fault tolerant
- fault tolerance
- high performance computing
- distributed systems
- message passing
- message passing interface
- record linkage
- parallel algorithm
- parallel implementation
- load balancing
- high availability
- parallel computing
- failure recovery
- shared memory
- parallelization strategy
- artificial intelligence
- multi agent
- massively parallel
- operating system
- peer to peer
- safety critical
- state machine
- response time
- data structure