On the Use of Cluster-Based Partial Message Logging to Improve Fault Tolerance for MPI HPC Applications.
Thomas Ropars, Amina Guermouche, Bora Uçar, Esteban Meneses, Laxmikant V. Kalé, Franck Cappello
Published in: Euro-Par (1) (2011)
Keyphrases
- fault tolerance
- high performance computing
- fault tolerant
- response time
- distributed computing
- distributed systems
- load balancing
- mobile agents
- database replication
- high availability
- failure recovery
- peer to peer
- error detection
- replicated databases
- fault management
- group communication
- wireless sensor
- node failures
- component failures
- shared memory
- artificial intelligence
- grid computing
- data streams
- database systems