A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance.
Chao Wang, Frank Mueller, Christian Engelmann, Stephen L. Scott
Published in: IPDPS (2007)
Keyphrases
- fault tolerance
- fault tolerant
- high performance computing
- failure recovery
- distributed systems
- high availability
- response time
- peer to peer
- distributed computing
- load balancing
- replicated databases
- group communication
- mobile agents
- database replication
- parallel algorithm
- data sets
- service providers
- fault management
- semantic web services
- third party
- error detection
- data replication
- web services
- metadata