Check-Pointing Approach for Fault Tolerance in OpenSHMEM.
Pengfei Hao, Swaroop Pophale, Pavel Shamis, Tony Curtis, Barbara M. Chapman
Published in: OpenSHMEM (2015)
Keyphrases
- fault tolerance
- fault tolerant
- error detection
- distributed systems
- load balancing
- response time
- distributed computing
- replicated databases
- group communication
- high availability
- peer to peer
- database replication
- fault management
- distributed query processing
- single point of failure
- high performance computing
- mobile agents
- high scalability
- distributed environment
- failure recovery