Using checkpointing for fault tolerance and parallel program debugging.
Nam Thoai, Dieter Kranzlmüller, Jens Volkert
Published in: Parallel and Distributed Computing and Networks (2004)
Keyphrases
- fault tolerance
- fault tolerant
- distributed systems
- load balancing
- failure recovery
- distributed computing
- response time
- high availability
- peer to peer
- replicated databases
- group communication
- mobile agents
- database replication
- fault management
- shared memory
- high performance computing
- data replication
- high scalability
- single point of failure
- parallel computing
- massively parallel
- error detection
- mobile agent system
- database
- distributed databases
- knowledge acquisition
- databases