A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems
Michael Treaster
Published in: CoRR (2005)
Keyphrases
- fault tolerance
- fault management
- fault tolerant
- distributed systems
- failure recovery
- single point of failure
- load balancing
- response time
- error detection
- distributed computing
- network management
- high availability
- knowledge based systems
- replicated databases
- normal operation
- mobile agents
- group communication
- intelligent systems
- peer to peer
- database replication
- data sets
- fault detection
- databases