Cluster fault-tolerance: An experimental evaluation of checkpointing and MapReduce through simulation.
Thomas C. Bressoud, Michael Kozuch
Published in: CLUSTER (2009)
Keyphrases
- fault tolerance
- experimental evaluation
- distributed computing
- fault tolerant
- distributed systems
- load balancing
- response time
- failure recovery
- high availability
- peer to peer
- database replication
- simulation model
- replicated databases
- fault management
- grid computing
- mobile agents
- group communication
- cloud computing
- clustering algorithm
- single point of failure
- reinforcement learning
- high performance computing
- error detection
- data sets
- component failures
- mapreduce framework
- end to end
- multi agent systems
- knowledge base
- artificial intelligence