Epidemic Fault Tolerance for Extreme-Scale Parallel Computing.
Amogh Katti, Giuseppe Di Fatta
Published in: IDCS (2015)
Keyphrases
- fault tolerance
- parallel computing
- fault tolerant
- high performance computing
- load balancing
- distributed systems
- distributed computing
- response time
- peer to peer
- computing systems
- massively parallel
- mobile agents
- parallel computers
- database replication
- fault management
- parallel machines
- shared memory
- group communication
- parallel programming
- parallel execution
- replicated databases
- failure recovery
- single point of failure
- pairwise
- graphics processing units
- fine grained
- artificial intelligence