An algorithmic approach to error localization and partial recomputation for low-overhead fault tolerance.
Joseph Sloan, Rakesh Kumar, Greg Bronevetsky
Published in: DSN (2013)
Keyphrases
- fault tolerance
- low overhead
- load balancing
- error detection
- fault tolerant
- distributed systems
- distributed computing
- high reliability
- mobile agents
- peer to peer
- group communication
- replicated databases
- data replication
- database replication
- response time
- fault management
- single point of failure
- energy efficient
- shared memory
- artificial intelligence
- error correction
- wireless sensor networks
- multi agent systems
- metadata