Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS 2010).
John T. Daly, Nathan DeBardeleben
Published in: DSN (2010)
Keyphrases
- fault tolerance
- fault tolerant
- load balancing
- distributed systems
- response time
- high performance computing
- high availability
- distributed computing
- database replication
- peer to peer
- group communication
- mobile agents
- error detection
- replicated databases
- sensor nodes
- fault management
- node failures
- single point of failure