Application-Level Resilience Modeling for HPC Fault Tolerance.
Luanzheng Guo, Hanlin He, Dong Li
Published in: CoRR (2017)
Keyphrases
- fault tolerance
- application level
- fault tolerant
- distributed systems
- load balancing
- peer-to-peer
- response time
- group communication
- high performance computing
- operating system
- distributed computing
- fault management
- database replication
- mobile agents
- replicated databases
- bottleneck
- single point of failure
- overlay network
- network management
- quality of service
- data replication
- error detection
- virtual machine
- data sources
- data streams
- cooperative
- component failures
- real time