Optimizing the fault-tolerance overheads of HPC systems using prediction and multiple proactive actions.
Lei Zhu
Jianhua Gu
Yunlan Wang
Tianhai Zhao
Zhennao Cai
Published in:
J. Supercomput. (2015)
Keyphrases
fault tolerance
distributed systems
fault tolerant
response time
fault management
single point of failure
distributed computing
high scalability
load balancing
high performance computing
peer to peer
replicated databases
high availability
group communication
mobile agents
database replication
computing systems
error detection
databases
artificial intelligence
scientific computing
metadata
data sets
computing environments
wireless sensor
distributed environment
computer systems
expert systems