Efficient Management and Intelligent Fault Tolerance for HPC Interconnect Networks.
Jijun Cao, Mingche Lai, Zhang Luo, Jiaqing Xu, Zhengbin Pang
Published in: ICPADS (2019)
Keyphrases
- fault tolerance
- fault tolerant
- fault management
- load balancing
- distributed systems
- response time
- high availability
- high performance computing
- peer to peer
- distributed computing
- group communication
- data replication
- mobile agents
- single point of failure
- database replication
- replicated databases
- failure recovery
- management system
- information systems
- intelligent systems
- data management