Block-Checksum-Based Fault Tolerance for Matrix Multiplication on Large-Scale Parallel Systems.
Yanchao Zhu, Yi Liu, Mingzhen Li, Depei Qian
Published in: HPCC/SmartCity/DSS (2018)
Keyphrases
- fault tolerance
- fault tolerant
- distributed systems
- high scalability
- fault management
- single point of failure
- matrix multiplication
- load balancing
- high availability
- distributed memory
- response time
- distributed computing
- intelligent systems
- database replication
- failure recovery
- replicated databases
- group communication
- parallel implementation
- data replication
- error detection
- high performance computing
- multimedia
- computing systems
- mobile agents
- knowledge based systems
- computer systems
- multi agent systems