FT-BLAS: A High Performance BLAS Implementation With Online Fault Tolerance.
Yujia Zhai, Elisabeth Giem, Quan Fan, Kai Zhao, Jinyang Liu, Zizhong Chen
Published in: CoRR (2021)
Keyphrases
- fault tolerance
- highly optimized
- fault tolerant
- scientific computing
- high performance computing
- load balancing
- distributed systems
- linear algebra
- response time
- distributed computing
- peer to peer
- high availability
- general purpose
- group communication
- failure recovery
- fault management
- error detection
- database replication
- replicated databases
- single point of failure
- component failures
- data replication
- end to end
- digital libraries
- database