Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs.
Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen
Published in: CoRR (2023)
Keyphrases
- fault tolerance
- fault tolerant
- load balancing
- distributed systems
- high availability
- distributed computing
- response time
- peer to peer
- real time
- replicated databases
- mobile agents
- group communication
- high scalability
- failure recovery
- high performance computing
- database replication
- error detection
- fault management
- single point of failure
- scientific computing
- graphics processing units
- medical images
- database systems
- digital libraries
- databases
- data sets