SparseFT: Sparsity-aware Fault Tolerance for Reliable CNN Inference on GPUs.
Gwangeun Byeon, Seungtae Lee, Seongwook Kim, Yongjun Kim, Prashant J. Nair, Seokin Hong
Published in: PACT (2023)
Keyphrases
- fault tolerance
- fault tolerant
- distributed systems
- load balancing
- high availability
- distributed computing
- peer to peer
- response time
- group communication
- replicated databases
- cellular neural networks
- mobile agents
- database replication
- high scalability
- general purpose
- failure recovery
- sparse representation
- data replication
- fault management
- component failures
- database
- replica control
- wireless sensor networks
- multi agent
- reinforcement learning
- metadata
- databases