Arithmetic-intensity-guided fault tolerance for neural network inference on GPUs.
Jack Kosaian, K. V. Rashmi
Published in: SC (2021)
Keyphrases
- fault tolerance
- neural network
- fault tolerant
- distributed systems
- distributed computing
- high availability
- load balancing
- response time
- replicated databases
- fault management
- group communication
- general purpose
- high scalability
- failure recovery
- peer to peer
- database replication
- node failures
- multimedia
- data replication
- error detection
- mobile agents
- fuzzy logic
- expert systems
- data streams
- single point of failure