Rethinking Software Fault Tolerance.
Kishor S. Trivedi, Michael Grottke, Javier Alonso Lopez
Published in: IEEE Trans. Reliab. (2024)
Keyphrases
- fault tolerance
- fault tolerant
- load balancing
- distributed systems
- high availability
- distributed computing
- group communication
- response time
- peer to peer
- mobile agents
- replicated databases
- database replication
- fault management
- high performance computing
- software development
- data replication
- software systems
- single point of failure
- error detection
- reinforcement learning
- high scalability
- wireless sensor
- component failures