EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications.
Sourav Chakraborty, Ignacio Laguna, Murali Emani, Kathryn Mohror, Dhabaleswar K. Panda, Martin Schulz, Hari Subramoni
Published in: Concurr. Comput. Pract. Exp. (2020)
Keyphrases
- fault tolerance
- fault tolerant
- high performance computing
- high scalability
- response time
- load balancing
- distributed systems
- replicated databases
- high availability
- distributed computing
- peer to peer
- group communication
- failure recovery
- artificial intelligence
- mobile agents
- fault management
- parallel computing
- sensor nodes
- parallel algorithm
- wireless sensor