Evaluating User-Level Fault Tolerance for MPI Applications.
Ignacio Laguna, David F. Richards, Todd Gamblin, Martin Schulz, Bronis R. de Supinski
Published in: EuroMPI/ASIA (2014)
Keyphrases
- fault tolerance
- fault tolerant
- high performance computing
- distributed systems
- load balancing
- response time
- high availability
- group communication
- peer to peer
- distributed computing
- database replication
- replicated databases
- databases
- high scalability
- parallel implementation
- message passing
- fault management
- expert systems