Enhancing Fault-Tolerance of Large-Scale MPI Scientific Applications.
Gabriel Rodríguez, Patricia González, María J. Martín, Juan Touriño
Published in: PaCT (2007)
Keyphrases
- fault tolerance
- fault tolerant
- high performance computing
- high scalability
- distributed systems
- high availability
- distributed computing
- load balancing
- peer to peer
- response time
- mobile agents
- group communication
- database replication
- data intensive
- fault management
- parallel algorithm
- message passing
- database
- parallel implementation
- single point of failure
- component failures
- replicated databases
- error detection
- parallel computing
- grid computing
- mobile agent system
- failure recovery
- massively parallel