Scalable, fault tolerant membership for MPI tasks on HPC systems.
Jyothish Varma, Chao Wang, Frank Mueller, Christian Engelmann, Stephen L. Scott
Published in: ICS (2006)
Keyphrases
- fault tolerant
- fault tolerance
- distributed systems
- safety critical
- high performance computing
- high availability
- high assurance
- state machine
- distributed computing
- message passing
- scientific computing
- expert systems
- load balancing
- management system
- computing systems
- computing environments
- complex systems
- mobile agents
- peer to peer
- message passing interface