OCFTL: An MPI Implementation-Independent Fault Tolerance Library for Task-Based Applications.
Pedro Henrique Di Francia Rosso, Emilio Francesquini
Published in: CARLA (2021)
Keyphrases
- fault tolerance
- fault tolerant
- high performance computing
- distributed systems
- load balancing
- response time
- high availability
- distributed computing
- peer to peer
- replicated databases
- message passing
- parallel implementation
- database replication
- group communication
- failure recovery
- sensor networks
- error detection
- data replication
- parallel computing
- message passing interface
- fault management
- single point of failure