Fault-Aware Group-Collective Communication Creation and Repair in MPI.
Roberto Rocco, Gianluca Palermo
Published in: Euro-Par (2023)
Keyphrases
- parallel algorithm
- fault diagnosis
- message passing
- fault detection
- fault isolation
- communication networks
- parallel implementation
- general purpose
- communication systems
- group members
- group formation
- massively parallel
- creation process
- communication patterns
- multithreading
- shared memory
- repair actions
- distributed memory
- neural network
- information exchange
- communication cost
- computer networks
- multi agent systems