Implementation and Evaluation of a Scalable Application-Level Checkpoint-Recovery Scheme for MPI Programs.
Martin Schulz, Greg Bronevetsky, Rohit Fernandes, Daniel Marques, Keshav Pingali, Paul Stodghill
Published in: SC (2004)
Keyphrases
- application level
- operating system
- network management
- parallel implementation
- quality of service
- network services
- overlay network
- virtual machine
- general purpose
- bottleneck
- parallel architecture
- parallel algorithm
- shared memory
- message passing
- fault tolerance
- computational intelligence
- parallel computing
- network resources
- distributed memory
- fault tolerant
- computer systems
- response time