Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers.
Roberto Gioiosa, José Carlos Sancho, Song Jiang, Fabrizio Petrini
Published in: SC (2005)
Keyphrases
- fault tolerance
- fault tolerant
- parallel computers
- load balancing
- distributed systems
- failure recovery
- distributed computing
- peer to peer
- response time
- database replication
- parallel computing
- mobile agents
- single point of failure
- distributed memory
- high performance computing
- parallel implementation
- parallel processing
- parallel algorithm
- numerical methods
- shared memory
- cloud computing
- database systems
- artificial intelligence