Multilevel Checkpoint/Restart for Large Computational Jobs on Distributed Computing Resources.
Masoud Gholami Estahbanati, Florian Schintke
Published in: SRDS (2019)
Keyphrases
- computing resources
- geographically distributed
- load balance
- cloud computing
- computational grids
- limited resources
- grid computing
- distributed systems
- fault tolerant
- fault tolerance
- resource management
- virtual machine
- distributed environment
- distributed computing
- random walk
- multi-core processors
- network bandwidth
- high performance computing
- network resources
- cooperative
- multi-agent
- resource manager
- processing times
- real world
- computing environments
- computer networks
- information systems
- grid environment
- data intensive
- flowshop