Deep Reinforcement Agent for Failure-aware Job scheduling in High-Performance Computing.
Kang Yang, Rongyu Cao, Yueyuan Zhou, Jiawei Zhang, En Shao, Guangming Tan
Published in: ICPADS (2021)
Keyphrases
- high performance computing
- job scheduling
- scientific computing
- computational science
- grid environment
- grid computing
- massively parallel
- multi agent
- multi agent systems
- reinforcement learning
- computing systems
- parallel computing
- multiagent systems
- software agents
- intelligent agents
- energy efficiency
- resource allocation
- fault tolerance
- mobile agents
- load balancing
- identical machines
- computing environments
- computing resources
- fine grained
- power consumption
- resource management
- processing times