Mirage: Towards Low-interruption Services on Batch GPU Clusters with Reinforcement Learning.
Qiyang Ding, Pengfei Zheng, Shreyas Kudari, Shivaram Venkataraman, Zhao Zhang
Published in: SC (2023)
Keyphrases
- reinforcement learning
- service oriented
- web services
- clustering algorithm
- real time
- ubiquitous computing
- context aware
- function approximation
- information services
- state space
- service providers
- hierarchical clustering
- fuzzy c means
- service composition
- optimal policy
- service discovery
- graphics processors
- batch mode
- model free
- parallel computing
- highly correlated
- parallel implementation
- cluster analysis
- service quality
- self organizing maps
- multi agent
- dynamic programming
- end users
- computing environments
- heterogeneous networks
- control policy
- markov decision processes