Online Learning in Weakly Coupled Markov Decision Processes: A Convergence Time Study.
Xiaohan Wei, Hao Yu, Michael J. Neely. Published in: Proc. ACM Meas. Anal. Comput. Syst. (2018)
Keyphrases
- markov decision processes
- online learning
- state space
- optimal policy
- stochastic shortest path
- planning under uncertainty
- policy iteration
- dynamic programming
- finite state
- partially observable
- reinforcement learning
- reachability analysis
- least squares
- multistage
- average cost
- objective function
- finite horizon
- average reward
- machine learning