A Multi-Armed Bandit Approach for Online Expert Selection in Markov Decision Processes.
Eric Mazumdar, Roy Dong, Vicenç Rúbies Royo, Claire J. Tomlin, S. Shankar Sastry
Published in: CoRR (2017)
Keyphrases
- markov decision processes
- multi armed bandit
- reinforcement learning
- finite state
- optimal policy
- state space
- reachability analysis
- policy iteration
- transition matrices
- dynamic programming
- decision theoretic planning
- online learning
- planning under uncertainty
- model based reinforcement learning
- average reward
- partially observable
- infinite horizon
- action sets
- multi armed bandits
- average cost
- action space
- active learning
- objective function