Thompson Sampling for Learning Parameterized Markov Decision Processes.
Aditya Gopalan, Shie Mannor
Published in: COLT (2015)
Keyphrases
- markov decision processes
- reinforcement learning
- learning algorithm
- model based reinforcement learning
- transition matrices
- real time dynamic programming
- stochastic games
- partially observable
- state space
- state abstraction
- reachability analysis
- dynamic programming
- search algorithm
- finite state
- average cost
- risk sensitive
- decision theoretic planning
- multi agent