Learning Unknown Markov Decision Processes: A Thompson Sampling Approach.
Yi Ouyang, Mukul Gagrani, Ashutosh Nayyar, Rahul Jain
Published in: CoRR (2017)
Keyphrases
- finite horizon
- Markov decision processes
- optimal policy
- reinforcement learning
- model based reinforcement learning
- Markov decision process
- decision theoretic planning
- stochastic games
- partially observable
- state space
- finite state
- dynamic programming
- decision problems
- learning algorithm
- state abstraction
- transition matrices
- real time dynamic programming
- multi agent
- supervised learning
- planning under uncertainty