Learning Unknown Markov Decision Processes: A Thompson Sampling Approach.
Yi Ouyang, Mukul Gagrani, Ashutosh Nayyar, Rahul Jain
Published in: NIPS (2017)
Keyphrases
- markov decision processes
- reinforcement learning
- state space
- learning algorithm
- transition matrices
- model based reinforcement learning
- partially observable
- optimal policy
- real time dynamic programming
- finite state
- learning tasks
- dynamic programming
- policy iteration
- finite horizon
- planning under uncertainty
- factored MDPs
- supervised learning