Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
Ian Osband, Benjamin Van Roy
Published in: CoRR (2016)
Keyphrases
- reinforcement learning
- markov chain monte carlo
- function approximation
- metropolis hastings
- sampling algorithm
- posterior distribution
- probabilistic model
- posterior probability
- state space
- model free
- supervised learning
- temporal difference learning
- sample size
- monte carlo
- machine learning
- probability distribution
- learning algorithm
- multi agent reinforcement learning
- sampling strategy
- markov decision process
- temporal difference
- markov decision processes
- optimal policy
- generative model
- action selection
- training data
- reinforcement learning algorithms
- multi agent
- importance sampling
- random sampling
- sampling strategies
- robotic control
- least squares