Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
Ian Osband, Benjamin Van Roy
Published in: ICML (2017)
Keyphrases
- reinforcement learning
- markov chain monte carlo
- metropolis hastings
- probability distribution
- posterior distribution
- random sampling
- state space
- sampling strategy
- reinforcement learning algorithms
- function approximation
- learning algorithm
- sampling algorithm
- model free
- machine learning
- multi agent
- monte carlo
- probabilistic model
- optimal policy
- robotic control
- sampling strategies
- learning problems
- learning classifier systems
- gaussian process
- sample size
- markov decision processes
- posterior probability
- bayesian framework
- parameter space
- markov chain
- multi agent systems
- sampling rate
- importance sampling