Posterior sampling for reinforcement learning: worst-case regret bounds.
Shipra Agrawal, Randy Jia
Published in: CoRR (2017)
Keyphrases
- reinforcement learning
- multi-armed bandit
- worst case
- regret bounds
- lower bound
- upper bound
- Markov chain Monte Carlo
- sample size
- state space
- learning algorithm
- model-free
- probability distribution
- optimal policy
- learning process
- Markov decision processes
- probabilistic model
- Bayesian framework
- multi-class
- machine learning
- random sampling
- learning problems
- posterior probability
- Monte Carlo
- posterior distribution
- NP-hard
- computational complexity
- optimal solution
- generative model