Posterior sampling for reinforcement learning: worst-case regret bounds.

Shipra Agrawal, Randy Jia

Published in: CoRR (2017)

Keyphrases

reinforcement learning
multi-armed bandit
worst case
regret bounds
lower bound
upper bound
markov chain monte carlo
sample size
state space
learning algorithm
model-free
probability distribution
optimal policy
learning process
markov decision processes
probabilistic model
bayesian framework
multi-class
machine learning
random sampling
learning problems
posterior probability
monte carlo
posterior distribution
np-hard
computational complexity
optimal solution
generative model