Optimistic posterior sampling for reinforcement learning: worst-case regret bounds.

Shipra Agrawal, Randy Jia

Published in: NIPS (2017)

Keyphrases

reinforcement learning
multi-armed bandit
worst case
regret bounds
lower bound
upper bound
sample size
state space
Markov chain Monte Carlo
probability distribution
learning algorithm
optimal policy
model-free
random sampling
learning process
probabilistic model
posterior distribution
NP-hard
model selection
generative model
online learning
Markov decision processes
Bayesian framework
machine learning