Optimistic Posterior Sampling for Reinforcement Learning: Worst-Case Regret Bounds.
Shipra Agrawal, Randy Jia
Published in: Math. Oper. Res. (2023)
Keyphrases
- reinforcement learning
- multi-armed bandit
- worst case
- regret bounds
- lower bound
- upper bound
- state space
- Markov chain Monte Carlo
- Markov decision processes
- sample size
- probability distribution
- random sampling
- optimal policy
- learning algorithm
- machine learning
- Bayesian framework
- posterior probability
- learning problems
- probabilistic model
- online learning
- posterior distribution
- Gaussian process
- sampling algorithm
- support vector