Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees.
Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Ménard
Published in: NeurIPS (2022)
Keyphrases
- reinforcement learning
- sampling methods
- markov chain monte carlo
- lower bound
- sampled data
- random sample
- stratified sampling
- metropolis hastings
- sparsely sampled
- random sampling
- upper bound
- state space
- model free
- random samples
- monte carlo
- sampling strategy
- temporal difference
- sample size
- sample selection
- function approximation
- dynamic programming
- data sets
- markov decision processes
- sample space
- probability distribution
- sampling algorithm
- learning algorithm
- probabilistic model
- function approximators
- sample points
- action selection
- class imbalance
- posterior distribution
- optimal control
- optimal policy
- worst case
- output space
- multi class
- objective function