Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism?
Nicolas Gast, Bruno Gaujal, Kimang Khun
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- multi armed bandit
- markov chain monte carlo
- metropolis hastings
- function approximation
- multi agent
- posterior distribution
- point processes
- multi armed bandits
- markov decision processes
- probability distribution
- sampling algorithm
- posterior probability
- optimal policy
- monte carlo
- random sampling
- probabilistic model
- model free
- highly scalable
- proposal distribution
- particle filter
- temporal difference
- sampling methods
- objective function
- learning process
- sampled data
- reinforcement learning algorithms
- maximum likelihood
- learning classifier systems
- optimal control