Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling.
Iñigo Urteaga, Chris H. Wiggins. Published in: CoRR (2017)
Keyphrases
- exploration-exploitation tradeoff
- multi-armed bandit
- Markov chain Monte Carlo
- reinforcement learning
- random sampling
- objective function
- Bayesian inference
- data sets
- sample size
- posterior probability
- Monte Carlo
- relevance feedback
- Bayesian networks
- Markov chain
- function approximation
- genetic algorithm
- machine learning
- neural network
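The keyphrases above (multi-armed bandit, posterior probability, random sampling) describe the Bayesian-bandit setting the paper addresses. As context only, here is a minimal sketch of the standard posterior-sampling baseline for Bernoulli bandits (Beta-Bernoulli Thompson sampling), not the paper's double-sampling algorithm; the function name and test parameters are illustrative assumptions.

```python
import random

def thompson_bernoulli(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling: draw one sample from each arm's
    Beta posterior, play the arm with the largest draw, update its posterior."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # Beta(1, 1) uniform priors on each arm's mean
    beta = [1.0] * k
    total_reward = 0
    for _ in range(horizon):
        # random-sampling step: one posterior draw per arm
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        # simulate a Bernoulli reward from the (unknown to the agent) true mean
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward       # posterior update: success count
        beta[arm] += 1 - reward    # posterior update: failure count
        total_reward += reward
    return total_reward, alpha, beta
```

Over time the posterior of the better arm concentrates, so it is sampled (and played) increasingly often, which is how posterior sampling balances the exploration-exploitation tradeoff.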