Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling.
Iñigo Urteaga, Chris H. Wiggins. Published in: CoRR (2017)
Keyphrases
- exploration-exploitation tradeoff
- multi-armed bandit
- Markov chain Monte Carlo
- reinforcement learning
- random sampling
- objective function
- Bayesian inference
- data sets
- sample size
- posterior probability
- Monte Carlo
- relevance feedback
- Bayesian networks
- Markov chain
- function approximation
- genetic algorithm
- machine learning
- neural network
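The keyphrases above (multi-armed bandit, posterior probability, random sampling) describe the Bayesian-bandit setting the paper addresses. As context only, here is a minimal sketch of the standard posterior-sampling baseline for Bernoulli bandits (Beta-Bernoulli Thompson sampling), not the paper's double-sampling algorithm; the function name and test parameters are illustrative assumptions.

```python
import random

def thompson_bernoulli(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling: draw one sample from each arm's
    Beta posterior, play the arm with the largest draw, update its posterior."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # Beta(1, 1) uniform priors on each arm's mean
    beta = [1.0] * k
    total_reward = 0
    for _ in range(horizon):
        # random-sampling step: one posterior draw per arm
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        # simulate a Bernoulli reward from the (unknown to the agent) true mean
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward       # posterior update: success count
        beta[arm] += 1 - reward    # posterior update: failure count
        total_reward += reward
    return total_reward, alpha, beta
```

Over time the posterior of the better arm concentrates, so it is sampled (and played) increasingly often, which is how posterior sampling balances the exploration-exploitation tradeoff.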