On effectiveness of the Mirror Descent Algorithm for a stochastic multi-armed bandit governed by a stationary finite Markov chain.
Alexander V. Nazin
Boris M. Miller
Published in:
AuCC (2013)
Keyphrases
Markov chain
Monte Carlo
dynamic programming
multi-armed bandit
learning algorithm
Markov model
k-means
random walk
optimal solution
worst case
finite state
machine learning
search space
Markov decision processes
Monte Carlo method