Robust Mirror Descent Algorithm for a Multi-Armed Bandit Governed by a Stationary Finite Markov Chain.
Alexander V. Nazin
Boris M. Miller
Published in:
MIM (2013)
Keyphrases
markov chain
monte carlo
learning algorithm
optimal solution
markov model
monte carlo simulation
finite state
dynamic programming
multi armed bandit
worst case
probabilistic model
k means
parameter estimation
steady state
maximum entropy
monte carlo method
machine learning