Robust Mirror Descent Algorithm for a Multi-Armed Bandit Governed by a Stationary Finite Markov Chain.

Alexander V. Nazin, Boris M. Miller

Published in: MIM (2013)

Keyphrases

Markov chain
Monte Carlo
learning algorithm
optimal solution
Markov model
Monte Carlo simulation
finite state
dynamic programming
multi-armed bandit
worst case
probabilistic model
k-means
parameter estimation
steady state
maximum entropy
Monte Carlo method
machine learning