Mirror descent algorithm for a multi-armed bandit governed by a stationary finite state Markov chain.
Alexander V. Nazin, Boris M. Miller
Published in: ECC (2013)
Keyphrases
- markov chain
- finite state
- monte carlo
- markov model
- monte carlo simulation
- algorithm
- monte carlo method
- transition probabilities
- dynamic programming
- k-means
- objective function
- state space
- expectation maximization
- random walk
- transition matrix
- stationary distribution
- optimal solution
- learning algorithm
- probabilistic model
- reinforcement learning
- partially observable markov decision processes
- average cost
- lower bound
- search space
- linear regression
- model checking