Regret Analysis of a Markov Policy Gradient Algorithm for Multi-arm Bandits.
Denis Denisov
Neil Walton
Published in:
CoRR (2020)
Keyphrases
objective function
worst case
learning algorithm
computational complexity
search space
markov chain
monte carlo
policy gradient
cost function
optimization method
optimal solution
dynamic programming
simulated annealing
convergence rate
negative matrix factorization