Regret Analysis of a Markov Policy Gradient Algorithm for Multiarm Bandits.
Neil Walton
Denis Denisov
Published in:
Math. Oper. Res. (2023)
Keyphrases
learning algorithm
optimal solution
computational complexity
policy gradient
cost function
worst case
neural network
machine learning
NP-hard
dynamic programming
sufficient conditions
convergence rate
natural gradient