Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization.

Yuhao Ding, Junzi Zhang, Javad Lavaei

Published in: CoRR (2021)

Keyphrases

policy gradient methods
natural actor critic
monte carlo
convergence rate
convergence speed
robot arm
neural network
machine learning
dynamic programming
least squares