Softmax policy gradient methods can take exponential time to converge.
Gen Li
Yuting Wei
Yuejie Chi
Yuxin Chen
Published in:
Math. Program. (2023)
Keyphrases
policy gradient methods
natural actor critic
robot arm
policy gradient
temporal difference learning
model checking