Softmax policy gradient methods can take exponential time to converge.

Gen Li, Yuting Wei, Yuejie Chi, Yuxin Chen
Published in: Math. Program. (2023)
Keyphrases
  • policy gradient methods
  • natural actor critic
  • robot arm
  • policy gradient
  • temporal difference learning
  • model checking