The Reinforce Policy Gradient Algorithm Revisited.

Shalabh Bhatnagar

Published in: CoRR (2023)

Keyphrases

dynamic programming
search space
NP-hard
policy gradient
learning algorithm
objective function
k-means
cost function
gradient ascent
neural network
computational complexity
optimal control