Classical Policy Gradient: Preserving Bellman's Principle of Optimality.

Philip S. Thomas, Scott M. Jordan, Yash Chandak, Chris Nota, James Kostas

Published in: CoRR (2019)

Keyphrases

policy gradient
actor-critic
average reward
reinforcement learning
state action
optimal control
model-free reinforcement learning
machine learning
optimal solution
search space
optimal policy
function approximation
approximation methods
gradient method