Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees.

Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

Published in: CoRR (2023)

Keyphrases

function approximation
theoretical guarantees
actor critic
policy iteration
temporal difference
reinforcement learning
model free
policy gradient
Markov decision processes
temporal difference learning
worst case
reinforcement learning algorithms
decision making
learning tasks
approximate dynamic programming
radial basis function
optimal policy
decision problems
gradient method
optimal control
least squares
upper bound
function approximators
artificial neural networks
average reward