Average-Reward Off-Policy Policy Evaluation with Function Approximation.

Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson

Published in: CoRR (2021)

Keyphrases

function approximation
average reward
policy evaluation
model free
reinforcement learning
policy iteration
temporal difference
TD learning
Markov decision processes
optimal policy
reinforcement learning algorithms
learning tasks
long run
policy gradient
state space
function approximators
dynamic programming
learning algorithm
machine learning
radial basis function
partially observable
transfer learning