Average-Reward Off-Policy Policy Evaluation with Function Approximation.

Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson

Published in: ICML (2021)

Keyphrases

average reward
function approximation
policy evaluation
model-free
reinforcement learning
policy iteration
temporal difference
Markov decision processes
optimal policy
reinforcement learning algorithms
TD learning
radial basis function
learning tasks
state space
function approximators
state-action
long run
multi-agent
reinforcement learning methods
active learning
learning algorithm
machine learning
neural network
real-valued