Finite-Sample Analysis of Multi-Agent Policy Evaluation with Kernelized Gradient Temporal Difference.
Paulo Heredia
Shaoshuai Mou
Published in:
CDC (2020)
Keyphrases
temporal difference
policy evaluation
reinforcement learning
TD learning
multi agent
Monte Carlo
evaluation function
function approximation
model free
least squares
reinforcement learning algorithms
supervised learning
action selection
pairwise
step size
policy iteration
state space