Policy Evaluation in Continuous MDPs With Efficient Kernelized Gradient Temporal Difference.
Alec Koppel
Garrett Warnell
Ethan Stump
Peter Stone
Alejandro Ribeiro
Published in:
IEEE Trans. Autom. Control (2021)
Keyphrases
semi-parametric
policy evaluation
TD learning
least squares
statistical inference
temporal difference
reinforcement learning