Policy Evaluation in Continuous MDPs With Efficient Kernelized Gradient Temporal Difference.

Alec Koppel, Garrett Warnell, Ethan Stump, Peter Stone, Alejandro Ribeiro
Published in: IEEE Trans. Autom. Control (2021)
Keyphrases
  • semi-parametric
  • policy evaluation
  • TD learning
  • least squares
  • statistical inference
  • temporal difference
  • reinforcement learning