Optimal policy evaluation using kernel-based temporal difference methods.
Yaqi Duan, Mengdi Wang, Martin J. Wainwright
Published in: CoRR (2021)
Keyphrases
- temporal difference methods
- policy evaluation
- temporal difference
- least squares
- reinforcement learning
- model free
- monte carlo
- policy iteration
- markov decision processes
- function approximation
- td learning
- evaluation function
- dynamic programming
- genetic programming
- optimal control
- partially observable markov decision processes
- semi parametric
- optimal solution
- neural network
- variance reduction