PILOT: An $\mathcal{O}(1/K)$-Convergent Approach for Policy Evaluation with Nonlinear Function Approximation.
Zhuqing Liu, Xin Zhang, Jia Liu, Zhengyuan Zhu, Songtao Lu
Published in: ICLR (2024)
Keyphrases
- function approximation
- policy evaluation
- temporal difference
- reinforcement learning
- model free
- td learning
- least squares
- function approximators
- radial basis function
- learning tasks
- monte carlo
- policy iteration
- neural network
- machine learning
- evaluation function
- markov decision processes
- optimal policy
- semi supervised learning
- learning algorithm