Bridging the Gap between Newton-Raphson Method and Regularized Policy Iteration.
Zeyang Li, Chuxiong Hu, Yunan Wang, Guojian Zhan, Jie Li, Shengbo Eben Li
Published in: CoRR (2023)
Keyphrases
- policy iteration
- least squares
- markov decision processes
- model free
- reinforcement learning
- fixed point
- sample path
- optimal policy
- policy evaluation
- average reward
- markov decision process
- finite state
- temporal difference
- optimal control
- convergence rate
- state space
- machine learning
- objective function
- dynamic programming
- infinite horizon
- semi parametric
- markov decision problems
- function approximation
- evaluation function