Lambda-Policy Iteration with Randomization for Contractive Models with Infinite Policies: Well-Posedness and Convergence (Extended Version).
Yuchao Li
Karl Henrik Johansson
Jonas Mårtensson
Published in:
CoRR (2019)
Keyphrases
fixed point
policy iteration
optimal policy
Markov decision processes
reinforcement learning
least squares
finite state
infinite horizon
dynamic programming
stochastic approximation
linear programming
mathematical model
temporal difference
Markov decision process
sample path