Lambda-Policy Iteration with Randomization for Contractive Models with Infinite Policies: Well-Posedness and Convergence.
Yuchao Li
Karl Henrik Johansson
Jonas Mårtensson
Published in:
L4DC (2020)
Keyphrases
fixed point
policy iteration
optimal policy
stochastic approximation
Markov decision processes
transition matrices
reinforcement learning
probabilistic model
convergence rate
Markov decision process
state space
average cost
sample path