Improved bound on the worst case complexity of Policy Iteration
Romain Hollanders
Balázs Gerencsér
Jean-Charles Delvenne
Raphaël M. Jungers
Published in: Oper. Res. Lett. (2016)
Keyphrases
policy iteration
markov decision processes
fixed point
optimal policy
reinforcement learning
least squares
model free
finite state
upper bound
sample path
linear programming
lower bound
temporal difference
utility function
pairwise
infinite horizon
image sequences
average reward