Improved bound on the worst case complexity of Policy Iteration.

Romain Hollanders, Balázs Gerencsér, Jean-Charles Delvenne, Raphaël M. Jungers

Published in: CoRR (2014)

Keyphrases

policy iteration
markov decision processes
model free
sample path
fixed point
reinforcement learning
least squares
temporal difference
upper bound
optimal policy
convergence rate
markov decision process
finite state
average reward
infinite horizon
policy evaluation
markov chain
evaluation function
model checking
monte carlo