Improved and Generalized Upper Bounds on the Complexity of Policy Iteration.

Published in: CoRR (2013)

Keyphrases

upper bound
policy iteration
Markov decision processes
worst case
lower bound
model free
reinforcement learning
optimal policy
decision problems
fixed point
finite state
infinite horizon
temporal difference
sample path
neural network
least squares
Markov chain
cost function
convergence rate
objective function
average reward