Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies
Boris Lesner, Bruno Scherrer
Published in: CoRR (2013)
Keyphrases
- non-stationary
- policy iteration
- approximate policy iteration
- optimal policy
- discounted reward
- Markov decision processes
- lower bound
- approximate value iteration
- policy evaluation
- upper bound
- Markov decision process
- reinforcement learning
- worst case
- fixed point
- factored MDPs
- Markov decision problems
- model free
- least squares
- sample path
- temporal difference
- average reward
- infinite horizon
- state space
- finite state
- Markov games
- optimal control
- average cost
- long run
- linear programming
- dynamic programming
- reinforcement learning algorithms
- convergence rate
- NP-hard
- function approximation
- empirical mode decomposition
- sufficient conditions
- variance reduction
- computer vision
- video sequences
- multiresolution
- wavelet transform
- dynamical systems
- finite horizon
- data mining