A reinterpretation of the policy oscillation phenomenon in approximate policy iteration.
Paul Wagner. Published in: NIPS (2011)
Keyphrases
- approximate policy iteration
- reinforcement learning
- policy iteration
- Markov decision problems
- policy search
- Markov games
- Markov decision processes
- temporal difference
- optimal policy
- reinforcement learning algorithms
- Markov decision process
- multiagent reinforcement learning
- fixed point
- model free
- linear programming
- function approximators
- function approximation
- state space
- infinite horizon
- control problems
- decision theoretic
- dynamic programming
- transition probabilities
- continuous state
- optimal control
- least squares
- learning algorithm
- reward function
- neural network
- partially observable
- finite horizon
- evaluation function
- average reward
- utility function
- sufficient conditions