Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path.
András Antos, Csaba Szepesvári, Rémi Munos
Published in: Mach. Learn. (2008)
Keyphrases
- sample path
- policy iteration
- bellman residual
- approximation methods
- asymptotic analysis
- markov decision processes
- optimal policy
- reinforcement learning
- markov chain
- fixed point
- model free
- least squares
- markov decision process
- markov decision problems
- average reward
- temporal difference
- learning algorithm
- lost sales
- state space
- finite state
- large deviations
- stationary points
- dynamic programming
- supervised learning
- infinite horizon
- optimal control
- learning tasks
- fluid model
- cost function