Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path.
András Antos, Csaba Szepesvári, Rémi Munos
Published in: COLT (2006)
Keyphrases
- multistage
- policy iteration
- optimal policy
- sample path
- bellman residual
- reinforcement learning
- markov decision processes
- asymptotic analysis
- lost sales
- finite horizon
- markov decision process
- infinite horizon
- state space
- markov decision problems
- finite state
- fixed point
- markov chain
- least squares
- learning algorithm
- temporal difference
- monte carlo
- machine learning
- model free
- convergence rate
- learning tasks
- steady state
- supervised learning