Optimal Online Learning Procedures for Model-Free Policy Evaluation.
Tsuyoshi Ueno, Shin-ichi Maeda, Motoaki Kawanabe, Shin Ishii
Published in: ECML/PKDD (2) (2009)
Keyphrases
- model free
- policy evaluation
- online learning
- reinforcement learning
- policy iteration
- temporal difference
- least squares
- average reward
- function approximation
- reinforcement learning algorithms
- markov decision processes
- dynamic programming
- monte carlo
- variance reduction
- optimal control
- worst case
- e learning
- state space