Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning.
Ming Yin, Yu-Xiang Wang
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- policy evaluation
- temporal difference
- model free
- function approximation
- least squares
- Markov decision processes
- policy iteration
- Monte Carlo
- state space
- variance reduction
- learning algorithm
- reinforcement learning algorithms
- dynamic programming
- TD learning
- graphical models
- sample size
- average reward