Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning.
Ming Yin, Yu Bai, Yu-Xiang Wang. Published in: AISTATS (2021)
Keyphrases
- policy evaluation
- uniform convergence
- reinforcement learning
- temporal difference
- least squares
- model-free
- sufficient conditions
- Markov decision processes
- learning rate
- Monte Carlo
- function approximation
- policy iteration
- variance reduction
- state space
- generalization bounds
- optimal policy
- VC dimension
- generalization error
- real-valued
- learning algorithm
- sample complexity
- large deviations
- evaluation function
- sample size
- dynamic programming
- supervised learning
- machine learning
- optimal control
- semiparametric
- learning process
- training data