Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning.
Ming Yin, Yu Bai, Yu-Xiang Wang
Published in: CoRR (2020)
Keyphrases
- policy evaluation
- uniform convergence
- reinforcement learning
- temporal difference
- least squares
- model-free
- sufficient conditions
- learning rate
- Markov decision processes
- Monte Carlo
- function approximation
- policy iteration
- learning algorithm
- generalization error
- optimal policy
- sample complexity
- VC dimension
- variance reduction
- state space
- real-valued
- large deviations
- semi-parametric
- generalization bounds
- machine learning
- learning problems
- learning process
- optimal control
- dynamic programming
- partially observable Markov decision processes
- action selection
- supervised learning
- linear programming