Near-Optimal Offline Reinforcement Learning via Double Variance Reduction.
Ming Yin, Yu Bai, Yu-Xiang Wang
Published in: CoRR (2021)
Keyphrases
- variance reduction
- reinforcement learning
- policy gradient
- Monte Carlo
- gradient estimation
- sample size
- random numbers
- bias variance decomposition
- importance sampling
- function approximation
- reinforcement learning algorithms
- quasi-Monte Carlo
- decision trees
- state space
- learning algorithm
- confidence intervals
- dynamic programming
- graphical models
- actor critic
- supervised learning