Near-Optimal Offline Reinforcement Learning via Double Variance Reduction.
Ming Yin, Yu Bai, Yu-Xiang Wang
Published in: NeurIPS (2021)
Keyphrases
- variance reduction
- reinforcement learning
- gradient estimation
- policy gradient
- Monte Carlo
- sample size
- random numbers
- bias variance decomposition
- importance sampling
- function approximation
- state space
- confidence intervals
- dynamic programming
- trade-off
- learning algorithm
- naive Bayes classifier
- upper bound
- support vector machine