Reward Estimation for Variance Reduction in Deep Reinforcement Learning.
Joshua Romoff, Alexandre Piché, Peter Henderson, Vincent François-Lavet, Joelle Pineau
Published in: ICLR (Workshop) (2018)
Keyphrases
- reinforcement learning
- variance reduction
- policy gradient
- gradient estimation
- importance sampling
- policy evaluation
- monte carlo
- sample size
- temporal difference
- quasi monte carlo
- bias variance decomposition
- function approximation
- state space
- reinforcement learning algorithms
- markov decision processes
- reward function
- model free
- dynamic programming
- learning algorithm
- machine learning