Reward Estimation for Variance Reduction in Deep Reinforcement Learning.
Joshua Romoff, Alexandre Piché, Peter Henderson, Vincent François-Lavet, Joelle Pineau
Published in: CoRR (2018)
Keyphrases
- reinforcement learning
- variance reduction
- policy gradient
- gradient estimation
- importance sampling
- policy evaluation
- Monte Carlo
- function approximation
- sample size
- Markov decision processes
- bias-variance decomposition
- reinforcement learning algorithms
- state space
- learning algorithm
- parameter estimation
- machine learning
- optimal policy
- model-free
- dynamic programming
- reward function
- Markov chain
- training data