Reward Estimation for Variance Reduction in Deep Reinforcement Learning.
Joshua Romoff, Peter Henderson, Alexandre Piché, Vincent François-Lavet, Joelle Pineau
Published in: CoRL (2018)
Keyphrases
- reinforcement learning
- variance reduction
- policy gradient
- gradient estimation
- importance sampling
- policy evaluation
- Monte Carlo
- sample size
- function approximation
- state space
- Markov decision processes
- bias-variance decomposition
- quasi-Monte Carlo
- model-free
- machine learning
- reward function
- temporal difference
- learning agent
- reinforcement learning algorithms
- optimal policy
- supervised learning
- state action
- confidence intervals
- training data