Reward Estimation for Variance Reduction in Deep Reinforcement Learning.

Joshua Romoff, Peter Henderson, Alexandre Piché, Vincent François-Lavet, Joelle Pineau

Published in: CoRL (2018)

Keyphrases

reinforcement learning
variance reduction
policy gradient
gradient estimation
importance sampling
policy evaluation
Monte Carlo
sample size
function approximation
state space
Markov decision processes
bias variance decomposition
quasi-Monte Carlo
model free
machine learning
reward function
temporal difference
learning agent
reinforcement learning algorithms
optimal policy
supervised learning
state action
confidence intervals
training data