The Mirage of Action-Dependent Baselines in Reinforcement Learning.

George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine

Published in: CoRR (2018)

Keyphrases

reinforcement learning
action selection
action space
reward shaping
partially observable domains
state-action
function approximation
state space
reinforcement learning algorithms
Markov decision processes
transition model
multi-agent
learning algorithm
model-free
continuous state
data sets
reinforcement learning methods
reasoning about actions
fitted Q-iteration
learning capabilities
optimal control
learning problems
model checking
optimal policy
learning process
spatio-temporal
computer vision
information retrieval
neural network