The Mirage of Action-Dependent Baselines in Reinforcement Learning

George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine

Published in: ICML (2018)

Keyphrases

reinforcement learning
action selection
partially observable domains
reward shaping
transition model
state action
action space
reinforcement learning algorithms
function approximation
learning algorithm
temporal difference
sensory inputs
state space
fitted q iteration
optimal control
partially observable
markov decision processes
learning process
action models
optimal policy
multi agent
policy search
computer vision
function approximators
temporal difference learning
continuous state
action descriptions
robotic control
machine learning
continuous action