The Optimal Reward Baseline for Gradient-Based Reinforcement Learning.

Lex Weaver, Nigel Tao

Published in: UAI (2001)

Keyphrases

reinforcement learning
dynamic programming
optimal control
function approximation
control policy
average reward
optimal solution
state space
total reward
model free
supervised learning
machine learning
optimal policy
Markov decision processes
learning process
reinforcement learning algorithms
learning capabilities
lower bound
relative improvement
worst case
long run
temporal difference
learning algorithm
eligibility traces