Gradient Descent for General Reinforcement Learning

Leemon C. Baird III, Andrew W. Moore

Published in: NIPS (1998)

Keyphrases

reinforcement learning
special case
closely related
data sets
multi agent
genetic algorithm
learning algorithm
cost function
function approximation
neural network
real world
objective function
optimal policy
Markov decision processes
model free
temporal difference