Dynamic Policy Programming with Function Approximation.

Mohammad Gheshlaghi Azar, Vicenç Gómez, Bert Kappen

Published in: AISTATS (2011)

Keyphrases

function approximation
reinforcement learning
function approximators
temporal difference learning algorithms
temporal difference learning
reinforcement learning problems
policy evaluation
temporal difference
policy gradient
learning tasks
radial basis function
dynamic environments
optimal policy
model free
action selection
TD learning
exploration exploitation tradeoff
data mining
reinforcement learning algorithms
artificial neural networks
learning algorithm
neural network