Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view

Published in: CoRR (2010)

Keyphrases

temporal difference
policy iteration
policy evaluation
fixpoint
TD learning
reinforcement learning
evaluation function
function approximation
model free
action selection
Monte Carlo
logic programs
step size
sample path
Markov decision processes
least squares
reinforcement learning algorithms
deductive databases
convergence rate
fixed point
supervised learning
average reward
genetic algorithm
learning tasks
decision making
optimal policy
function approximators
Markov decision problems
search space
learning algorithm