Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view
Bruno Scherrer. Published in: CoRR (2010)
Keyphrases
- temporal difference
- policy iteration
- policy evaluation
- fixpoint
- TD learning
- reinforcement learning
- evaluation function
- function approximation
- model free
- action selection
- Monte Carlo
- logic programs
- step size
- sample path
- Markov decision processes
- least squares
- reinforcement learning algorithms
- deductive databases
- convergence rate
- fixed point
- supervised learning
- average reward
- genetic algorithm
- learning tasks
- decision making
- optimal policy
- function approximators
- Markov decision problems
- search space
- learning algorithm