Differential TD learning for value function approximation.
Adithya M. Devraj, Sean P. Meyn
Published in: CDC (2016)
Keyphrases
- temporal difference
- td learning
- evaluation function
- reinforcement learning
- function approximation
- monte carlo
- model free
- step size
- action selection
- reinforcement learning algorithms
- policy evaluation
- policy iteration
- markov decision processes
- neural network
- markov chain
- particle swarm optimization
- function approximators
- decision trees