Uncorrected Least-Squares Temporal Difference with Lambda-Return.
Takayuki Osogami
Published in: AAAI (2020)
Keyphrases
- temporal difference
- least squares
- policy evaluation
- td learning
- reinforcement learning
- evaluation function
- function approximation
- step size
- monte carlo
- model free
- temporal difference learning
- policy iteration
- reinforcement learning algorithms
- temporal difference methods
- action selection
- fixed point
- actor critic
- optical flow
- cost function
- markov decision processes
- optimization algorithm
- neural network
- state space
- predictive state representations