Online learning control based on projected gradient temporal difference and advanced heuristic dynamic programming.
Jian Fu, Sujuan Wei, Haibo He, Shengyong Wang
Published in: IJCNN (2014)
Keyphrases
- online learning
- dynamic programming
- temporal difference
- reinforcement learning
- td learning
- evaluation function
- function approximation
- action selection
- monte carlo
- e learning
- state space
- model free
- step size
- markov decision processes
- optimal policy
- least squares
- optimal solution
- training data
- search space
- search algorithm