An Adaptive Policy Evaluation Network Based on Recursive Least Squares Temporal Difference With Gradient Correction.
Dazi Li, Yuting Wang, Tianheng Song, Qibing Jin
Published in: IEEE Access (2018)
Keyphrases
- temporal difference
- policy evaluation
- recursive least squares
- gradient method
- step size
- td learning
- reinforcement learning
- function approximation
- monte carlo
- evaluation function
- neuro fuzzy
- policy gradient
- model free
- convergence rate
- policy iteration
- least squares
- convergence speed
- reinforcement learning algorithms
- cost function
- action selection
- variance reduction
- wavelet coefficients
- kalman filter
- function approximators
- dynamic programming
- supervised learning
- multiscale
- markov chain
- feature selection
- markov decision processes
- differential evolution