Gradient Q(σ, λ): A Unified Algorithm with Function Approximation for Reinforcement Learning.
Long Yang, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan
Published in: CoRR (2019)
Keyphrases
- function approximation
- reinforcement learning
- mountain car
- model free
- dynamic programming
- learning algorithm
- temporal difference learning
- temporal difference
- policy gradient
- function approximators
- neural network
- convergence rate
- actor critic
- search space
- gradient method
- td learning
- markov decision processes
- transfer learning
- step size