Metatrace Actor-Critic: Online Step-Size Tuning by Meta-gradient Descent for Reinforcement Learning Control.
Kenny YoungBaoxiang WangMatthew E. TaylorPublished in: IJCAI (2019)
Keyphrases
- step size
- actor critic
- temporal difference
- approximate dynamic programming
- cost function
- gradient method
- reinforcement learning
- stochastic gradient descent
- optimal control
- convergence rate
- convergence speed
- reinforcement learning algorithms
- temporal difference learning
- control problems
- policy gradient
- function approximation
- control system
- policy iteration
- control strategy
- control method
- control policy
- objective function
- action selection
- markov decision processes
- monte carlo
- wavelet coefficients