Online Meta-Critic Learning for Off-Policy Actor-Critic Methods.
Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy M. Hospedales
Published in: NeurIPS (2020)
Keyphrases
- actor critic
- reinforcement learning
- gradient method
- learning algorithm
- policy gradient
- Markov decision processes
- function approximation
- reinforcement learning algorithms
- policy iteration
- active learning
- approximate dynamic programming
- neural network
- learning tasks
- optimization methods
- optimal control
- state space
- temporal difference learning
- reinforcement learning methods