The Actor-Advisor: Policy Gradient With Off-Policy Advice.
Hélène Plisnier, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé
Published in: CoRR (2019)
Keyphrases
- partially observable Markov decision processes
- policy gradient
- reinforcement learning
- actor critic
- state space
- parametric optimization
- model-free reinforcement learning
- reinforcement learning algorithms
- average reward
- computational complexity
- cost function
- sufficient conditions
- function approximation
- optimal control
- single agent