Combining policy gradient and Q-learning.
Brendan O'Donoghue, Rémi Munos, Koray Kavukcuoglu, Volodymyr Mnih
Published in: ICLR (Poster) (2017)
Keyphrases
- policy gradient
- reinforcement learning
- function approximation
- actor critic
- reinforcement learning algorithms
- model free reinforcement learning
- state action
- single agent
- state space
- cooperative
- multi agent
- learning algorithm
- dynamic programming
- temporal difference
- optimal control
- function approximators
- policy search
- reinforcement learning methods
- temporal difference learning
- gradient method
- learning tasks
- model free