Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning.
Seyed Sajad Mousavi, Michael Schukat, Peter Corcoran, Enda Howley
Published in: CoRR (2017)
Keyphrases
- policy gradient
- reinforcement learning
- actor critic
- function approximation
- reinforcement learning algorithms
- policy search
- gradient method
- optimal control
- policy gradient methods
- function approximators
- state space
- approximation methods
- temporal difference
- model free reinforcement learning
- single agent
- reinforcement learning methods
- variance reduction
- learning algorithm
- machine learning
- partially observable markov decision processes
- model free
- markov decision processes
- average reward
- temporal difference learning
- action selection
- dynamical systems