The Mirage of Action-Dependent Baselines in Reinforcement Learning.
George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine
Published in: ICLR (Workshop) (2018)
Keyphrases
- reinforcement learning
- action selection
- partially observable domains
- action space
- state action
- reward shaping
- function approximation
- state space
- model free
- transition model
- temporal difference
- neural network
- action sequences
- multi agent
- machine learning
- dynamic programming
- least squares
- supervised learning
- transfer learning
- spatio temporal
- reinforcement learning algorithms
- multi agent systems
- temporal difference learning
- information retrieval