The Mirage of Action-Dependent Baselines in Reinforcement Learning.
George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine
Published in: CoRR (2018)
Keyphrases
- reinforcement learning
- action selection
- action space
- reward shaping
- partially observable domains
- state action
- function approximation
- state space
- reinforcement learning algorithms
- markov decision processes
- transition model
- multi agent
- learning algorithm
- model free
- continuous state
- data sets
- reinforcement learning methods
- reasoning about actions
- fitted q iteration
- learning capabilities
- optimal control
- learning problems
- model checking
- optimal policy
- learning process
- spatio temporal
- computer vision
- information retrieval
- neural network