The Optimal Reward Baseline for Gradient-Based Reinforcement Learning
Lex Weaver, Nigel Tao
Published in: CoRR (2013)
Keyphrases
- reinforcement learning
- optimal control
- dynamic programming
- model free
- average reward
- total reward
- function approximation
- state space
- learning algorithm
- multi agent
- temporal difference
- approximate dynamic programming
- eligibility traces
- state action
- optimal policy
- reinforcement learning algorithms
- control policy
- worst case
- supervised learning
- objective function
- genetic algorithm
- data sets
- initially unknown
- reward shaping
- learning agent
- markov decision process
- action selection
- long run
- closed form
- learning process
- machine learning