Temporal Difference Methods for the Variance of the Reward To Go.
Aviv Tamar, Dotan Di Castro, Shie Mannor
Published in: ICML (3) (2013)
Keyphrases
- temporal difference methods
- reinforcement learning
- function approximation
- temporal difference
- policy search
- reinforcement learning problems
- evolutionary methods
- policy evaluation
- reinforcement learning algorithms
- function approximators
- reward function
- variance reduction
- long run
- policy gradient
- evolutionary computation
- optimization algorithm
- neural network
- average reward
- optimal control
- decision making
- machine learning