Examining average and discounted reward optimality criteria in reinforcement learning.
Vektor Dewanto, Marcus Gallagher
Published in: CoRR (2021)
Keyphrases
- discounted reward
- optimality criteria
- reinforcement learning
- Markov decision processes
- average reward
- state and action spaces
- policy iteration
- optimal policy
- hierarchical reinforcement learning
- state space
- model free
- reinforcement learning algorithms
- function approximation
- Markov decision problems
- temporal difference
- action space
- average cost
- dynamic programming
- supervised learning
- long run
- fixed point
- objective function
- machine learning
- reward function
- Markov chain