Examining average and discounted reward optimality criteria in reinforcement learning.

Vektor Dewanto, Marcus Gallagher

Published in: CoRR (2021)

Keyphrases

discounted reward
optimality criteria
reinforcement learning
Markov decision processes
average reward
state and action spaces
policy iteration
optimal policy
hierarchical reinforcement learning
state space
model free
reinforcement learning algorithms
function approximation
Markov decision problems
temporal difference
action space
average cost
dynamic programming
supervised learning
long run
fixed point
objective function
machine learning
reward function
Markov chain