Optimizing the Long-Term Average Reward for Continuing MDPs: A Technical Report.
Chao Xu, Yiping Xie, Xijun Wang, Howard H. Yang, Dusit Niyato, Tony Q. S. Quek
Published in: CoRR (2021)
Keyphrases
- average reward
- technical report
- long term
- Markov decision processes
- optimal policy
- semi-Markov decision processes
- long run
- reinforcement learning
- discounted reward
- policy iteration
- optimality criterion
- state and action spaces
- model free
- stochastic games
- Markov chain
- finite state
- discount factor
- state space
- total reward
- average cost
- factored MDPs
- policy gradient
- hierarchical reinforcement learning
- dynamic programming
- infinite horizon
- decision problems
- learning algorithm