State-Visitation Fairness in Average-Reward MDPs.
Ganesh Ghalme, Vineet Nair, Vishakha Patil, Yilun Zhou
Published in: CoRR (2021)
Keyphrases
- average reward
- Markov decision processes
- discounted reward
- state space
- optimal policy
- state action
- long run
- semi-Markov decision processes
- total reward
- reinforcement learning
- policy iteration
- discount factor
- stochastic games
- action space
- optimality criterion
- model free
- Markov decision process
- hierarchical reinforcement learning
- real time dynamic programming
- state and action spaces
- initial state
- game theory
- linear programming
- dynamic programming