Improved Exploration in Factored Average-Reward MDPs.
Mohammad Sadegh Talebi, Anders Jonsson, Odalric Maillard
Published in: AISTATS (2021)
Keyphrases
- average reward
- Markov decision processes
- optimal policy
- long run
- semi-Markov decision processes
- state space
- policy iteration
- discounted reward
- reinforcement learning
- stochastic games
- dynamic programming
- optimality criterion
- state and action spaces
- model free
- finite state
- state action
- Markov chain
- infinite horizon
- total reward
- average cost
- factored MDPs
- reinforcement learning algorithms
- partially observable Markov decision processes
- hierarchical reinforcement learning
- reward function
- decision problems
- random walk