Improved Exploration in Factored Average-Reward MDPs.
Mohammad Sadegh Talebi, Anders Jonsson, Odalric-Ambrym Maillard
Published in: CoRR (2020)
Keyphrases
- average reward
- Markov decision processes
- optimal policy
- long run
- state space
- semi-Markov decision processes
- discounted reward
- reinforcement learning
- stochastic games
- policy iteration
- optimality criterion
- model free
- state and action spaces
- finite state
- Markov chain
- total reward
- state action
- sufficient conditions
- hierarchical reinforcement learning
- factored Markov decision processes
- partially observable
- planning under uncertainty
- partially observable Markov decision processes
- average cost
- queueing networks
- reward function
- planning problems
- dynamic programming
- search algorithm