Finding good policies in average-reward Markov Decision Processes without prior knowledge.
Adrienne Tuynman, Rémy Degenne, Emilie Kaufmann
Published in: CoRR (2024)
Keyphrases
- markov decision processes
- average reward
- optimal policy
- discounted reward
- prior knowledge
- total reward
- markov decision process
- policy iteration
- semi markov decision processes
- optimality criterion
- state space
- dynamic programming
- discount factor
- long run
- finite state
- decision processes
- reinforcement learning
- planning under uncertainty
- reward function
- hierarchical reinforcement learning
- average cost
- partially observable markov decision processes
- decision problems
- stochastic games
- reinforcement learning algorithms
- decision theoretic planning
- infinite horizon
- stationary policies
- partially observable
- state and action spaces
- sufficient conditions
- markov decision problems
- factored mdps
- markov chain
- expected reward
- actor critic
- action space