Optimal Sample Complexity for Average Reward Markov Decision Processes.
Shengbo Wang, José H. Blanchet, Peter W. Glynn
Published in: ICLR (2024)
Keyphrases
- average reward
- Markov decision processes
- sample complexity
- optimality criterion
- discounted reward
- optimal policy
- policy iteration
- long run
- dynamic programming
- average cost
- state space
- total reward
- semi-Markov decision processes
- reinforcement learning
- finite state
- learning problems
- infinite horizon
- learning algorithm
- model free
- state and action spaces
- special case
- lower bound
- upper bound
- supervised learning
- active learning
- hierarchical reinforcement learning
- reward function
- reinforcement learning algorithms
- discount factor
- machine learning
- stationary policies
- action space
- partially observable
- Markov chain
- optimal control