Optimal Sample Complexity for Average Reward Markov Decision Processes.
Shengbo Wang, Jose H. Blanchet, Peter W. Glynn
Published in: CoRR (2023)
Keyphrases
- average reward
- markov decision processes
- sample complexity
- discounted reward
- optimality criterion
- optimal policy
- policy iteration
- dynamic programming
- long run
- semi markov decision processes
- total reward
- reinforcement learning
- state space
- learning problems
- finite state
- learning algorithm
- state and action spaces
- active learning
- reinforcement learning algorithms
- special case
- average cost
- sample size
- supervised learning
- upper bound
- model free
- discount factor
- markov chain
- hierarchical reinforcement learning
- infinite horizon
- partially observable
- reward function
- action space
- stationary policies
- partially observable markov decision processes
- decision problems