Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs.
Matthew Zurek, Yudong Chen. Published in: CoRR (2024)
Keyphrases
- average reward
- markov decision processes
- sample complexity
- optimal policy
- long run
- optimality criterion
- discounted reward
- special case
- reinforcement learning
- semi markov decision processes
- sequential decision problems
- policy iteration
- theoretical analysis
- dynamic programming
- learning problems
- average cost
- model free
- markov chain
- supervised learning
- state space
- active learning
- decision problems
- partially observable
- machine learning
- upper bound
- infinite horizon
- finite state
- least squares
- training data
- feature selection