Optimal Sample Complexity of Reinforcement Learning for Uniformly Ergodic Discounted Markov Decision Processes.
Shengbo Wang, Jose Blanchet, Peter W. Glynn
Published in: CoRR (2023)
Keyphrases
- markov decision processes
- sample complexity
- reinforcement learning
- average cost
- dynamic programming
- average reward
- optimal policy
- learning problems
- state space
- finite horizon
- action sets
- policy iteration
- reinforcement learning algorithms
- finite state
- total reward
- stationary policies
- theoretical analysis
- discounted reward
- upper bound
- learning algorithm
- supervised learning
- markov decision process
- partially observable
- infinite horizon
- model based reinforcement learning
- state and action spaces
- active learning
- optimal control
- special case
- markov chain
- action space
- model free
- reward function
- function approximation
- lower bound
- partially observable markov decision processes
- sample size
- concept learning
- worst case
- continuous state
- state abstraction
- optimal solution
- machine learning
- decision problems
- generalization error
- multi agent