Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes.
Zihan Zhang, Qiaomin Xie
Published in: COLT (2023)
Keyphrases
- average reward
- Markov decision processes
- policy gradient
- optimal policy
- policy iteration
- reinforcement learning
- semi-Markov decision processes
- finite state
- state space
- stochastic games
- optimality criterion
- discounted reward
- dynamic programming
- average cost
- partially observable
- long run
- infinite horizon
- total reward
- reinforcement learning algorithms
- state action
- reward function
- state and action spaces
- hierarchical reinforcement learning
- model free
- decision problems
- heuristic search
- dynamic environments
- Markov chain