Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes.
Zihan Zhang, Qiaomin Xie
Published in: CoRR (2023)
Keyphrases
- average reward
- Markov decision processes
- policy gradient
- optimal policy
- reinforcement learning
- semi-Markov decision processes
- policy iteration
- state space
- finite state
- stochastic games
- optimality criterion
- reinforcement learning algorithms
- state action
- dynamic programming
- discounted reward
- total reward
- average cost
- reward function
- long run
- state and action spaces
- hierarchical reinforcement learning
- machine learning
- partially observable Markov decision processes
- state abstraction
- RL algorithms
- Markov chain
- Markov decision process
- partially observable