Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP.
Jinghan Wang, Mengdi Wang, Lin F. Yang
Published in: CoRR (2022)
Keyphrases
- average reward
- optimal policy
- Markov decision processes
- reinforcement learning
- stochastic games
- state action
- discounted reward
- optimality criterion
- actor critic
- long run
- learning algorithm
- policy gradient
- total reward
- action selection
- model free
- partially observable
- average cost
- hierarchical reinforcement learning
- supervised learning
- state space
- TD learning
- inverse reinforcement learning
- semi-Markov decision processes
- RL algorithms
- policy iteration
- partially observable Markov decision processes
- decision theoretic
- Markov chain
- dynamic programming
- optimal solution