Learning Infinite-horizon Average-reward Markov Decision Process with Constraints.
Liyu Chen, Rahul Jain, Haipeng Luo
Published in: ICML (2022)
Keyphrases
- infinite horizon
- Markov decision process
- optimal policy
- Markov decision processes
- average reward
- long run
- policy iteration
- stochastic games
- reinforcement learning
- finite horizon
- state action
- dynamic programming
- hierarchical reinforcement learning
- partially observable
- learning algorithm
- optimal control
- multistage
- partially observable Markov decision processes
- initial state
- reward function
- least squares
- state space