Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints.
Liyu Chen, Rahul Jain, Haipeng Luo
Published in: CoRR (2022)
Keyphrases
- markov decision processes
- infinite horizon
- average reward
- optimal policy
- stochastic games
- reinforcement learning
- partially observable
- policy iteration
- long run
- finite horizon
- finite state
- dynamic programming
- average cost
- state space
- discounted reward
- markov decision process
- semi markov decision processes
- actor critic
- state action
- total reward
- state abstraction
- learning algorithm
- reinforcement learning algorithms
- markov decision problems
- decision problems
- discount factor
- policy gradient
- probability distribution
- multistage
- search algorithm
- multi agent
- optimality criterion
- partially observable markov decision processes
- decision theoretic
- sufficient conditions