Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes.
Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma, Rahul Jain
Published in: ICML (2020)
Keyphrases
- markov decision processes
- infinite horizon
- average reward
- policy gradient
- optimal policy
- finite horizon
- policy iteration
- finite state
- state space
- dynamic programming
- reinforcement learning
- stochastic games
- state action
- discounted reward
- average cost
- reinforcement learning algorithms
- markov decision process
- partially observable
- decision problems
- planning under uncertainty
- total reward
- reward function
- partially observable markov decision processes
- long run
- markov decision problems
- dec-pomdps
- discount factor
- inventory level
- heuristic search
- sufficient conditions
- state variables
- optimal control
- stationary policies
- multistage
- multi agent