Value Iteration for Long-run Average Reward in Markov Decision Processes.
Pranav Ashok, Krishnendu Chatterjee, Przemyslaw Daca, Jan Kretínský, Tobias Meggendorfer
Published in: CoRR (2017)
Keyphrases
- average reward
- long run
- markov decision processes
- optimal policy
- semi markov decision processes
- infinite horizon
- policy iteration
- stochastic games
- discounted reward
- finite state
- expected cost
- state space
- discount factor
- finite horizon
- optimality criterion
- reinforcement learning
- average cost
- queueing networks
- dynamic programming
- sample path
- total reward
- state dependent
- state and action spaces
- decision problems
- partially observable
- markov decision process
- policy gradient
- decision processes
- sufficient conditions
- state abstraction
- planning under uncertainty
- partially observable markov decision processes
- state variables
- markov chain
- stochastic shortest path