Keyphrases
- optimal policy
- Markov decision processes
- infinite horizon
- finite horizon
- average reward
- decision problems
- reinforcement learning
- state space
- dynamic programming
- long run
- average cost
- finite state
- Markov decision process
- multistage
- sufficient conditions
- dynamic programming algorithms
- control policies
- state-dependent
- total reward
- policy iteration
- semi-Markov decision processes
- partially observable
- discount factor
- average reward reinforcement learning
- discounted reward
- inventory control
- partially observable Markov decision processes
- model-free