What is an Optimal Policy in Time-Average MDP?
Nicolas Gast, Bruno Gaujal, Kimang Khun
Published in: SIGMETRICS Perform. Evaluation Rev. (2023)
Keyphrases
- optimal policy
- average cost
- Markov decision processes
- Markov decision process
- state space
- decision problems
- dynamic programming
- finite state
- reinforcement learning
- finite horizon
- discounted reward
- infinite horizon
- multistage
- Bayesian reinforcement learning
- long run
- average reward
- sufficient conditions
- state dependent
- initial state
- policy iteration
- control policies
- standard deviation
- dynamic programming algorithms
- reward function
- serial inventory systems
- stationary policies
- total reward
- partially observable