On occupation measures for total-reward MDPs.
Eric V. Denardo, Eugene A. Feinberg, Uriel G. Rothblum
Published in: CDC (2008)
Keyphrases
- total reward
- Markov decision processes
- average reward
- optimal policy
- reinforcement learning
- state space
- reinforcement learning algorithms
- policy iteration
- finite state
- average cost
- infinite horizon
- action selection
- action space
- stationary policies
- long run
- partially observable
- evaluation function
- optimality criterion
- decision problems