Adaptive computation of optimal nonrandomized policies in constrained average-reward MDPs.
Eugene A. Feinberg
Published in: ADPRL (2009)
Keyphrases
- average reward
- optimal policy
- markov decision processes
- discounted reward
- total reward
- optimality criterion
- long run
- discount factor
- reinforcement learning
- dynamic programming
- policy iteration
- semi markov decision processes
- state space
- hierarchical reinforcement learning
- finite horizon
- markov decision process
- decision problems
- reward function
- model free
- average cost
- partially observable markov decision processes
- stochastic games
- state and action spaces
- semi markov decision process
- finite state
- state action
- infinite horizon
- multistage
- decision processes
- stationary policies
- control policy
- partially observable
- factored mdps
- reinforcement learning algorithms
- markov chain
- least squares
- optimal solution