Relative value iteration for average reward semi-Markov control via simulation.
Abhijit Gosavi
Published in: WSC (2013)
Keyphrases
- average reward
- semi markov
- markov decision processes
- optimal policy
- long run
- policy iteration
- reinforcement learning
- semi markov decision processes
- optimality criterion
- markov chain
- model free
- hierarchical reinforcement learning
- state space
- total reward
- discounted reward
- conditional random fields
- decision problems
- policy gradient
- state and action spaces
- optimal control
- partially observable markov decision processes
- function approximation
- control strategy
- mathematical model
- control system
- infinite horizon
- heuristic search
- dynamic programming
- pairwise
- computational complexity