Higher-Order and Average Reward Myopic-Affine Dynamic Models.
Matthew J. Sobel
Published in: Math. Oper. Res. (1990)
Keyphrases
- dynamic model
- average reward
- higher order
- long run
- Markov decision processes
- optimal policy
- infinite horizon
- semi-Markov decision processes
- reinforcement learning
- stochastic games
- discounted reward
- optimality criterion
- model-free
- policy iteration
- experimental data
- pairwise
- Markov chain
- Markov random field
- state space
- multiple models
- total reward
- Markov models
- knowledge base
- state and action spaces
- dynamic programming
- finite state
- policy gradient
- multi-agent
- hierarchical reinforcement learning
- optimal control
- linear programming