An empirical algorithm for relative value iteration for average-cost MDPs.
Abhishek Gupta, Rahul Jain, Peter W. Glynn
Published in: CDC (2015)
Keyphrases
- dynamic programming
- Markov decision processes
- learning algorithm
- computational complexity
- policy iteration
- optimal solution
- optimal policy
- linear programming
- state space
- cost function
- objective function
- average cost
- average reward
- NP-hard
- infinite horizon
- reinforcement learning
- machine learning
- multistage
- linear program
- long run
- Markov decision process