A unified worst case for classical simplex and policy iteration pivot rules.
Yann Disser, Nils Mosis
Published in: CoRR (2023)
Keyphrases
- policy iteration
- worst case
- Markov decision processes
- simplex method
- fixed point
- least squares
- model free
- reinforcement learning
- sample path
- linear programming
- upper bound
- Markov decision process
- optimal policy
- temporal difference
- finite state
- optimal control
- Markov decision problems
- lower bound
- average reward
- neural network
- discounted reward
- policy evaluation
- infinite horizon
- linear program
- random walk
- cost function
- pairwise
- machine learning