A unified worst case for classical simplex and policy iteration pivot rules.

Yann Disser, Nils Mosis

Published in: CoRR (2023)

Keyphrases

policy iteration
worst case
Markov decision processes
simplex method
fixed point
least squares
model free
reinforcement learning
sample path
linear programming
upper bound
Markov decision process
optimal policy
temporal difference
finite state
optimal control
Markov decision problems
lower bound
average reward
neural network
discounted reward
policy evaluation
infinite horizon
linear program
random walk
cost function
pairwise
machine learning