The Smoothed Complexity of Policy Iteration for Markov Decision Processes.
Miranda Christ, Mihalis Yannakakis
Published in: CoRR (2022)
Keyphrases
- markov decision processes
- policy iteration
- finite state
- optimal policy
- reinforcement learning
- average reward
- transition matrices
- dynamic programming
- state space
- factored mdps
- model free
- average cost
- sample path
- approximate dynamic programming
- partially observable
- reinforcement learning algorithms
- infinite horizon
- planning under uncertainty
- discounted reward
- fixed point
- markov decision process
- finite horizon
- action space
- policy evaluation
- least squares
- state and action spaces
- actor critic
- policy iteration algorithm
- decision problems
- convergence rate
- long run
- decision processes
- temporal difference
- reward function