Lower Bounds for Policy Iteration on Multi-action MDPs.
Kumar Ashutosh, Sarthak Consul, Bhishma Dedhia, Parthasarathi Khirwadkar, Sahil Shah, Shivaram Kalyanakrishnan
Published in: CoRR (2020)
Keyphrases
- policy iteration
- markov decision processes
- lower bound
- discounted reward
- optimal policy
- fixed point
- reinforcement learning
- factored mdps
- action space
- average reward
- upper bound
- sample path
- model free
- markov decision process
- least squares
- state space
- approximate dynamic programming
- finite state
- objective function
- temporal difference
- policy evaluation
- markov decision problems
- transition matrices
- initial state
- average cost
- linear programming
- infinite horizon
- convergence rate
- np hard
- dynamic programming
- action selection
- decision problems
- finite horizon
- optimal control
- state and action spaces
- optimal solution
- reinforcement learning algorithms