Lower Bounds for Policy Iteration on Multi-action MDPs.
Kumar Ashutosh, Sarthak Consul, Bhishma Dedhia, Parthasarathi Khirwadkar, Sahil Shah, Shivaram Kalyanakrishnan
Published in: CDC (2020)
Keyphrases
- policy iteration
- markov decision processes
- lower bound
- discounted reward
- model free
- optimal policy
- reinforcement learning
- upper bound
- sample path
- factored mdps
- average reward
- fixed point
- markov decision process
- action space
- temporal difference
- least squares
- approximate dynamic programming
- policy evaluation
- finite state
- state space
- markov decision problems
- average cost
- infinite horizon
- transition matrices
- dynamic programming
- finite horizon
- convergence rate
- objective function
- partially observable
- optimal control
- initial state
- action selection
- np hard
- linear programming
- decision making
- reinforcement learning algorithms
- multi agent
- evaluation function
- context specific
- optimal solution
- graphical models
- state and action spaces
- approximate policy iteration