Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes.
Emmeran Johnson, Ciara Pike-Burke, Patrick Rebeschini
Published in: CoRR (2023)
Keyphrases
- Markov decision processes
- discount factor
- convergence rate
- optimal policy
- average reward
- policy iteration
- average cost
- learning rate
- finite horizon
- dynamic programming
- discounted reward
- infinite horizon
- total reward
- stationary policies
- finite state
- Markov decision problems
- state space
- policy iteration algorithm
- partially observable
- reinforcement learning
- Markov decision process
- optimality criterion
- step size
- state and action spaces
- long run
- action space
- transition matrices
- convergence speed
- reinforcement learning algorithms
- expected reward
- state dependent
- control policies
- planning under uncertainty
- reward function
- action sets
- continuous state spaces
- decision theoretic planning
- decision problems
- optimal solution
- decision processes