Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes.
Guanghui Lan
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- function approximators
- action selection
- tractable cases
- decision problems
- function approximation
- stochastic approximation
- linear complexity
- worst case
- reinforcement learning algorithms
- reinforcement learning problems
- approximate policy iteration
- complexity measures
- action space
- markov decision problems
- temporal difference
- convergence rate
- policy evaluation
- markov decision process
- state space
- computational complexity
- learning algorithm
- control policy
- sampling methods
- reward function
- continuous state
- model free
- convergence speed
- approximate dynamic programming
- markov decision processes
- monte carlo
- sampling theorem
- agent receives
- proximal point