Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes.
Guanghui Lan
Published in: Math. Program. (2023)
Keyphrases
- reinforcement learning
- optimal policy
- function approximators
- tractable cases
- policy search
- stochastic approximation
- linear complexity
- action selection
- decision problems
- function approximation
- Markov decision process
- reinforcement learning problems
- partially observable
- approximate policy iteration
- state space
- Markov decision problems
- convergence rate
- partially observable environments
- dynamic programming
- Markov decision processes
- Monte Carlo
- average reward
- sample size
- state action
- proximal point
- sampling theorem
- complexity measures
- control policies
- field of view
- action space
- machine learning
- reinforcement learning algorithms
- reward function
- long run