Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with √T Regret.
Asaf Cassel, Tomer Koren
Published in: ICML (2021)
Keyphrases
- reinforcement learning
- model free
- policy gradient
- reinforcement learning algorithms
- learning process
- function approximation
- supervised learning
- learning problems
- learning tasks
- learning algorithm
- Markov decision processes
- average reward
- reinforcement learning methods
- RL algorithms
- dynamical systems
- policy iteration
- stochastic games
- variance reduction