Reinforcement Learning based on MPC and the Stochastic Policy Gradient Method.
Sébastien Gros, Mario Zanon
Published in: ACC (2021)
Keyphrases
- gradient method
- actor critic
- policy gradient
- reinforcement learning
- convergence rate
- control policies
- policy search
- negative matrix factorization
- optimal policy
- convex formulation
- optimization methods
- step size
- continuous state spaces
- action selection
- function approximation
- state space
- control policy
- action space
- convergence speed
- approximate dynamic programming
- information retrieval systems
- dynamic programming