Towards Safe Reinforcement Learning Using NMPC and Policy Gradients: Part I - Stochastic case.
Sebastien Gros, Mario Zanon
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- optimal policy
- control policies
- Markov decision process
- direct policy search
- state space
- action selection
- multi agent
- reinforcement learning algorithms
- policy search
- continuous state spaces
- control policy
- action space
- policy iteration
- infinite horizon
- function approximation
- Markov decision processes
- robot control
- reward function
- temporal difference
- model free
- policy gradient
- clustering scheme
- decision process
- model free reinforcement learning