Towards Safe Reinforcement Learning Using NMPC and Policy Gradients: Part II - Deterministic Case.
Sebastien Gros, Mario Zanon
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- optimal policy
- markov decision process
- action selection
- state space
- function approximation
- policy search
- markov decision processes
- partially observable domains
- reward function
- markov decision problems
- learning algorithm
- policy iteration
- function approximators
- model free
- policy gradient
- multi agent
- inverse reinforcement learning
- deterministic domains
- computer vision