On the model-based stochastic value gradient for continuous reinforcement learning.
Brandon Amos, Samuel Stanton, Denis Yarats, Andrew Gordon Wilson
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- continuous state spaces
- model free
- direct policy search
- optimal control problems
- optimal control
- state space
- action space
- stochastic approximation
- learning automata
- policy gradient
- continuous domains
- function approximation
- reinforcement learning algorithms
- multi agent
- control policies
- monte carlo
- machine learning
- markov decision processes
- temporal difference
- transfer learning
- control problems
- continuous state and action spaces
- dynamic programming
- approximate dynamic programming
- fitted q iteration
- continuous state
- temporal difference learning
- piecewise constant
- data driven
- supervised learning
- learning algorithm