On the model-based stochastic value gradient for continuous reinforcement learning.
Brandon Amos, Samuel Stanton, Denis Yarats, Andrew Gordon Wilson
Published in: L4DC (2021)
Keyphrases
- reinforcement learning
- continuous state spaces
- direct policy search
- model free
- state space
- optimal control problems
- stochastic approximation
- function approximation
- optimal control
- action space
- continuous domains
- policy gradient
- continuous state
- fitted q iteration
- learning automata
- monte carlo
- learning algorithm
- dynamic programming
- control policies
- stochastic optimization
- reinforcement learning algorithms
- markov decision processes
- temporal difference learning
- piecewise constant
- multi agent
- data driven
- temporal difference
- image processing
- continuous state and action spaces
- robotic control
- bayesian networks
- gradient direction
- partially observable
- action selection
- gradient method
- fully unsupervised
- learning tasks