Reinforcement learning using expectation maximization based guided policy search for stochastic dynamics.
Prakash Mallick, Zhiyong Chen, Mohsen Zamani
Published in: Neurocomputing (2022)
Keyphrases
- policy search
- reinforcement learning
- expectation maximization
- EM algorithm
- continuous state
- reinforcement learning algorithms
- control policies
- dynamical systems
- dynamic programming
- probabilistic model
- maximum likelihood
- partially observable Markov decision processes
- reward function
- continuous action
- policy gradient
- state dependent
- function approximation
- generative model
- temporal difference
- Monte Carlo methods
- state space
- image segmentation
- partially observable
- machine learning
- function approximators
- Markov decision processes
- Monte Carlo