Policy Gradient With Serial Markov Chain Reasoning.
Edoardo Cetin, Oya Çeliktutan. Published in: NeurIPS (2022)
Keyphrases
- markov chain
- policy gradient
- finite state
- average reward
- monte carlo
- transition probabilities
- stationary distribution
- random walk
- markov model
- state space
- reinforcement learning
- variance reduction
- monte carlo method
- state transition
- partially observable markov decision processes
- transition matrix
- approximation methods
- optimal control
- reinforcement learning algorithms
- importance sampling
- gradient method
- markov chain monte carlo
- function approximation
- decision problems