Off-Policy Policy Gradient with Stationary Distribution Correction.
Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
Published in: UAI (2019)
Keyphrases
- stationary distribution
- policy gradient
- Markov chain
- random walk
- reinforcement learning
- queueing networks
- function approximation
- optimal control
- gradient method
- initial state
- transition probabilities
- queue length
- state space
- reinforcement learning algorithms
- average reward
- sufficient conditions
- approximation methods
- steady state
- service times
- multi-agent