Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning.
Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou
Published in: CoRR (2022)
Keyphrases
- stationary distribution
- reinforcement learning
- optimal policy
- state dependent
- Markov chain
- initial state
- model free
- policy search
- state space
- random walk
- product form
- sufficient conditions
- action selection
- queue length
- queueing networks
- Markov decision processes
- queueing model
- reward function
- service rates
- function approximation
- transition probabilities
- decision problems
- policy gradient
- partially observable
- function approximators
- infinite horizon
- long run
- dynamic programming
- asymptotically optimal
- neural network
- steady state
- learning algorithm
- machine learning
- Markov model
- multistage
- average cost
- service times
- special case
- multi agent