Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning.
Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou
Published in: ICML (2022)
Keyphrases
- stationary distribution
- reinforcement learning
- optimal policy
- state dependent
- initial state
- Markov chain
- model free
- random walk
- state space
- policy search
- queueing networks
- product form
- action selection
- reward function
- service rates
- Markov decision processes
- function approximators
- transition probabilities
- sufficient conditions
- function approximation
- queue length
- partially observable
- queueing model
- decision problems
- asymptotically optimal
- long run
- partially observable Markov decision processes
- dynamic programming
- infinite horizon
- service times
- steady state
- multistage
- policy gradient
- finite state
- transfer learning
- machine learning
- special case
- neural network