DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections.
Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li
Published in: CoRR (2019)
Keyphrases
- stationary distribution
- markov chain
- random walk
- product form
- queueing networks
- transition probabilities
- state dependent
- sufficient conditions
- markov decision processes
- initial state
- queueing model
- queue length
- finite state
- dynamic programming
- lot sizing
- optimal control
- latent variables
- steady state
- optimal policy
- service rates