COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation.
Jongmin Lee, Cosmin Paduraru, Daniel J. Mankowitz, Nicolas Heess, Doina Precup, Kee-Eung Kim, Arthur Guez
Published in: ICLR (2022)
Keyphrases
- stationary distribution
- reinforcement learning
- markov chain
- random walk
- product form
- initial state
- sufficient conditions
- queueing networks
- state space
- queue length
- transition probabilities
- optimal policy
- state dependent
- queueing model
- service rates
- markov decision processes
- steady state
- multi agent
- finite state
- service times
- markov model
- learning algorithm