COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation.
Jongmin Lee, Cosmin Paduraru, Daniel J. Mankowitz, Nicolas Heess, Doina Precup, Kee-Eung Kim, Arthur Guez
Published in: CoRR (2022)
Keyphrases
- stationary distribution
- reinforcement learning
- markov chain
- random walk
- product form
- queueing networks
- state space
- state dependent
- transition probabilities
- queueing model
- service times
- queue length
- multi agent
- service rates
- initial state
- sufficient conditions
- machine learning
- steady state
- maximum entropy
- finite state
- markov decision processes
- parameter estimation
- graphical models
- dynamic programming
- computational complexity
- learning algorithm