Computing Probabilistic Bisimilarity Distances via Policy Iteration.
Qiyi Tang, Franck van Breugel
Published in: CONCUR (2016)
Keyphrases
- policy iteration
- Markov decision processes
- least squares
- model free
- reinforcement learning
- optimal policy
- fixed point
- sample path
- Markov decision process
- infinite horizon
- Bayesian networks
- finite state
- generative model
- probabilistic model
- optimal control
- dynamic programming
- Markov chain
- supervised learning
- temporal difference
- initial state
- average reward
- decision making