Online Markov decision processes with Kullback-Leibler control cost.
Peng Guan, Maxim Raginsky, Rebecca Willett. Published in: CoRR (2014)
Keyphrases
- Markov decision processes
- Kullback-Leibler
- average cost
- state space
- optimal policy
- finite state
- KL divergence
- reinforcement learning
- cross entropy
- transition matrices
- Kullback-Leibler divergence
- optimal control
- infinite horizon
- decision theoretic planning
- policy iteration
- distance measure
- dynamic programming
- reward function
- real time dynamic programming
- total cost
- control strategy
- Markov decision process
- information theoretic
- stochastic shortest path
- average reward