Online regret bounds for Markov decision processes with deterministic transitions.
Ronald Ortner
Published in: Theor. Comput. Sci. (2010)
Keyphrases
- Markov decision processes
- regret bounds
- online learning
- finite state
- transition matrices
- state space
- optimal policy
- dynamic programming
- reinforcement learning
- policy iteration
- decision theoretic planning
- online convex optimization
- average reward
- stationary policies
- infinite horizon
- reward function
- average cost
- lower bound
- action space
- partially observable
- linear regression
- Monte Carlo
- support vector machine
- stochastic shortest path