Near-optimal Regret Bounds for Reinforcement Learning in Factored MDPs.
Ian Osband, Benjamin Van Roy
Published in: CoRR (2014)
Keyphrases
- factored mdps
- reinforcement learning
- markov decision processes
- state space
- approximate dynamic programming
- regret bounds
- policy iteration
- markov decision problems
- reinforcement learning algorithms
- context specific
- optimal policy
- transition model
- online learning
- function approximation
- learning algorithm
- dynamic programming
- model free
- markov decision process
- infinite horizon
- temporal difference
- action space
- reward function
- finite state
- linear program
- machine learning
- average cost
- linear regression
- finite state machines
- maximum likelihood
- partially observable
- optimal control
- stochastic processes
- partially observable markov decision processes
- step size
- supervised learning
- markov chain
- upper bound
- bregman divergences
- probabilistic model
- cost function
- special case
- dynamical systems