From minimax value to low-regret algorithms for online Markov decision processes.
Peng Guan, Maxim Raginsky, Rebecca Willett
Published in: ACC (2014)
Keyphrases
- markov decision processes
- policy iteration
- online algorithms
- online learning
- worst case
- finite state
- state space
- reachability analysis
- reinforcement learning
- factored mdps
- dynamic programming
- finite horizon
- optimal policy
- decision theoretic planning
- online convex optimization
- planning under uncertainty
- partially observable markov decision processes
- computational complexity
- expected reward
- policy evaluation
- stochastic shortest path
- partially observable
- reward function
- infinite horizon
- planning problems
- convergence rate
- multistage
- linear programming
- search space