Regret Bounds for Reinforcement Learning via Markov Chain Concentration.
Ronald Ortner
Published in: CoRR (2018)
Keyphrases
- Markov chain
- reinforcement learning
- state space
- regret bounds
- finite state
- stationary distribution
- transition probabilities
- random walk
- Monte Carlo
- Monte Carlo method
- lower bound
- online learning
- dynamic programming
- optimal policy
- Markov decision processes
- transition matrix
- supervised learning
- linear regression
- policy iteration
- search space
- learning algorithm
- random variables
- maximum entropy
- multi-class
- upper bound
- Bregman divergences
- machine learning