Regret Bounds for Reinforcement Learning via Markov Chain Concentration.
Ronald Ortner
Published in: J. Artif. Intell. Res. (2020)
Keyphrases
- Markov chain
- reinforcement learning
- state space
- regret bounds
- transition probabilities
- finite state
- Monte Carlo
- Monte Carlo method
- stationary distribution
- random walk
- transition matrix
- Markov decision processes
- learning algorithm
- online learning
- upper bound
- Markov decision process
- optimal policy
- linear regression
- lower bound
- dynamic programming
- reward function
- Bayesian networks
- feature selection