Frequentist Regret Bounds for Randomized Least-Squares Value Iteration
Andrea Zanette, David Brandfonbrener, Matteo Pirotta, Alessandro Lazaric
Published in: CoRR (2019)
Keyphrases
- least squares
- regret bounds
- linear regression
- policy iteration
- markov decision processes
- state space
- optimal policy
- lower bound
- multi armed bandit
- parameter estimation
- online learning
- optical flow
- dynamic programming
- upper bound
- online convex optimization
- efficient algorithms for solving
- infinite horizon
- linear predictors
- markov chain
- probability distribution
- markov decision process
- e learning
- machine learning