Frequentist Regret Bounds for Randomized Least-Squares Value Iteration.
Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric
Published in: AISTATS (2020)
Keyphrases
- least squares
- regret bounds
- linear regression
- policy iteration
- markov decision processes
- state space
- online learning
- parameter estimation
- multi armed bandit
- optimal policy
- lower bound
- upper bound
- dynamic programming
- optical flow
- online convex optimization
- computer vision
- infinite horizon
- markov chain
- linear predictors
- special case
- reinforcement learning