Frequentist Regret Bounds for Randomized Least-Squares Value Iteration.

Andrea Zanette, David Brandfonbrener, Matteo Pirotta, Alessandro Lazaric

Published in: CoRR (2019)

Keyphrases

least squares
regret bounds
linear regression
policy iteration
markov decision processes
state space
optimal policy
lower bound
multi-armed bandit
parameter estimation
online learning
optical flow
dynamic programming
upper bound
online convex optimization
efficient algorithms for solving
infinite horizon
linear predictors
markov chain
probability distribution
markov decision process
e-learning
machine learning