Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency.
Heyang Zhao
Jiafan He
Dongruo Zhou
Tong Zhang
Quanquan Gu
Published in:
CoRR (2023)
Keyphrases
computational efficiency
regret bounds
reinforcement learning
multi-armed bandit
lower bound
online learning
linear regression
computationally efficient
upper bound
state space
machine learning
learning process
multi-class
optimal policy
temporal difference
Markov decision processes