Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency.
Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu
Published in: COLT (2023)
Keyphrases
- computational efficiency
- regret bounds
- reinforcement learning
- multi-armed bandit
- online learning
- lower bound
- linear regression
- computationally efficient
- upper bound
- machine learning
- state space
- model-free
- KL divergence
- Bregman divergences
- least squares
- optimal policy
- information-theoretic
- covariance matrix
- computational complexity