Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs.

Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang

Published in: CoRR (2020)

Keyphrases

data dependent
multi-armed bandit
regret bounds
reinforcement learning
Bregman divergences
Markov decision processes
lower bound
online learning
linear regression
state space
hash functions
e-learning
denoising
optimal policy
energy functional