Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs.
Chung-Wei Lee
Haipeng Luo
Chen-Yu Wei
Mengxiao Zhang
Published in:
NeurIPS (2020)
Keyphrases
data-dependent
multi-armed bandit
regret bounds
reinforcement learning
Bregman divergences
Markov decision processes
lower bound
online learning
linear regression
state space
pairwise
special case
probability distribution
upper bound
optimal policy
hash functions