Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs.
Chung-Wei Lee
Haipeng Luo
Chen-Yu Wei
Mengxiao Zhang
Published in:
CoRR (2020)
Keyphrases
data dependent
multi armed bandit
regret bounds
reinforcement learning
bregman divergences
markov decision processes
lower bound
online learning
linear regression
state space
hash functions
e learning
denoising
optimal policy
energy functional