Anytime Hedge achieves optimal regret in the stochastic regime.
Jaouad Mourtada
Stéphane Gaïffas
Published in:
CoRR (2018)
Keyphrases
worst case
regret bounds
dynamic programming
multi-armed bandit
locally optimal
stochastic dynamic programming
lower bound
learning algorithm
reinforcement learning
loss function
game theory
stochastic programming