Optimal Baseline Corrections for Off-Policy Contextual Bandits.
Shashank Gupta
Olivier Jeunen
Harrie Oosterhuis
Maarten de Rijke
Published in:
CoRR (2024)
Keyphrases
contextual information
dynamic programming
closed form
computer vision
optimal control
online learning
multi-armed bandit
data sets
artificial intelligence
data structure
worst case
optimal design