Optimal Baseline Corrections for Off-Policy Contextual Bandits.

Shashank Gupta, Olivier Jeunen, Harrie Oosterhuis, Maarten de Rijke
Published in: CoRR (2024)
Keyphrases
  • contextual information
  • dynamic programming
  • closed form
  • computer vision
  • optimal control
  • online learning
  • multi-armed bandit
  • data sets
  • artificial intelligence
  • data structure
  • worst case
  • optimal design