Bandits with costly reward observations.
Aaron David Tucker
Caleb Biddulph
Claire Wang
Thorsten Joachims
Published in:
UAI (2023)
Keyphrases
multi-armed bandits
reinforcement learning
databases
error-prone
real-world
decision trees
high cost
decision making
noisy observations