One-bit feedback is sufficient for upper confidence bound policies
Daniel Vial
Sanjay Shakkottai
R. Srikant
Published in: CoRR (2020)
Keyphrases
upper confidence bound
contextual bandit
optimal policy
relevance feedback
user feedback
machine learning