Login / Signup
Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts.
Giulia Romano
Andrea Agostini
Francesco Trovò
Nicola Gatti
Marcello Restelli
Published in:
CoRR (2022)
Keyphrases
</>
spatio temporal
temporal information
markov decision processes
reinforcement learning
relevance feedback
positive feedback
bandit problems
real time
real world
machine learning
decision trees
multiscale
utility function
user feedback
reward function
equal sized