Generalizing distribution of partial rewards for multi-armed bandits with temporally-partitioned rewards.
Ronald C. van den Broek
Rik Litjens
Tobias Sagis
Luc Siecker
Nina Verbeeke
Pratik Gajane
Published in:
CoRR (2022)
Keyphrases
multi-armed bandits
bandit problems
multi-armed bandit
decision problems
active learning
data distribution
temporal information
reinforcement learning
search space
Markov decision processes
learning theory