Generalizing distribution of partial rewards for multi-armed bandits with temporally-partitioned rewards.

Ronald C. van den Broek, Rik Litjens, Tobias Sagis, Luc Siecker, Nina Verbeeke, Pratik Gajane
Published in: CoRR (2022)
Keyphrases
  • multi armed bandits
  • bandit problems
  • multi armed bandit
  • decision problems
  • active learning
  • data distribution
  • temporal information
  • reinforcement learning
  • search space
  • markov decision processes
  • learning theory