Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback.

Mohammad Pedramfar, Vaneet Aggarwal

Published in: CoRR (2023)

Keyphrases

regret bounds
multi armed bandit
stochastic systems
multi armed bandits
lower bound
greedy algorithm
linear regression
upper bound
online learning
reinforcement learning
objective function
random sampling
monte carlo
multi armed bandit problems
stochastic nature
stochastic optimization
relevance feedback
learning algorithm
energy minimization
high order
neural network
peer to peer
least squares
bayesian networks
website
information retrieval