Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback.
Mohammad Pedramfar, Vaneet Aggarwal
Published in: CoRR (2023)
Keyphrases
- regret bounds
- multi-armed bandit
- stochastic systems
- multi-armed bandits
- lower bound
- greedy algorithm
- linear regression
- upper bound
- online learning
- reinforcement learning
- objective function
- random sampling
- Monte Carlo
- multi-armed bandit problems
- stochastic nature
- stochastic optimization
- relevance feedback
- learning algorithm
- energy minimization
- high order
- neural network
- peer-to-peer
- least squares
- Bayesian networks
- website
- information retrieval