BanditQ - No-Regret Learning with Guaranteed Per-User Rewards in Adversarial Environments.
Abhishek Sinha
Published in: CoRR (2023)
Keyphrases
- online learning
- reinforcement learning
- learning process
- learning algorithm
- distributed learning
- learning systems
- bandit problems
- user interface
- prior knowledge
- learning problems
- knowledge acquisition
- autonomous robots
- binary classification
- dynamic environments
- user experience
- collaborative filtering
- supervised learning
- mobile devices
- multi-armed bandits