Optimal Algorithms for Stochastic Contextual Preference Bandits.
Aadirupa Saha
Published in: NeurIPS (2021)
Keyphrases
- multi-armed bandit
- learning algorithm
- worst case
- decision trees
- regret bounds
- machine learning algorithms
- data mining techniques
- dynamic programming
- optimization problems
- decision makers
- significant improvement
- theoretical analysis
- computational complexity
- optimal control
- exhaustive search
- stochastic search
- search algorithm
- optimal solution