Contextual bandits with concave rewards, and an application to fair ranking.
Virginie Do, Elvis Dohmatob, Matteo Pirotta, Alessandro Lazaric, Nicolas Usunier
Published in: ICLR (2023)
Keyphrases
- multi-armed bandits
- ranking algorithm
- contextual information
- ranking functions
- reinforcement learning
- context-sensitive
- bandit problems
- objective function
- web search
- stochastic systems
- learning to rank
- high level
- rank aggregation
- Markov decision processes
- ranked list
- machine learning
- convexity properties
- multiarmed bandit
- long-term and short-term
- contextual knowledge
- link analysis
- user feedback
- evaluation measures
- keyword search
- language model