Inferring Lexicographically-Ordered Rewards from Preferences.
Alihan Hüyük, William R. Zame, Mihaela van der Schaar
Published in: AAAI (2022)
Keyphrases
- reinforcement learning
- Markov decision processes
- user preferences
- decision making
- multi-armed bandit
- multi-attribute
- user defined
- partially ordered
- soft constraints
- preference elicitation
- bandit problems
- genetic algorithm
- search algorithm
- machine learning
- special case
- data mining
- data sets
- individual preferences
- database