On preferences and reward policies over rankings.
Marco Faella, Luigi Sauro
Published in: Auton. Agents Multi Agent Syst. (2024)
Keyphrases
- infinite horizon
- optimal policy
- total reward
- long run
- average reward
- reinforcement learning
- reward function
- Markov decision processes
- expected reward
- Markov decision process
- pairwise comparisons
- state space
- dynamic programming
- control policy
- user preferences
- control policies
- discounted reward
- finite state
- decision making
- individual preferences
- soft constraints
- rank aggregation
- multiple criteria
- inverse reinforcement learning
- reinforcement learning algorithms
- multi attribute
- decision makers
- temporally extended
- rank correlation
- optimal solution
- information retrieval