Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards.
Silviu Pitis
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- multiple objectives
- wide variety
- decision making
- real world
- credit assignment
- markov decision processes
- decision processes
- stochastic process
- multiarmed bandit
- neural network
- preference aggregation
- preference elicitation
- data aggregation
- decision process
- multi attribute
- multi agent
- learning algorithm