Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards.
Silviu Pitis
Published in: NeurIPS (2023)
Keyphrases
- reinforcement learning
- multiple objectives
- Markov decision processes
- user preferences
- reward function
- decision making
- data sets
- stochastic process
- temporally extended
- collaborative filtering
- Markov chain
- wide variety
- wireless sensor networks
- multi-attribute
- situation calculus
- multi-objective
- data aggregation
- aggregation operators