Hindsight PRIORs for Reward Learning from Human Preferences.
Mudit Verma, Katherine Metcalf
Published in: ICLR (2024)
Keyphrases
- learning algorithm
- knowledge acquisition
- prior knowledge
- learning process
- active learning
- learning systems
- human experts
- inductive inference
- language acquisition
- training data
- reinforcement learning
- decision making
- probabilistic model
- data sets
- learning problems
- solving problems
- preference learning
- learning preferences