Hindsight PRIORs for Reward Learning from Human Preferences.
Mudit Verma, Katherine Metcalf
Published in: CoRR (2024)
Keyphrases
- learning process
- learning algorithm
- reinforcement learning
- online learning
- learning systems
- prior knowledge
- supervised learning
- data sets
- human learning
- human behavior
- preference learning
- language acquisition
- inductive inference
- learning analytics
- Bayesian framework
- learning problems
- learning experience
- active learning
- e-learning
- neural network