Entangled Preferences: The History and Risks of Reinforcement Learning and Human Feedback.
Nathan Lambert, Thomas Krendl Gilbert, Tom Zick
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- decision making
- human operators
- model free
- risk management
- learning algorithm
- relevance feedback
- human interaction
- user preferences
- function approximation
- human behavior
- policy search
- data sets
- risk analysis
- human experts
- decision makers
- supervised learning
- learning process
- machine learning
- neural network