Preference Poisoning Attacks on Reward Model Learning.
Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- management system
- learning scheme
- learning process
- statistical model
- prior knowledge
- probabilistic model
- computational model
- learned models
- high level
- experimental data
- conceptual model
- supervised learning
- partially observable environments
- learning phase
- learning mechanism
- EM algorithm
- semi-supervised
- active learning
- objective function
- similarity measure