Mitigating Reward Hacking via Information-Theoretic Reward Modeling.
Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao
Published in: CoRR (2024)
Keyphrases
- information theoretic
- mutual information
- information theory
- theoretic framework
- reinforcement learning
- information theoretic measures
- information bottleneck
- jensen shannon divergence
- kullback leibler divergence
- log likelihood
- kl divergence
- multi modality
- relative entropy
- machine learning
- computational learning theory
- minimum description length
- pattern recognition
- similarity measure
- computer vision