Mitigating Reward Hacking via Information-Theoretic Reward Modeling.

Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao

Published in: CoRR (2024)

Keyphrases

information theoretic
mutual information
information theory
theoretic framework
reinforcement learning
information theoretic measures
information bottleneck
Jensen-Shannon divergence
Kullback-Leibler divergence
log likelihood
KL divergence
multi modality
relative entropy
machine learning
computational learning theory
minimum description length
pattern recognition
similarity measure
computer vision