Jailbreaking as a Reward Misspecification Problem.
Zhihui Xie
Jiahui Gao
Lei Li
Zhenguo Li
Qi Liu
Lingpeng Kong
Published in:
CoRR (2024)
Keyphrases
reinforcement learning
average reward
partially observable environments
neural network
data mining
image processing
clustering algorithm
evolutionary algorithm
probability distribution
inverse reinforcement learning
bandit problems
eligibility traces