Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment.

Jiongxiao Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Junjie Hu, Yixuan Li, Patrick McDaniel, Muhao Chen, Bo Li, Chaowei Xiao
Published in: CoRR (2024)
Keyphrases
  • fine tuning
  • viable alternative
  • fine tune
  • image alignment
  • fine tuned
  • dynamic time warping
  • countermeasures
  • detection mechanism
  • neural network
  • genetic algorithm
  • decision support system
  • attack detection