RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs.
Xuan Chen, Yuzhou Nie, Lu Yan, Yunshu Mao, Wenbo Guo, Xiangyu Zhang
Published in: CoRR (2024)
Keyphrases
- error rate
- black box
- reinforcement learning
- black boxes
- function approximation
- model free
- reinforcement learning algorithms
- state space
- white box
- rl algorithms
- machine learning
- markov decision processes
- temporal difference
- multi agent
- hybrid systems
- continuous state
- optimal policy
- actor critic
- integration testing
- learning algorithm
- autonomous learning
- policy evaluation
- direct policy search
- reinforcement learning methods
- temporal difference learning
- learning agents
- action selection
- action space
- markov decision process
- partially observable
- learning classifier systems
- learning problems
- transfer learning
- test cases
- learning process
- decision trees
- learning agent
- data sets
- state transition
- state and action spaces
- neural network