How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions.
Lorenzo Pacchiardi, Alex James Chan, Sören Mindermann, Ilan Moscovitz, Alexa Y. Pan, Yarin Gal, Owain Evans, Jan Markus Brauner
Published in: ICLR (2024)
Keyphrases
- black box
- black boxes
- artificial intelligence
- white box
- test cases
- detection rate
- detection algorithm
- detection method
- hybrid systems
- rule extraction
- automatic detection
- false alarms
- expert systems
- databases
- state transition
- false positives
- object detection
- integration testing
- white box testing
- object oriented
- computational intelligence
- ai researchers
- knowledge representation
- face detection