HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation.
Wen Luo
Tianshu Shen
Wei Li
Guangyue Peng
Richeng Xuan
Houfeng Wang
Xi Yang
Published in:
CoRR (2024)
Keyphrases
automatic evaluation
real world
small scale
fully automatic
evaluation method
levels of abstraction
semi automatic
facial images
multi agent
natural language
low level
data sets
evaluation criteria
gold standard
dialogue system
mixed initiative
machine learning
conversational agent