Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents.

Eric Michael Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, Jason Weston
Published in: ConvAI@ACL (2022)
Keyphrases
  • multi agent systems
  • gold standard
  • neural network
  • preprocessing
  • cooperative
  • empirical studies
  • benchmark datasets
  • evaluation methods
  • significant improvement
  • intelligent agents
  • human experts
  • evaluation metrics