Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents.
Eric Michael Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, Jason Weston
Published in: CoRR (2022)
Keyphrases
- gold standard
- multi agent
- significant improvement
- decision making
- neural network
- cooperative
- natural language
- evaluation method
- conversational agents
- multiple agents
- evaluation methods
- content analysis
- machine learning methods
- dynamic environments
- empirical studies
- computational cost
- preprocessing
- reinforcement learning