Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate.

Steffi Chern, Ethan Chern, Graham Neubig, Pengfei Liu
Published in: CoRR (2024)
Keyphrases
  • language model
  • evaluation criteria
  • probabilistic model
  • document retrieval
  • language modeling
  • search engine
  • decision trees
  • hidden markov models
  • retrieval model
  • evaluation measures