Adversarial Benchmark Evaluation Rectified by Controlling for Difficulty.

Behzad Mehrbakhsh, Fernando Martínez-Plumed, José Hernández-Orallo
Published in: ECAI (2023)
Keyphrases
  • gold standard
  • evaluation framework
  • multi agent
  • wide range
  • evaluation metrics
  • quantitative evaluation