Adversarial Benchmark Evaluation Rectified by Controlling for Difficulty.
Behzad Mehrbakhsh
Fernando Martínez-Plumed
José Hernández-Orallo
Published in:
ECAI (2023)
Keyphrases
gold standard
evaluation framework
multi-agent
wide range
evaluation metrics
quantitative evaluation