Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks.
Charlotte Siska
Katerina Marazopoulou
Melissa Ailem
James Bono
Published in:
ACL (1) (2024)
Keyphrases
distributional assumptions
real time
evaluation methods
evaluation process
learning algorithm
evaluation method
evaluation criteria
database
machine learning
wide range
evaluation model
comparative evaluation