Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks.
Melissa Ailem
Katerina Marazopoulou
Charlotte Siska
James Bono
Published in:
CoRR (2024)
Keyphrases
distributional assumptions
gold standard
knowledge base
relational databases
mobile robot
data mining
genetic algorithm
feature extraction
similarity measure
bayesian networks
lower bound
wireless sensor networks
empirical evaluation
comparative evaluation
high robustness
benchmark suite