Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks
Andrea Sottana
Bin Liang
Kai Zou
Zheng Yuan
Published in:
EMNLP (2023)
Keyphrases
language model
evaluation metrics
language modeling
n gram
query expansion
document retrieval
decision trees
speech recognition
average precision
keywords
information extraction
retrieval model
language modelling
statistical language models