BUMP: A Benchmark of Unfaithful Minimal Pairs for Meta-Evaluation of Faithfulness Metrics.
Liang Ma
Shuyang Cao
Robert L. Logan IV
Di Lu
Shihao Ran
Ke Zhang
Joel R. Tetreault
Aoife Cahill
Alejandro Jaimes
Published in:
CoRR (2022)
Keyphrases
evaluation metrics
evaluation methods
evaluation criteria
evaluation measures
evaluation methodology
pairwise
artificial neural networks
evaluation method
case study
probabilistic model
genetic algorithm
image sequences
comparative evaluation