On the Blind Spots of Model-Based Evaluation Metrics for Text Generation.
Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Kyunghyun Cho, James R. Glass, Yulia Tsvetkov
Published in: CoRR (2022)
Keyphrases
- evaluation metrics
- text generation
- natural language generation
- precision and recall
- average precision
- natural language
- evaluation methods
- evaluation framework
- theorem prover
- evaluation measures
- learning to rank
- evaluation methodology
- information extraction
- data sets
- natural language processing
- decision trees
- search engine
- artificial intelligence