Login / Signup
BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance.
R. Thomas McCoy
Junghyun Min
Tal Linzen
Published in:
BlackboxNLP@EMNLP (2020)
Keyphrases
</>
test set
error rate
training data
training set
test cases
test data
evaluation methodology
machine learning
database
probabilistic model