Login / Signup

BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance.

R. Thomas McCoyJunghyun MinTal Linzen
Published in: BlackboxNLP@EMNLP (2020)
Keyphrases
  • test set
  • error rate
  • training data
  • training set
  • test cases
  • test data
  • evaluation methodology
  • machine learning
  • database
  • probabilistic model