BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance.

Published in: BlackboxNLP@EMNLP (2020)

Keyphrases