What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think.

Published in: EMNLP (1) (2021)
