Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods.

Polina Tsvilodub, Hening Wang, Sharon Grosch, Michael Franke
Published in: CoRR (2024)
Keyphrases