Publication: Text encoders are performance bottlenecks in contrastive vision-language models.