How structured are the representations in transformer-based vision encoders? An analysis of multi-object representations in vision-language models.

Tarun Khajuria, Braian Olmiro Dias, Jaan Aru
Published in: CoRR (2024)
Keyphrases
  • search engine
  • language model
  • computer vision
  • object representations
  • n-gram
  • probabilistic model
  • image processing
  • mixture model
  • language modeling