How structured are the representations in transformer-based vision encoders? An analysis of multi-object representations in vision-language models.
Tarun Khajuria
Braian Olmiro Dias
Jaan Aru
Published in:
CoRR (2024)
Keyphrases
search engine
language model
computer vision
object representations
n-gram
probabilistic model
image processing
mixture model
language modeling