Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts.
Aditya Sharma, Michael Saxon, William Yang Wang
Published in: CoRR (2024)
Keyphrases
- language model
- visual perception
- image data
- image features
- language modeling
- low level
- image classification
- probabilistic model
- document retrieval
- image segmentation
- n-gram
- image representation
- image content
- visual data
- image collections
- speech recognition
- retrieval model
- statistical language models
- computer vision
- visual features
- query expansion
- information retrieval
- image regions
- image retrieval
- language modelling
- language model for information retrieval
- visual information
- text categorization
- smoothing methods
- language models for information retrieval