Visual Data-Type Understanding does not emerge from scaling Vision-Language Models.
Vishaal Udandarao, Max F. Burg, Samuel Albanie, Matthias Bethge
Published in: ICLR (2024)
Keyphrases
- language model
- visual data
- language modeling
- visual information
- information retrieval
- probabilistic model
- n-gram
- high dimensional
- visual features
- test collection
- audio visual
- computer vision
- video data
- smoothing methods
- image data
- contextual information
- multimedia data
- video sequences
- data sets
- visual content
- image sequences
- document collections
- text categorization
- human actions
- feature space